REST API Pagination in Salesforce Ecosystem

Binayak Chakraborty
5 min readApr 3, 2021

--

Recently I came across a scenario where we had to expose a REST api in salesforce and it will be called by a third party . If someone is in the sfdc platform for quite sometime , its might be a usual exercise.But in the requirement gathering phase , found out that the JSON structure to support this activity is a complex one (i.e. querying records from multiple associated objects ) and we have to send more than 100k records and the api will be called out multiple times in a day .

If we are trying to send the whole chunk of 100k records in a complex JSON structure , we will hit the heap size limit of SFDC. So option is to break the data flow it into smaller chunks .

Now in our case the API is a stateless one — we can’t store what records we have send. So we have to design the api in such a way that we can maintain the state in the server side(i.e third party). There can be an argument of maintaining the state in the API/client side ; but if this api is called by multiple systems there needs to be a provision to maintain the state (per callout session) which might be costly.

The first thing which comes in the mind is to use the SOQL OFFSET , split the queried data and send it in response. But it has its own cons like we can query a maximum offset of 2k rows (as of Salesforce Winter ‘21 release) and requesting an offset greater than 2k records results in NUMBER_OUTSIDE_VALID_RANGE error . So even if we consider to use 2k records per offset , we need to have make around 100k/2k ~50 callouts from the third party — which is a costly affair.

So we have to design the framework in such a way — that the cursor is maintained at the end-system and its passed everytime while fetching the subsequent chunks of data. Found out an interesting blog where and how we can chunk the data . We can actually use the sfdc id of the records, use the > and < in the query against the ids and play with it as the cursor .

Also apart from this one , we need to do a small math around the response body , lets say the response body (in JSON)with a single record structure (and maximum of the field’s value) is of X KB , find out whats the SFDC platform allows as the limit in the REST response body (in that release) — lets say Y KB . Divide Y/X to fetch the threshold limit on the number of records which we can send per response . We can have some contingency i.e. if Y/X gives us 5000 , we can have contingency as 10% less and we can put the threshold limit as {5000 — (10 % OF 5000)=}4500. This threshold limit can be stored in a custom label (lets say API_Threshold_data).

In the API ,after fetching the records which we need to send in response,we can also send the below ones (along with the queried data) :

RES1. cursor (sfdc id of the last record record) .

RES2. Total Chunks of records/Total pages : which we will get by dividing the count of records which we are going to send by the API_Threshold_data.

RES3.Current page number .

In the request body of the API , we can have the following parameters :

REQ1. Filter criteria if any .

REQ2. cursor sfdc id . During the first callout, this one will be blank. In the subsequent calls , it needs to be populated with the cursor which we have received it the response.

REQ3. Total Pages : During the first callout, this one will be blank.In the subsequent calls , it needs to be populated with the Total pages which we have received it the response.

REQ4. Page Number Requested : During the first callout, this one will be blank. In the subsequent calls , it needs to be populated with the (Current page number+1) which we have received it the response.

Having stated that , we can design the whole framework in the fashion :

When the API is called for the first time :

  1. The request body wont have the REQ2, REQ3 and REQ4. Sample request body can be like :
{
"customerdata_changed_since" : "2019–04–03T11:30:21.000+0000",//act as filter
"customerdata_changed_until":"2021–04–03T11:30:21.000+0000",//act as filter
"customer_idcursor" : "",
"pagenumber_requested" : "",
"pagenumber_total": ""
}

2. Query the records (based on the filters if any), form your data response and pass the cursor in the response body of the API . Our query can be of the format

Select id,<FIELDS> from <YOUR_sOBJECT> where <FILTER_CRITERIA>
order by id asc limit 10000

We can add a node as api_factory in the response JSON . It will contain the RES1,RES2 and RES3. Sample response can be like :

{"customer_data":[
{
"id_customerid":"0039000001jc6tL",
your json structure ...
},....
{
"id_customerid":"0039000000CEWD0",//last record of this chunk
your json structure ...
}],
"api_factory"{

"customer_idcursor":"0039000000CEWD0",
"pagenumber_total": "10",
"pagenumber_thisrequest":"1"
}}

When the API is called for the second, third ..subsequent times :

  1. The server /third party which is making the callout needs to put the REQ2, REQ3 and REQ4v in the request .This data they can fetch as a part of the first response .Sample request body can be like :
{
"customerdata_changed_since" : "2019–04–03T11:30:21.000+0000",//act as filter
"customerdata_changed_until":"2021–04–03T11:30:21.000+0000",//act as filter
"customer_idcursor" : "0039000000CEWD0",
"pagenumber_requested" : "2",
"pagenumber_total": "10"
}

2. Query the records (based on the filters if any), form your data response and pass the cursor in the response body of the API . Our query can be of the format

Select id,<FIELDS> from <YOUR_sOBJECT> where <FILTER_CRITERIA> AND id >’0039000000CEWD0' order by id asc limit 10000

(Please note the change in addition of the id parameter over here in this query)

Pass the response as usual .Sample response for the 2nd callout :

{"customer_data":[
{
"id_customerid":"0039000000CEWDD",
your json structure ...
},....
{
"id_customerid":"0039000000CEWD2",//last record of this chunk
your json structure ...
}],
"api_factory"{

"customer_idcursor":"0039000000CEWD2",
"pagenumber_total": "10",
"pagenumber_thisrequest":"2"
}}

Thanks for taking your time in reading this . Hope it helps! Happy RESTing and designing!

P.S. This above requirement can also be fulfilled using a middleware in between (maintaining the state) OR use of Bulk API 2.0.

Reference : https://developer.salesforce.com/blogs/developer-relations/2015/11/pk-chunking-techniques-massive-orgs.html

--

--

Binayak Chakraborty

working as a SFDC Application Architect cum Technical Lead