Always Developing - Keyset Pagination (vs Offset pagination)

Daily Knowledge Drop

In a previous daily drop post, I learn about using OFFSET and FETCH to perform paging when retrieving data - however this is not always a good method for retrieving results, especially with a large dataset.

Instead, the usage of keyset pagination should be considered, as it is more performant. Keyset pagination returns a subset of the dataset by using a WHERE clause instead of the OFFSET clause as with offset pagination.

There are however some limitations with the keyset approach, which will be explored below.

Offset

Recap

First, a brief recap on offset pagination - this method uses the OFFSET and FETCH clauses to effectively skip X rows, and FETCH the following Y rows, ordered by the specified column(s):

DECLARE @PageNumber INT = 5,
        @PageSize INT = 20; 

-- In this example, skip 100 records
-- and return the next 20
SELECT Id 
FROM OffSetDemo
ORDER BY Id ASC
OFFSET @PageNumber * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY

Based on PageNumber and PageSize, the appropriate number of rows and skipped, and then PageSize rows retrieved.

This approach works, however might not be suitable for all use cases.

Issues

Large datasets

One issue with the offset pagination approach, is that when the OFFSET clauses is used, the database is still required to go through all rows to find the correct starting record.

For example, if we have a database with 1 000 000 rows.

When the following is executed, the database still needs to go through the first 100 rows (the OFFSET value) to find the 101st row.
```
SELECT Id 
FROM OffSetDemo
ORDER BY Id ASC
OFFSET 100 ROWS
FETCH NEXT 50 ROWS ONLY
```
This will probably not have a negative impact on performance, however if the following is executed:
```
SELECT Id 
FROM OffSetDemo
ORDER BY Id ASC
OFFSET 900 000 ROWS
FETCH NEXT 50 ROWS ONLY
```
Now the database needs to go through 900 000 rows (the OFFSET value) to find the 900 001st row, which could definitely have a performance impact.
Missing/duplicate records

If the dataset being used is changing while being queried, the offset method could result in records either being missed, or records being duplicated as paging occurs.

With a page size of 20, when the first page is queried, the first 20 records are returned. When the second page is being queried for, the first 20 rows are skipped (with OFFSET) and the next 20 rows are returned. Consider however, if between loading the first and second page, a record in the first 20 is deleted. The record which was number 21 when the first page loaded, is record 20 when the second page is loaded - this is missed when loading page 1 and page 2.

The same applied if a record is added which falls into the first 20 - the record which previously was number 20, would become record number 21, and thus would appear on page 1 and page 2.

Keyset pagination solves aims to solve these problems (while introducing a different set of limitations).

Keyset

With keyset pagination, instead of using the OFFSET clause to skip rows to determine where the returned dataset should start, a WHERE clause is used to determine where the returned dataset should start.

A few examples will make it clearer - again consider a database with 1 000 000 rows, with an int Id primary key column on which the data is sorted.

To retrieve the first page:

-- OFFSET pagination
SELECT Id 
FROM OffSetDemo
ORDER BY Id ASC
OFFSET 0 ROWS
FETCH NEXT 50 ROWS ONLY

-- KEYSET pagination
SELECT TOP 50 Id 
FROM OffSetDemo
WHERE Id > 0
ORDER BY Id ASC

To retrieve the 10th page (for example):

-- OFFSET pagination
SELECT Id 
FROM OffSetDemo
ORDER BY Id ASC
OFFSET 500 ROWS -- 50 x 10
FETCH NEXT 50 ROWS ONLY

-- KEYSET pagination
SELECT TOP 50 Id 
FROM OffSetDemo
WHERE Id > 500
ORDER BY Id ASC

In terms of performance, the keyset version is a lot more performant - assuming there are indexes created on the WHERE clause columns. Using the WHERE clause and the index, SQL does not need to go through X number rows to find the correct starting point

This also solves the problem of records being duplicated or being missed as absolute values are being used to do the lookup and not offsets. If a record in the first 20 is deleted, it will not effect the second page of 20 records, and the same applied if a new record is added to the first 20 - the second page of 20 records is unaffected.

Keyset pagination is not without its own limitations though.

Limitations

No random access - with keyset pagination one is unable to get, for example, the 15th page. To get a page of data, the last Id (in the above example) of the previous page is required.
Complicated multi-column filtering - with the above examples, the WHERE clause is straightforward, but this can get complicated. Consider if the data is going to be sorted by a CreatedDate column, and if two record have the same CreatedDate, then they should be sorted by Id.
```
SELECT TOP 50 DateCreated, Id 
FROM OffSetDemo
WHERE ((DateCreated > '2022/06/12') OR (DateCreated = '2022/06/12' AND Id > 500))
ORDER BY DateCreated, Id ASC
```
Depending on how many filter conditions there are, the WHERE clause could become complex.

Notes

The choice between the two methods (and any other), as always, comes down to the specific use case.

Need random access to any page? Use offset
Large dataset, and no random access required? Use keyset
Large dataset, but do require random access? Why not both? - offset for the random access, and keyset for the next/previous functionality

References

.NET Data Community Standup - Database Pagination

Daily Drop 108: 01-07-2022

At the start of 2022 I set myself the goal of learning one new coding related piece of knowledge a day.
It could be anything - some.NET / C# functionality I wasn't aware of, a design practice, a cool new coding technique, or just something I find interesting. It could be something I knew at one point but had forgotten, or something completely new, which I may or may never actually use.

The Daily Drop is a record of these pieces of knowledge - writing about and summarizing them helps re-enforce the information for myself, as well as potentially helps others learn something new as well.

sql pagination keyset sort

Keyset Pagination (vs Offset pagination)

Exploring keyset pagination in SQL Server