Feature request: Paginated fetches for large lists
As we know from issues like #57, #63, and #106, SharePoint Online (and uncustomized on-premises versions) has a 5,000-record limit for views and queries.
A workaround for this limitation is to fetch records in batches below the 5,000-item threshold. An example of this using ListItemCollectionPosition can be found here, but it requires indexes on all queried columns, which may not always be possible.
A naïve solution that I have working uses the ID column and Where clauses to fetch batches of up to row_limit rows, looping until there are no more results. This can fail if the data contains a gap in IDs larger than row_limit, though: the batch covering the gap comes back empty, so the loop stops early and silently skips everything after it.
Regardless, here is a function that works for me. Perhaps this could be improved upon and included as a feature in SharePlum:
def get_sharepoint_items_paged(
    sp_list,
    id_column='ID',
    where_clause=None,
    fields=None,
    view_name=None,
    row_limit=1000,
):
    data = []
    current_id = 0

    def load_next_result_set():
        # Fetch the batch of items whose IDs fall in [min_id, max_id).
        min_id = current_id
        max_id = current_id + row_limit
        query = dict()
        if where_clause:
            # Prepend the caller's filter to the pagination bounds.
            query['Where'] = [
                'And',
                where_clause,
                'And',
                ('Geq', id_column, str(min_id)),
                ('Lt', id_column, str(max_id)),
            ]
        else:
            query['Where'] = [
                'And',
                ('Geq', id_column, str(min_id)),
                ('Lt', id_column, str(max_id)),
            ]
        return sp_list.GetListItems(
            view_name=view_name,
            fields=fields,
            query=query,
            row_limit=row_limit,
        )

    result_set = load_next_result_set()
    while result_set:
        # Advance the window by a fixed step, regardless of how many rows came back.
        current_id += row_limit
        data.extend(result_set)
        result_set = load_next_result_set()
    return data
id_column defaults to ID (I assume this is present on every list, but the user may want to use another column). where_clause is optional (I needed it for my use case) and is simply prepended to the pagination clauses. Otherwise I think the arguments are fairly self-explanatory (they are passed through to GetListItems as-is). Oh, and row_limit defines the batch size (defaulting to 1000).
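For anyone who wants to try it, a minimal usage sketch; the site URL, credentials, list name, and filter column below are all placeholders, and the Office365 cookie flow is just one of SharePlum's authentication options:

from shareplum import Office365, Site
from shareplum.site import Version

# Placeholder site, credentials, and list name -- adjust for your tenant.
authcookie = Office365(
    'https://example.sharepoint.com',
    username='user@example.com',
    password='secret',
).GetCookies()
site = Site(
    'https://example.sharepoint.com/sites/MySite',
    version=Version.v365,
    authcookie=authcookie,
)
sp_list = site.List('Large List')

# Pull everything in batches of 1000; the where_clause tuple uses the same
# format as the pagination clauses built inside the function.
items = get_sharepoint_items_paged(
    sp_list,
    where_clause=('Eq', 'Status', 'Active'),
    row_limit=1000,
)
print(len(items), 'items fetched')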
Thoughts?
How would you know beforehand how to break up the query? Are you saying you would use the total number of row IDs in the list and limit the query to each subset to ensure that you didn't exceed 5000?
@jasonrollins No idea what the best approach is, to be honest; this is just a hack I threw together to get something working. I am not a SharePoint expert, but this seems like a common enough pain point to try something to address it.
I am no longer using SharePlum myself, but that is more to do with moving away from SharePoint than any kind of knock against this library. Appreciate your work!
My revised version:
def get_sharepoint_items_paged(
    sp_list,
    where_clause=None,
    fields=None,
    view_name=None,
    page_size=1000,
):
    data = []
    last_id = 0

    def load_next_result_set():
        # Fetch the next page: items with an ID greater than the last one seen.
        query = dict()
        if where_clause:
            query['Where'] = [
                'And',
                where_clause,
                ('Gt', 'ID', str(last_id)),
            ]
        else:
            query['Where'] = [
                ('Gt', 'ID', str(last_id)),
            ]
        # Order by ID so the last row of each page carries the highest ID.
        query['OrderBy'] = ['ID']
        return sp_list.GetListItems(
            view_name=view_name,
            fields=fields,
            query=query,
            row_limit=page_size,
        )

    result_set = load_next_result_set()
    while result_set:
        # Resume after the highest ID actually returned, so gaps in the
        # ID sequence can no longer end the loop early.
        last_id = int(result_set[-1]['ID'])
        data.extend(result_set)
        result_set = load_next_result_set()
    return data
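The key change: instead of stepping the ID window by a fixed amount, each page is ordered by ID and the next page resumes after the highest ID actually returned (a keyset-style seek), so gaps in the ID sequence can no longer end the loop early. One caveat, from my reading of the code rather than exhaustive testing: if you pass fields, include ID in it, since the function reads the ID of the last row of every page. For example, with a placeholder filter:

# 'Status'/'Active' are placeholders for whatever filter you need;
# 'ID' is included in fields so the pager can read it from each page.
items = get_sharepoint_items_paged(
    sp_list,
    where_clause=('Eq', 'Status', 'Active'),
    fields=['ID', 'Title', 'Status'],
    page_size=1000,
)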