Office365-REST-Python-Client
Office365-REST-Python-Client copied to clipboard
Exceed 5000 list items limit
I am able to access lists via this api, but always obtain empty items. The list is quite large, which contains 50,000+ records. Also get the same results with no data when using Postman; however, there are data if using AJAX.
After a long time research, I find the root cause is the 5000 list items limit in sharepoint API. I am trying to use solve this problem with caml query, but failed when using itemPosition with the error: ClientRequestException: ('-2147024860, Microsoft.SharePoint.SPQueryThrottledException', 'The attempted operation is prohibited because it exceeds the list view threshold.'
Please kindly give me some help and suggestions to achieve the data.
Part of my codes is shown below:
itemPosition = None
while True:
query_text="<View><ViewFields><FieldRef Name='ID'/></ViewFields><Query></Query><RowLimit>5000</RowLimit></View>"
caml_query = CamlQuery.parse(query_text)
caml_query.ListItemCollectionPosition = itemPosition;
items = list_source.get_items(caml_query)
ctx.load(items)
ctx.execute_query()
itemPosition = items.ListItemCollectionPosition
print(itemPosition)
if itemPosition == None:
break
I am curious how to use itemPosition in python, which might solve this issue. @vgrem
I have having the same issue, I can read the directories but everytime I try to return a list of files I either get an empty list or 500 max limit error @mingyuanhua have you solved this?
I have having the same issue, I can read the directories but everytime I try to return a list of files I either get an empty list or 500 max limit error @mingyuanhua have you solved this?
Nope. Eventually I connected to the db directly rather than using the API.
How do you connect DB ? Which DB ?
Any solutions if list item has more than 5000 records how to handle ?
Any solution for this ?
I could not, I now just do individual calls i.e. (i'm still looking to figure on how to do bulk calls (i.e. use this function but call 100 at once) FILE_URL = '/sites/MySite/Folder/File.xlsm' ctx.web.get_file_by_server_relative_url(FILE_URL).get().execute_query()
Is the ListItemCollectionPosition class currently supported? And if so, what type of paging info should we give as input when calling the class? I tried to call
ListItemCollectionPosition("Paged=TRUE&PagedPrev=TRUE&p_ID=114957") , so that it starts from the item with id 114957, and then
qry=CamlQuery(listitem_collection_position=position) qry.ViewXml='.......' items = large_list.get_items(create_paged_query(10)).execute_query()
it results in
ClientRequestException: ('-1, Microsoft.Data.OData.ODataException', "A type named 'ListItemCollectionPosition' could not be resolved by the model. When a model is available, each type name must resolve to a valid type.", "400 Client Error: Bad Request for url.
Any idea how to solve the pagination issue? Currently, i have a large list with more than 50.000 files, so the only way to query it is by using paging. I can use the paged_query function from your examples to query the first 5000 files, but then how could I get the next page of results?
Thanks.
Hi guys,
I have the same problem here, impossible to work with ListItemCollectionPosition. So after some hours, I decided to code a quick workaround. It might not be the most elegant, but it works.
The idea is to have two queries:
- one that select items between two numerical values of a field (ID / UID should be the best), separated by a batch_size
- the other checking if there are items left above a numerical value
And then in a while loop, I run the first query and do my stuff (currently pass). But if the len of the items collection is 0, I check if there are more items above the greater numerical values. If no, I quit the function.
Here is the code (which inside a method of a class). Hope it helps!
def get_id_batch_query(low_value, high_value):
qry = CamlQuery()
qry.ViewXml = f"""
<View Scope='RecursiveAll'>
<Query><Where><And>
<Geq><FieldRef Name='ID' /><Value Type='Integer'>{low_value}</Value></Geq>
<Lt><FieldRef Name='ID' /><Value Type='Integer'>{high_value}</Value></Lt>
</And></Where></Query>
</View>
"""
return qry
def get_id_above_query(value):
qry = CamlQuery()
qry.ViewXml = f"""
<View Scope='RecursiveAll'>
<Query><Where>
<Geq><FieldRef Name='ID' /><Value Type='Integer'>{value}</Value></Geq>
</Where></Query>
</View>
"""
return qry
def more_id_to_get(s_list, value):
list_qry = get_id_above_query(value)
items = s_list.get_items(list_qry).execute_query()
if len(items) == 0:
return False
else:
return True
id_count = 0
while True:
list_qry = get_id_batch_query(id_count, id_count+batch_size)
items = s_list.get_items(list_qry).execute_query()
if len(items) == 0:
if not more_id_to_get(s_list, id_count):
print("No more items to get")
print("Stopping above id_count " + str(id_count))
return
else:
print("Loaded items count: {0}".format(len(items)))
for index, item in enumerate(items):
pass
#print(item.properties)
id_count += batch_size
id_count = 0 while True: list_qry = get_id_batch_query(id_count, id_count+batch_size) items = s_list.get_items(list_qry).execute_query()
What is the s_list here? Cuz when I try to list folder that contains more than 5000 it returns 0 files there, so there is no point in passing that list. I tried sth like that https://github.com/vgrem/Office365-REST-Python-Client/blob/master/examples/sharepoint/lists/read_large_list.py to load list, but there is no page like that, so now I am not sure what I'm supposed to pass as s_list. My site looks like that: site -> - folder 1 - folder 2 ------ - folder_that_contains_over_5k_files - folder 3
so my list for s_list should be: large_list = ctx.web.lists.get_by_title("folder 2/folder_that_contains_over_5k_files") Am I right?
Hello,
Sorry not responding sooner.
s_list is the item returned when you connect to sharepoint and request your list:
# Initiate connection and get data as SharePoint list object
ctx = ClientContext(site_url).with_credentials(UserCredential(username, password))
sp_lists = ctx.web.lists
s_list = sp_lists.get_by_title(liste_name)
It works fine for me, I don't use the inbuilt functions, I only use the basic ones.
But then when I cleaned my code I did it a bit differently. I wrote a "Connector" class with the following method to initiate the connection:
def get_data_from_source (self, data_id: int):
"""Connect to client source and return raw data
Args:
data_id (int): number representing the data in the config file
Returns:
depends on client source: raw data, types depends of client source
"""
if self.source == "SP":
# Get relavant value from config file
site_url = self.config['Connection']['site_url']
username = self.config['Connection']['username']
password = self.config['Connection']['password']
data_name = self.config['Client_lists']['list_names'][data_id]
data_info = self.config[data_name]
# Initiate connection and get data as SharePoint list object
ctx = ClientContext(site_url).with_credentials(UserCredential(username, password))
sp_lists = ctx.web.lists
data = sp_lists.get_by_title(data_name)
return data_name, data_info, data
self.confing being a conf file with my connection info and liste names, etc.
Then I have another method that calls the "data" returned by this method (wich contains, amongst other, the sharepoint list).
Hope it helps.
Dear @vgrem. Thank you for providing Office365-REST-Python-Client/examples/sharepoint/lists/read_large_list.py, which caters for Sharepoint lists that give the error if retrieved in the normal way.
I have tried to work out how to do the same for folders that contain 10,000 or more files, and I have not succeeded. Can you please provide some hints as to how this might be possible? Thank you very much.
Greetings @MauriceSnell and all,
here is an important update, in 2.3.14
version a few improvements in terms for working with large lists have been introduced
By large the list with the amount of items which exceeds
5000
items are considered
Namely, to return all the items the following options could be considered:
ClientObjectCollection.get_all()
new method
get_all()
method to retrieve all the items in a collection, regardless of the size, for example:
def print_progress(items):
"""
:type items: office365.sharepoint.listitems.collection.ListItemCollection
"""
print("Items read: {0}".format(len(items)))
page_size = 5000 #corresponds to default list view threshold
ctx = ClientContext(site_url).with_credentials(client_credentials)
large_list = ctx.web.lists.get_by_title(list_title)
all_items = target_list.items.get_all(page_size, print_progress).execute_query()
ClientObjectCollection.paged(n)
method
It returns paged result, the following example demonstarte how to enumerate items across all the pages:
page_size = 5000
ctx = ClientContext(site_url).with_credentials(client_credentials)
large_list = ctx.web.lists.get_by_title(list_title)
paged_items = large_list.items.paged(page_size).get().execute_query()
for index, item in enumerate(paged_items): # type: int, ListItem
print("{0}: {1}".format(index, item.id))