HTTPSConnectionPool ERROR - Can we modify the $top value in query?

Open liab25 opened this issue 4 years ago • 15 comments

I get the following error when pulling emails from an inbox.

HTTPSConnectionPool(host='graph.microsoft.com', port=443): Max retries exceeded with url: /v1.0/users/[email protected]/mailFolders/Inbox/messages?%24top=999&%24expand=attachments%28%24select%3Dname%29 (Caused by ResponseError('too many 504 error responses'))

If I use the following code to limit the number of emails to process, it seems to work: messages = inbox.get_messages(limit=20, batch=20, query=q). But when I change it to this, it idles for a bit and then spits out the error above: messages = inbox.get_messages(limit=None, batch=200, query=q)

Any way around this? Am I just hitting some MS Graph throttling limit?

liab25 avatar Feb 24 '21 06:02 liab25

Throttling in theory returns a 429, so I don't know what this is...

alejcas avatar Feb 27 '21 21:02 alejcas

Yeah, that's what I thought too. The problem appears to be with trying to pull too many items per request from Microsoft's API. I found one of their KB articles that references the error and says we need to adjust the $select and/or $top values. I looked at the source code and the URL in the error, and it seems to be setting $top to 999. I'm wondering if there's a way I can change this value.

The KB article: https://docs.microsoft.com/en-us/graph/api/user-list-messages?view=graph-rest-1.0&tabs=http. That doesn't make sense to me, though: I thought the "batch" parameter was already doing the paging, so we only retrieve X results per request until the limit is reached.

liab25 avatar Feb 27 '21 21:02 liab25

Use limit param

alejcas avatar Feb 28 '21 08:02 alejcas

> Use limit param

So that's the problem. If I set limit to something like 500 and batch to 100, my understanding is that it will retrieve only 500 emails, paging them 100 at a time until the limit is reached. I need to be able to retrieve all messages in the inbox no matter how many it contains. I've set limit to None and batch to 100, and even as low as 20, but I am still getting the same error.

It's only when I set limit to something other than None that it works. Is this just an MS limitation?
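To make the limit/batch behaviour described above concrete, here is a rough model of it in plain Python. This is only a sketch of the paging idea, not the library's actual implementation: `fetch_page` and `iter_messages` are hypothetical names, and the list slicing stands in for one HTTP request with $top set to the batch size.

```python
from typing import Iterator, List, Optional


def fetch_page(store: List[str], skip: int, top: int) -> List[str]:
    """Hypothetical stand-in for one HTTP request: at most `top` items from `skip`."""
    return store[skip:skip + top]


def iter_messages(store: List[str], limit: Optional[int], batch: int) -> Iterator[str]:
    """Yield items page by page until `limit` is reached or the store is exhausted."""
    yielded = 0
    skip = 0
    while limit is None or yielded < limit:
        page = fetch_page(store, skip, batch)
        if not page:
            return  # no more data on the server side
        for item in page:
            if limit is not None and yielded >= limit:
                return  # limit reached in the middle of a page
            yield item
            yielded += 1
        skip += len(page)


inbox = [f"msg-{i}" for i in range(1234)]
first_500 = list(iter_messages(inbox, limit=500, batch=100))
everything = list(iter_messages(inbox, limit=None, batch=100))
```

In this model, limit=None simply means "keep requesting pages until one comes back empty", so a large inbox costs more requests but each request stays at the batch size.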

liab25 avatar Feb 28 '21 09:02 liab25

For me it works if I set limit to None. Maybe you hit some internal limit or something. I don't know. MS Graph is pretty obscure sometimes.

alejcas avatar Feb 28 '21 13:02 alejcas

Yeah, that's what I'm thinking. I was able to process inboxes with up to 15k emails at one point without issue; now boxes with 1-2k are causing problems. Probably an MS thing like you said.

liab25 avatar Feb 28 '21 21:02 liab25

I am actually having this exact same issue while retrieving the MS Graph Calendar. I have recently been getting one of the following two errors, and which one I get changes randomly:

HTTPSConnectionPool(host='graph.microsoft.com', port=443): Max retries exceeded with url: <URL> (Caused by ResponseError('too many 503 error responses'))

OR

HTTPSConnectionPool(host='graph.microsoft.com', port=443): Max retries exceeded with url: <URL> (Caused by ResponseError('too many 429 error responses'))

I am pulling the data using:

q = calendar.new_query('start').greater_equal(self.six_am)
graph_events = calendar.get_events(query=q, include_recurring=False, limit=None)

I know the 429 error is due to too many requests (I believe they allow 17 per second). The weird thing is I can run it once and it fails, but if I run it again one or two more times it eventually will work. This query usually only returns about 500 items so it is not large.

By the way, I do love this project. Thank you for everything.

Any tips @janscas ?

pythonista092920 avatar Mar 16 '21 13:03 pythonista092920

@pythonista092920 Thanks! I have no clue on this... I'm sorry.

alejcas avatar Mar 16 '21 13:03 alejcas

It is alright, it was worth a try @janscas

In connection.py, do you see any problem with changing request_retries and requests_delay in the init of class Connection to see if that could help?

pythonista092920 avatar Mar 16 '21 13:03 pythonista092920

Try with different values like request_retries=None to disable the retries. Also requests_delay will help you avoid the 429 error.

Defaults are retries=3 and delay=200 milliseconds.

alejcas avatar Mar 16 '21 15:03 alejcas

I'll try running it today with request_retries=None and requests_delay=500 and see what happens. Thank you @janscas

pythonista092920 avatar Mar 16 '21 15:03 pythonista092920

Unfortunately that did not work; I am still getting 503 errors. I am going to build some extra exception handling around calendar.get_events internally and see if I can get better results. If I run it a second time, it usually works. Maybe the MS Graph servers are just getting too much traffic to handle? Thanks for your response @janscas

pythonista092920 avatar Mar 16 '21 19:03 pythonista092920

Great to know @pythonista092920

Thanks

alejcas avatar Mar 16 '21 19:03 alejcas

@pythonista092920 did your error get resolved?

arkadas19 avatar Jun 23 '21 15:06 arkadas19

@arkadas19

I actually would still have issues periodically. What I ended up doing was wrapping the calendar.get_events() method in a try/except block and setting a 10-retry limit. Everything has run perfectly since making these changes. Maybe this code will give you an idea.

```python
import time

# Assumes graph_account (an authenticated account object) and email are defined earlier.
try:
    schedule = graph_account.schedule(resource=email)
    calendar = schedule.get_default_calendar()

    calender_events_not_retrieved = True

    loop_attempt = 0
    retry_attempt = 0
    while calender_events_not_retrieved:
        loop_attempt += 1

        if loop_attempt > 10:
            print("The loop has run 10 times, quitting the program.")
            quit()

        if retry_attempt < 9:
            try:
                graph_events = calendar.get_events(include_recurring=False, limit=None)
                calender_events_not_retrieved = False
                # Report success only once the call has actually succeeded.
                print("Successfully pulled the calendar data")

            except Exception as calender_not_retrieved_exc:
                retry_attempt += 1
                print(f"Hit calender_not_retrieved_exc: {calender_not_retrieved_exc}")
                print(f"Could not pull calendar data from MS Graph, trying again. "
                      f"Starting reattempt {retry_attempt}")
                time.sleep(60)  # wait a minute before the next attempt

        else:
            print("Too many retry attempts, quitting the program. MS Graph servers appear to be not "
                  "accepting the requests.")
            quit()

except Exception as get_calendar_data_exception:
    print(f"Hit the following exception: {get_calendar_data_exception}")
```
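The same retry idea can be factored into a small reusable helper. The version below is only a sketch under a few assumptions: `call_with_backoff` is a hypothetical name, it swaps the fixed 60-second wait for exponential backoff with a little jitter, and it retries on any Exception, which may be broader than you want in production.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 10,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func() until it succeeds or max_attempts is exhausted.

    The wait doubles after every failure (capped at max_delay), plus up to
    10% random jitter so that several clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay += random.uniform(0, delay / 10)
            sleep(delay)
    raise RuntimeError("unreachable")
```

With the names from the snippet above, usage might look like graph_events = call_with_backoff(lambda: calendar.get_events(include_recurring=False, limit=None)). The sleep parameter exists only so the waiting can be replaced in tests.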

pythonista092920 avatar Jun 24 '21 18:06 pythonista092920