GAM errors out printing devices if it takes > 1 hour to fetch them all

Open jay0lee opened this issue 2 years ago • 1 comments

Steps to reproduce

  • Run gam print devices on a domain with a large number of devices (150,000 or more). After running for one hour, GAM will fail with an error like ERROR: 400 - Request contains an invalid argument. - 400
  • For a domain with fewer enrolled devices, it's possible to simulate this issue by setting pageSize=1 here and then adding a sleep(600) at the bottom of the while True: loop here. This tells GAM to retrieve only one device per page (the default is 100) and to sleep 10 minutes between page retrievals. Assuming you have at least 6 devices, the process should fail in just over an hour's time, just like the above issue in a large domain.

Further detail

This is Google internal bug 237397223. It seems that a series of pages retrieved by the API as a script loops through pages and nextPageToken values (a book if you will) is only good for an hour. If you try to retrieve a page in that sequence later than one hour after the first page was retrieved (where pageToken was not set) then the API call will fail with the above error.
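
The expiry behavior described above can be modeled with a toy pager. This is only a sketch for reasoning about the failure: ExpiringPager, fetch_all, the ttl value and the injectable clock are all hypothetical names, and nothing here reflects how Google's servers actually implement page tokens.

```python
class ExpiringPager:
    """Toy stand-in for a list() endpoint whose page tokens expire.

    Hypothetical model, not Google's implementation: the clock starts at
    the first request made without a page token, and any later request in
    that sequence fails once more than `ttl` seconds have passed.
    """

    def __init__(self, items, now, ttl=3600):
        self.items = list(items)
        self.now = now          # callable returning the current time in seconds
        self.ttl = ttl
        self.started = None

    def list(self, page_size=100, page_token=None):
        if page_token is None:
            # First page of a new sequence: (re)start the clock.
            self.started = self.now()
            start = 0
        else:
            if self.now() - self.started > self.ttl:
                raise ValueError("400 - Request contains an invalid argument.")
            start = int(page_token)
        end = start + page_size
        page = {"devices": self.items[start:end]}
        if end < len(self.items):
            page["nextPageToken"] = str(end)
        return page


def fetch_all(pager, page_size, pause):
    """Plain nextPageToken loop; pause() stands in for slow retrieval."""
    devices, token = [], None
    while True:
        page = pager.list(page_size=page_size, page_token=token)
        devices.extend(page.get("devices", []))
        token = page.get("nextPageToken")
        if not token:
            return devices
        pause()
```

Passing a fake clock as `now` and a `pause` that advances it by 600 seconds mirrors the pageSize=1 plus sleep(600) simulation from the steps to reproduce, without actually waiting an hour.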

Most customers don't hit this because an hour is plenty of time to retrieve tens of thousands of devices, but large customers with more than 150,000 devices will run into it.

It's also worth noting that deviceUsers.list() faces the same issue.

Other Google API list() calls may face similar issues and should probably be tested.

Workaround

To work around this issue we can try:

  • Retrieve the first hour of pages from the devices.list() API with the orderBy=creation_time parameter set. That parameter ensures we retrieve the oldest devices first.
  • As we retrieve pages of devices in this first hour, look at the createTime field of each retrieved device and keep track of the newest device we see.
  • When we do get the 400 error, start over with another devices.list() API call and a new set of pages but this time additionally set the filter parameter. The filter can be set to something like filter=register:<create_time_of_newest_device> where we are setting the create time of the newest device we've already retrieved. This essentially allows us to pick up where we left off when the Google servers threw an error. We do need to de-dupe the results since Google will send us the last device again but that can be handled relatively easily.
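
The three steps above could be sketched roughly as follows. This is not GAM's actual code: list_page, PageTokenExpired and make_filter are hypothetical names, the default filter string only mirrors the "something like" wording above (the exact filter syntax would need checking against the API documentation), and the sketch assumes createTime values are uniformly formatted UTC strings so that plain string comparison orders them correctly.

```python
class PageTokenExpired(Exception):
    """Hypothetical stand-in for the HTTP 400 raised on an expired token."""


def fetch_with_resume(list_page, make_filter=lambda t: f"register:{t}"):
    """Fetch all devices oldest-first, restarting with a filter on expiry.

    list_page(query_filter, page_token) is a hypothetical callable that
    returns one page (a dict) ordered by creation time, or raises
    PageTokenExpired when the token is no longer accepted.
    """
    seen = {}            # device name -> device, used to de-dupe
    newest = None        # newest createTime seen so far
    query_filter = None
    token = None
    while True:
        try:
            page = list_page(query_filter, token)
        except PageTokenExpired:
            if newest is None or token is None:
                raise    # a first page failed: not the expiry case
            # Start a new page sequence from the newest device seen so far.
            query_filter = make_filter(newest)
            token = None
            continue
        for device in page.get("devices", []):
            # The restart returns the last device again; keep the first copy.
            seen.setdefault(device["name"], device)
            # Assumption: same-format UTC strings compare chronologically.
            if newest is None or device["createTime"] > newest:
                newest = device["createTime"]
        token = page.get("nextPageToken")
        if not token:
            return list(seen.values())
```

Keying the de-dupe on the device name keeps the logic independent of how many devices share the boundary createTime; any of them that come back a second time are simply ignored.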

jay0lee avatar Jun 30 '22 00:06 jay0lee

This "bug" drove me crazy last year. :) https://groups.google.com/g/google-apps-manager/c/hMywF_k1FxI/m/lYXSGlJXCwAJ

I tried to export all browsers now with gam print browsers fields browsers and it still seems to behave the same for me (ending with an error after one hour). Was the fix supposed to cover browser export as well, or just ChromeOS devices?

unextro avatar Aug 17 '22 10:08 unextro

This issue should be fixed on Google's end now. The pageToken should continue to work even after 1 hour. Work remains to back out GAM's complicated workarounds that were needed to deal with the bug.

@taers232c fyi

jay0lee avatar Apr 18 '23 22:04 jay0lee

There was a similar bug when printing cros telemetry, do you know if that is fixed as well?

Thanks

taers232c avatar Apr 19 '23 17:04 taers232c