gspread-pandas
gspread-pandas copied to clipboard
Google Spreadsheet's metadata is being re-fetched too often
gspread-pandas' Sheet
class has implemented some logic to know when sheet metadata needs to be refetched - cf refresh_spread_metadata
. However, the current implementation fails to make use of its potential. When accessing a multi-sheet spread using sheet_to_df
, debug logs show that the spreadsheet's metadata endpoint is hit multiple times - in my example, 9 times to fetch 2 sheets.
I have created a standalone pytest that reproduces this behaviour: https://gist.github.com/mephinet/7415a4641a65fbf0d51f0eae11ee21dc I failed to add it to the betamax'ed test suite because I ran into timeouts...
Tested with gspread-pandas version 3.2.3 on Arch Linux, as well as with the git master.
To show the potential: changing the Sheet.sheets
decorator from @property
to @cached_property
reduces the requests from 9 down to 3:
- one from
open_spread
->open_by_url
- one from
open_spread
->refresh_metadata
(which is unnecessary here) - and one from
open_sheet
(which is caused by accessing thesheets
property)
Fixing this issue would improve the performance of gspread-pandas by a relevant factor as I'm currently seeing 429 responses caused by these metadata requests, causing a significant slowdown.
Hey, sorry I haven't had much time to maintain this and e-mails seem to get lost in my inbox. The reason I originally implemented all this refresh metadata logic is because there's a few methods that depend on available metadata. If we don't refresh it, the values will be wrong. For example, number of rows and cols and number of frozen rows and cols all depend on that metadata.
The best way to avoid these refreshes would be to parse the responses for the different methods that modify the sheet, extract the updated values, and update the metadata. Unfortunately, that is likely a pretty manual process since the response format depends on whether it's a batch operation or not.
I'm not sure if gspread v6 changes any of this, but unfortunately I don't have much time to look into this. any PRs would be welcome though!