Convert all query processes to use `LIMIT` and `OFFSET`
Terms
- [X] I have searched open and closed feature requests
- [X] I agree to follow Scribe-Data's Code of Conduct
Description
Related to the work that's happening in #124: in the last dev sync we decided on a new method for breaking down queries that are too large to return results because of timeout restrictions. The first version of this will be implemented in #124, and the other queries should then be converted to the same method, where every query has a `LIMIT` and `OFFSET` set within it that can be changed programmatically. The method for this will be:
- For each query we'll run a basic query to derive how many Wikidata items will be returned
- This will then be used to derive a chunk size for the `LIMIT` and `OFFSET`
  - Say that there are 100K items to return data for, so we could have a `LIMIT` of 50K and programmatically set an `OFFSET` of `0` and `50000`
- `update_data.py` would then loop the versions of the query and append the results to a common output
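To make the steps above concrete, here is a minimal sketch of how the chunking loop might look. It is not the existing `update_data.py` logic: the names `chunk_offsets`, `paginate_query`, `run_chunked` and the `run_query` callable are all hypothetical, and the item count is assumed to have already come from the basic count query. One thing to keep in mind is that SPARQL only gives predictable pages with `OFFSET` when the query also has an `ORDER BY`, so the real queries would likely need one.

```python
from typing import Callable, List


def chunk_offsets(total_items: int, chunk_size: int) -> List[int]:
    """Return the OFFSET values needed to cover total_items in chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division so a partial final chunk still gets its own offset.
    n_chunks = (total_items + chunk_size - 1) // chunk_size
    return [i * chunk_size for i in range(n_chunks)]


def paginate_query(base_query: str, total_items: int, chunk_size: int) -> List[str]:
    """Build one version of the query per chunk by appending LIMIT and OFFSET."""
    return [
        f"{base_query.rstrip()}\nLIMIT {chunk_size}\nOFFSET {offset}"
        for offset in chunk_offsets(total_items, chunk_size)
    ]


def run_chunked(
    base_query: str,
    total_items: int,
    chunk_size: int,
    run_query: Callable[[str], list],
) -> list:
    """Run each paginated query and append its results to a common output.

    run_query is a hypothetical stand-in for whatever function actually
    sends a query to Wikidata and returns a list of result rows.
    """
    results: list = []
    for query in paginate_query(base_query, total_items, chunk_size):
        results.extend(run_query(query))
    return results
```

With 100K items and a chunk size of 50K, `chunk_offsets(100_000, 50_000)` gives the two offsets from the example above, `0` and `50000`.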
Note: In the sync I mentioned that we'd also switch all of the _1, _2, etc. queries over to work like this. This may not be possible: if memory serves, part of the reason for those splits was also that Wikidata has a character limit on what you can pass to it (this is why all the queries are written with very short abbreviations). We can test this and see if we can convert these queries into a single common one as well 😊
Contribution
Happy to work on this with people as far as planning the scope of the work and helping with implementation! 🚀