graphbrainz
Batch lookup queries using the search endpoint with ID fields
I've been running a large number of complex queries, and deeply nested results often lead to many individual entity lookups.
I hacked up a proof of concept that uses dataloader's batching feature to collect such lookups and perform them through the search endpoint instead.
For example: `/ws/2/recording?query=rid:"{MBID}" OR rid:"{MBID}" OR ...`
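A minimal sketch of how one batch of recording lookups could be folded into a single search URL, assuming the `rid` field and query shape shown above. The function name `buildBatchSearchUrl` is hypothetical. The 100-item cap reflects the search endpoint's maximum `limit` per page; with dataloader, its `maxBatchSize` option would be one way to keep batches under that cap.

```typescript
// Loose check for the canonical 8-4-4-4-12 hex MBID format, so arbitrary
// text is never spliced into the search query.
const MBID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: turns a batch of individual recording lookups into one
// search request of the form rid:"..." OR rid:"..." (assumes the recording
// search index exposes the MBID as `rid`).
function buildBatchSearchUrl(
  mbids: readonly string[],
  baseUrl: string = "https://musicbrainz.org/ws/2/recording",
): string {
  if (mbids.length === 0) {
    throw new Error("need at least one MBID");
  }
  for (const id of mbids) {
    if (!MBID_PATTERN.test(id)) {
      throw new Error(`not an MBID: ${id}`);
    }
  }
  // Deduplicate (case-insensitively) so each entity appears once in the query.
  const unique = Array.from(new Set(mbids.map((id) => id.toLowerCase())));
  // One search page returns at most 100 results, so larger batches must be split.
  if (unique.length > 100) {
    throw new Error("batch too large for a single search page");
  }
  const query = unique.map((id) => `rid:"${id}"`).join(" OR ");
  const params = new URLSearchParams({
    query,
    limit: String(unique.length), // ask for exactly as many results as IDs
    fmt: "json",
  });
  return `${baseUrl}?${params.toString()}`;
}
```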
It then maps the results back to the original individual lookup requests, so the callers never know a different endpoint was used.
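A sketch of that mapping step, under the assumption that the JSON search response lists entities in a `recordings` array whose items carry an `id`. dataloader requires its batch function to return one value per key, in key order; this hypothetical `mapResultsToKeys` helper satisfies that by indexing results by MBID and returning an `Error` for any key the search did not return, so only that one lookup fails rather than the whole batch.

```typescript
// Assumed shape of a recording search response (fmt=json); only the
// fields this sketch reads are declared.
interface SearchResponse {
  recordings: Array<{ id: string; [field: string]: unknown }>;
}

type Recording = SearchResponse["recordings"][number];

// Hypothetical helper: restores one result per original lookup key, in the
// same order as the keys, which is what a dataloader batch function must return.
function mapResultsToKeys(
  keys: readonly string[],
  response: SearchResponse,
): Array<Recording | Error> {
  // Index search results by lowercase MBID; result order is irrelevant here.
  const byId = new Map<string, Recording>();
  for (const recording of response.recordings) {
    byId.set(recording.id.toLowerCase(), recording);
  }
  // A missing entity becomes an Error for that key only.
  return keys.map(
    (key) => byId.get(key.toLowerCase()) ?? new Error(`no search result for ${key}`),
  );
}
```

Inside a dataloader batch function, this would be the last step after fetching the search page for the batched `keys`.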
It works surprisingly well. The search endpoint also tends to return more data for each entity by default. Because one search request replaces many lookups, this can speed up queries by 20–50x and avoid waiting for the rate limit to clear.
I'm planning to rewrite the whole GraphBrainz "query engine" to support this.