
Add support for pagination or scrolling in `Dataset.get_data`

Open theavey opened this issue 3 years ago • 3 comments

Describe the problem. For requests that are apparently too large, the API returns a timeout error. It isn't clear beforehand which requests will be too large, and a timeout error is not particularly helpful.

I opened a ticket with Marquee support asking for the best way to do this or a fix for it, but haven't heard back in a couple days.

Describe the solution you'd like. If the API already supports pagination or scrolling, it could be made accessible in the method. Then I could write a wrapper that iterates over chunks and combines the results.

Describe alternatives you've considered. I have some code that iterates over years, but that sometimes fails. I could use smaller date ranges, but that would be overkill for smaller requests. I think the biggest issue with the alternatives is that I don't want to chunk before I know whether a request will fail, because each call adds latency to my code.
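A minimal sketch of one such workaround, which only chunks after a request has actually timed out: try the full range in one call, and on a timeout split the range in half and retry each half. The names `fetch_adaptive`, `fetch`, and `is_timeout` are hypothetical and not part of gs-quant. `fetch` stands in for a call such as `Dataset.get_data(start, end)`, and the sketch assumes inclusive date ranges, a list of rows per call, and a timeout that surfaces as an exception `is_timeout` can recognize.

```python
import datetime as dt
from typing import Callable, List


def fetch_adaptive(
    fetch: Callable[[dt.date, dt.date], List[dict]],
    start: dt.date,
    end: dt.date,
    is_timeout: Callable[[Exception], bool] = lambda exc: isinstance(exc, TimeoutError),
) -> List[dict]:
    """Fetch [start, end] in one call; on a timeout, bisect the range and retry.

    `fetch` is a hypothetical stand-in for the real data call. How a timeout
    actually surfaces depends on the client, so `is_timeout` is an assumption
    to adapt.
    """
    try:
        return fetch(start, end)
    except Exception as exc:
        # Give up on non-timeout errors, or when a single day still times out.
        if not is_timeout(exc) or start >= end:
            raise
    # Split the inclusive range into two non-overlapping halves.
    midpoint = start + (end - start) // 2
    left = fetch_adaptive(fetch, start, midpoint, is_timeout)
    right = fetch_adaptive(fetch, midpoint + dt.timedelta(days=1), end, is_timeout)
    return left + right
```

Small requests still cost a single call, and only oversized ranges pay for the extra round trips. If `fetch` returns DataFrames instead of lists, the final `left + right` would become a `pandas.concat` call.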

Are you willing to contribute? Yes

Additional context. I can provide examples of requests that timed out if that's helpful, though running the examples might require access to our paid datasets.

theavey • Feb 25 '21 14:02

Hello @theavey, I would like to contribute to this project by solving this issue. Can I?

Dhavin • Aug 20 '21 17:08

I am not an admin of this repo, but that would be great. I've had to implement other workarounds, but a more "native" solution within the package would be helpful.

theavey • Aug 20 '21 17:08

Hey @theavey and @Dhavin, we will look into this request. Currently, our Data APIs don't have a scroll/pagination API. If you are seeing timeouts for larger range queries, we currently recommend making smaller date/time range requests. These queries can be parallelized via threads for potentially significant speed improvements. We also have a utility class (https://github.com/goldmansachs/gs-quant/blob/967e8dd450b07e9e2b8fc0c9b2eec61916a5c179/gs_quant/api/utils.py#L46) that helps manage the threads, sessions, and contexts.
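A minimal sketch of the smaller-range, threaded approach described above, using only the standard library. The names `date_chunks`, `fetch_parallel`, and `fetch` are hypothetical and not part of gs-quant. `fetch` stands in for a call such as `Dataset.get_data(start, end)` and is assumed to take an inclusive date range and return a list of rows. The sketch does not handle gs-quant sessions or contexts across threads, which is what the linked utility class is for.

```python
import datetime as dt
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Tuple


def date_chunks(
    start: dt.date, end: dt.date, days: int
) -> Iterator[Tuple[dt.date, dt.date]]:
    """Yield inclusive (chunk_start, chunk_end) pairs that cover [start, end]."""
    if days < 1:
        raise ValueError("days must be at least 1")
    one_day = dt.timedelta(days=1)
    current = start
    while current <= end:
        # Each chunk spans at most `days` calendar days; the last may be shorter.
        chunk_end = min(current + dt.timedelta(days=days) - one_day, end)
        yield current, chunk_end
        current = chunk_end + one_day


def fetch_parallel(
    fetch: Callable[[dt.date, dt.date], List[dict]],
    start: dt.date,
    end: dt.date,
    days: int = 30,
    max_workers: int = 4,
) -> List[dict]:
    """Run `fetch` on each chunk in a thread pool and combine results in date order."""
    chunks = list(date_chunks(start, end, days))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `chunks`, not completion order.
        parts = pool.map(lambda chunk: fetch(*chunk), chunks)
        return [row for part in parts for row in part]
```

The chunk size and worker count are guesses to tune per dataset. If `fetch` returns DataFrames, the final list comprehension would become a `pandas.concat` over `parts`.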

Cruppelt • Aug 20 '21 17:08