trino-python-client
trino-python-client copied to clipboard
Add support for polling.
This PR was originally put up against the original prestodb version of this library, which now seems to be defunct; and adds support for polling the status of presto queries, decoupling it from collection of results (fixing #63 on that repository). This is important especially for long-running queries.
Changes:
- Removed
PrestoResult
and merged its functionality withPrestoQuery
. Keeping both led to some weird logic loops as each tried to keep the other up to date once polling was added. - Made
PrestoQuery.fetch
a private methodPrestoQuery._fetch
, since calling it directly leads to the iterator not being in sync. - Made the
PrestoQuery
object cache the query results, otherwise there is some weird behaviour whereby if you stop iterating over the rows, you throw away all of the rows left over in the last retrieved chunk. - Switched the
fetchone
/fetchmany
relationship so thatfetchone
is a special case offetchmany
(because the changes above allow multiple iterators to be created over time, so long as there is only one at time, and in this way we don't create a new generator for every row).
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/prestosql/cla.
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to [email protected]. For more information, see https://github.com/prestosql/cla.
@ebyhr There was a trivial oversight in keeping track of progress when requesting one row at a time. I've pushed the fix, and signed the CLA.
@cla-bot check
The cla-bot has been summoned, and re-checked this pull request!
@ebyhr @martint Any updates here?
We're running into the same issue with long pulling queries.
Any chance we could see this being merged in a near future?
@matthewwardrop Are you planning to work on this soon?
@amitds1997 I'm not sure. My original PRs to the presto-python-client and then this one have "enjoyed" very slow review times, making it challenging to make much headway (by the time a review occurs, I've not thought much about the patch for many months). @findepi's comments were valid ones, but as far as I can tell I already answered these concerns in the original PR description. I'm happy to restage the PR as a series of commits if that is helpful, but note that it won't be entirely straightforward: these changes hang together, and each standalone commit won't make a tonne of sense without the context of the others. I'm also happy if someone else closer to the project wants to reimplement things in a different way, but after a review of this patch I still feel this is a pretty clean implementation. It does make several "unnecessary" changes, but only in the sense that you although you could achieve polling without the other changes, the extra changes prevent weird epicycles in the code and the risk of losing data depending on the order in which you call methods on the class instances.
I originally wrote this patch so I could migrate to this more official client from pyhive
, since I thought things would wind down there.... but things on that side have continued kicking along, and their Presto client already features this polling behaviour, so I haven't been too motivated to push hard on this.