pyvo icon indicating copy to clipboard operation
pyvo copied to clipboard

Unnecessary calls to _update() in AsyncTAPJob

Open tomdonaldson opened this issue 5 years ago • 1 comments

In AsyncTAPJob, a call to _update() hits the TAP server with a job status query. I think we're calling it unnecessarily in some cases, with some of those being subtly hidden.

For better or worse, on some TAP servers, these job status queries take a noticeable amount of time, so these extra queries have a noticeable impact on the overall query time.

https://github.com/astropy/pyvo/blob/96443beb9e623bd2ff5a2cbb143001188dfbef37/pyvo/dal/tap.py#L473

  • Since AsyncTAPJob is typically instantiated in create(), this _update() in __init__() gets called before the job is even submitted.

https://github.com/astropy/pyvo/blob/96443beb9e623bd2ff5a2cbb143001188dfbef37/pyvo/dal/tap.py#L540

  • The phase property does an update on each access. While this does result in always giving the most recent possible value for phase, it hides the extern request from the caller who may really want just want the local version of _phase that was returned from the most last explicit update().
    • The worst instance of this is in raise_if_error(), which calls self.phase to check for an error, then calls it again in raising the error. And when raise_if_error() is called in fetch_results() or run_async(), it's called immediately after an _update(). So instead of one external request to find and raise a job error, there are 3.
    • I'm thinking that properties probably should not do external requests. Since they don't take arguments, there's no opportunity for the caller to specify whether they want a freshly updated value. Perhaps an accessor like def get_phase(update=False) would be better?

tomdonaldson avatar Feb 19 '20 17:02 tomdonaldson

On Wed, Feb 19, 2020 at 09:54:42AM -0800, Tom wrote:

For better or worse, on some TAP servers, these job status queries take a noticeable amount of time, so these extra queries have a noticeable impact on the overall query time.

What is "noticeable"? Very frankly, I'd suspect if that's actually non-negligible, there's something somewhere on the bug spectrum on these server implemenations.

But sure, we ought to strive to reduce network traffic, and there are other reasons for cleaning these things up.

https://github.com/astropy/pyvo/blob/96443beb9e623bd2ff5a2cbb143001188dfbef37/pyvo/dal/tap.py#L473

  • Since AsyncTAPJob is typically instantiated in create(), this _update() in __init__() gets called before the job is even submitted.

On superficial inspection, I'd say that's all right. Create() does build the server-side job before instantiating the class (line 455). Also, _update creates the _job attribute on the class, so unless it's called, the class is incomplete. Hence, it seems it must be called from the constructor from what I can see.

https://github.com/astropy/pyvo/blob/96443beb9e623bd2ff5a2cbb143001188dfbef37/pyvo/dal/tap.py#L540

  • The phase property does an update on each access. While this does result in always giving the most recent possible value for phase, it hides the extern request from the caller who may really want just want the local version of _phase that was returned from the most last explicit update().

I give you the decision to hide external requests behind properties was one of the worse ideas I've had (that part goes back to DaCHS' gavo.votable.tapquery, so I take full responsibiltiy).

However, I'd say part of the problem is that the implemenation, perhaps to save on parsing code, does not go through the phase endpoint on the job, which is a good deal lighter on the service (at least in DaCHS) than the full job representation.

So, I'd suggest to go back to querying /phase here. The original tapquery implemenation (http://svn.ari.uni-heidelberg.de/svn/gavo/python/trunk/gavo/votable/tapquery.py) actually has a bit of extra cruft (in _PhaseParser); these days, no server returns XML-like stuff from .../phase any more, they're all properly returning just the phase in the string.

  • The worst instance of this is in raise_if_error(), which calls self.phase to check for an error, then calls it again in raising the error. And when raise_if_error() is called in fetch_results() or run_async(), it's called immediately after an _update(). So instead of one external request to find and raise a job error, there are 3.

Yeah -- that's certainly quite on the bug side on the bug spectrum.

  • I'm thinking that properties probably should not do external requests. Since they don't take arguments, there's no opportunity for the caller to specify whether they want a freshly updated value. Perhaps an accessor like def get_phase(update=False) would be better?

As said above, in general I agree, and if I wrote tapquery again today -- it's from 2010 and basically was a stopgap measure to at least have one TAP client --, I'd not do it again.

I wouldn't break the existing interface any time soon, though, just perhaps move towards using the special endpoints (phase, execution_duration, etc) as a simple measure to make things a bit cheaper.

But in particular with a view to UWS 1.1's slow poll functionality, I'd support adding a few functions that let users pass in the timeout; it's a bit of a shame that the only way they can, right now, enjoy slow polls, is though wait().

Of course, once we go there, it'd be cool if we added true asyncio support; people could then fire off 1000 queries and then just await them, which would internally do slow polls. Doing that with normal polls is close to impossible. With asyncio, it's probably not terribly taxing even on the server (I'm increasing the file descriptor limit on DaCHS to 4000 by default to allow this kind of thing).

msdemlei avatar Feb 20 '20 09:02 msdemlei