nebula-python icon indicating copy to clipboard operation
nebula-python copied to clipboard

Why retrying on execution failure

Open wenhaocs opened this issue 11 months ago • 4 comments

Why do we retry on execution failure while nebula-go only retries on session invalid? Moreover, this feature is causing our nebula pytest failure, which (1) uses the latest nebula-python and (2) expects failures.

Related PR: https://github.com/vesoft-inc/nebula-python/pull/306

wenhaocs avatar Mar 19 '24 23:03 wenhaocs

The motivation was https://github.com/vesoft-inc/nebula-python/issues/276 && to make SDK with better DX, to enable retrying only when it makes sense and doesn't bring mess.

If the change is applied to the above purpose we could add options(for now, only one general retry flag was defined) to disable that opinionated retry to enable the pytest run as is? (And we will apply equivalent changes to other clients, too)

If not, I can revert the change.

wey-gu avatar Mar 20 '24 01:03 wey-gu

Maybe we need to discuss whether we should throw the exception or return the error code. In the kernel pytest, we are expecting error code to be returned: https://github.com/vesoft-inc/nebula-ent/blob/master/tests/job/test_session.py#L156. If you folks decide that throwing an exception is the right method, we need to revise the kernel pytest and wrap the related code in the try block.

wenhaocs avatar Mar 20 '24 06:03 wenhaocs

An attempt to implement kernal pytest was done via https://github.com/vesoft-inc/nebula-ent/pull/3485 (I didn't run it in a fine-grained way locally at first, so it took me some time waiting for CIs, and surprisingly after two round of changes to fix 6 errors, another 10 errors came out),while after all py test passed, it turned out there is a big assumption in infra of tck...

It seems quite obvious that we'd better keep the retry mechanism in the session/session pool(not impacting tests actually) w/o introducing connection level exception.

Let's talk tomorrow.

wey-gu avatar Mar 20 '24 17:03 wey-gu

will be fixed with #326 , where:

  • no new exceptions introduced
  • retry of execution error is opt-in

wey-gu avatar Mar 21 '24 04:03 wey-gu