remoting icon indicating copy to clipboard operation
remoting copied to clipboard

[JENKINS-44785] - Built-in request timeout support

Open oleg-nenashev opened this issue 8 years ago • 13 comments
trafficstars

This is a PoC implementation of the timeout support in Remoting API, See one of the use-cases in JENKINS-44785

  • [x] - Implement working PoC
  • [ ] - TBD make decision about switching to Java 8. The PoC needs it for the java.time.Duration class, but it can be replaced if we stay on Java 7
  • [ ] - If Java 8 is selected, implement new call() default implementations in interfaces (with TimeoutExceptions) to address the JENKINS-44785 request from @jglick
  • [ ] - channel#callAsync() should also support timeouts on the remote side
  • [ ] - TBD: If we stay on Java 7, stick to InterruptedException?
  • [ ] - Weaponize it! Set some default timeouts for user requests and RPC calls

@reviewbybees @jglick

oleg-nenashev avatar Jun 29 '17 13:06 oleg-nenashev

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

ghost avatar Jun 29 '17 15:06 ghost

channel#callAsync() should also support timeouts on the remote side

Why? I see no need for it.

jglick avatar Jun 29 '17 21:06 jglick

channel#callAsync() should also support timeouts on the remote side

Why? I see no need for it.

One reason is that you ask for something to happen on the remote side and it takes a while to process and if it can not return within the default you do not want to waste resources continuing to do it.
This may not be say an Agent where you may decide resources are cheap, but where remoting is used inter master. For example - executing a groovy script on a bunch of inter-connected masters - but the groovy script accidentally uses "while (true)" without a break clause, using the timeout on the remote side the script can be interrupted whilst it is still running.

jtnord avatar Jun 30 '17 09:06 jtnord

I suppose @jglick just expected the timeout on one side. In this PR I have intentionally added timeouts on both sides. As @jtnord says, such timeout may be useful if the request hangs due to whatever reason, especially since we have a limited threadpool for them.

oleg-nenashev avatar Jun 30 '17 10:06 oleg-nenashev

w.r.t. callAsync

you ask for something to happen on the remote side and it takes a while to process and if it can not return within the default you do not want to waste resources continuing to do it

So the body of the callable is free to impose any time limit it sees appropriate. For that matter a poorly written callable might be consuming hundreds of megs of heap for no good reason. This is no different from local method calls—not a concern of Remoting.

jglick avatar Jun 30 '17 15:06 jglick

This is no different from local method calls—not a concern of Remoting.

As a maintainer of Remoting, I think having a timeout for such calls is important. Do you block the PR due to that? Or just "IMHO YAGNI"?

oleg-nenashev avatar Jul 03 '17 06:07 oleg-nenashev

Imho I agree with @jglick

Timingboutvthe remote operation should not be a concern of the remoting api... (or at best it should be something that the caller opts-in e.g. By using a wrapper channel)

I am inclined to object, but at this point I will just give a stern look and ask you to critically self-review ;-)

stephenc avatar Jul 03 '17 07:07 stephenc

or at best it should be something that the caller opts-in

Currently it's opt-in since the default timeout is "no timeout"

oleg-nenashev avatar Jul 03 '17 08:07 oleg-nenashev

Currently it's opt-in since the default timeout is "no timeout"

But you are littering the API by multiplying methods.

When somebody needs a timeout on the remote execution they probably just want you to provide them with a wrapping callable that does the timeout for them.

That would simplify the API and simplify the implementation.

IMHO less methods to choose from is better. I could probably go so far as to provide a wrapper to the callable for the caller timeout too so that, in effect, the caller just adds the timeouts to their callables

ch.call(localTimeout(remoteTimeout(new Callable<...>(...) {...})));

but it is fine to keep the local timeout as a parameter

ch.call(withTimeout(new Callable<...>(...){...}), localTimeout);

as that way the withTimeout static callable decorator can also be reused locally without confusion

stephenc avatar Jul 03 '17 09:07 stephenc

provide a wrapper to the callable for the caller timeout too

That is essentially what I attempted to do with the Timeout utility in Pipeline code, but it is not a good solution here—hence my RFE (which is overshot by this PR). More: https://github.com/jenkinsci/remoting/pull/174#discussion_r124925887

Do you block the PR due to that?

Well, -0.5. I think this PR is solving problems it did not need to solve, and solving the problems it did need to solve the wrong way.

To reiterate, my original complaint in JENKINS-44785 was that, say,

FilePath f;
try {
    if (f.isDirectory()) {
        // …
    }
} catch (IOException | InterruptedException x) {
    // …
}

could block indefinitely due to problems in the network layer, being an example of one of the fundamental fallacies of distributed computing; and my request was for some variant like

FilePath f;
try {
    if (f.isDirectoryInterruptible()) {
        // …
    }
} catch (IOException | InterruptedException | TimeoutException x) {
    // …
}

or

FilePath f;
try {
    if (f.isDirectory(30, TimeUnit.SECONDS)) {
        // …
    }
} catch (IOException | InterruptedException | TimeoutException x) {
    // …
}

which would make explicit the fact that the network operation might hang for various arbitrary reasons and the caller must be prepared to deal with it.

jglick avatar Jul 03 '17 21:07 jglick

close?

timja avatar Sep 24 '21 12:09 timja

Maybe, so long as stale-pr is added to the corresponding issue.

jglick avatar Sep 24 '21 13:09 jglick

I've been leaving it around as a reference for some possible ideas, thought the actual value is small.

jeffret-b avatar Sep 24 '21 13:09 jeffret-b