connect-es icon indicating copy to clipboard operation
connect-es copied to clipboard

Returning or throwing from a method always triggers an abort

Open lourd opened this issue 1 year ago • 5 comments

Describe the bug

Normally returning or throwing from an RPC method triggers the abort event on the given context's AbortSignal. That's very unexpected. I'm not sure if that's intentional or a bug. Hopefully it's not intended, I don't think it should abort every time no matter what. As it stands now you have to remember to remove every abort listener you may have added to avoid accidental downstream aborting, which is really tedious.

To Reproduce

Environment (please complete the following information):

  • @connectrpc/connect-web version: n/a
  • @connectrpc/connect-node version: 1.4.0
  • Frontend framework and version: n/a
  • Node.js version: 20.12.1
  • Browser and version: n/a
// test_service.proto

syntax = "proto3";

package repro.v1;

message FooRequest {
  string name = 1;
}

message FooResponse {}

service TestService {
  rpc Foo(FooRequest) returns (FooResponse);
}

// main.js
import {
  createPromiseClient,
  createRouterTransport,
} from "@connectrpc/connect";

import { TestService } from "./gen/repro/v1/test_service_connect.js";
import { FooRequest, FooResponse } from "./gen/repro/v1/test_service_pb.js";

const transport = createRouterTransport((router) => {
  router.service(TestService, {
    async foo(req, ctx) {
      ctx.signal.addEventListener("abort", () => {
        console.log("ABORTED");
      });
      if (req.name === "throw") {
        throw new Error("thrown!");
      }
      return new FooResponse();
    },
  });
});

const client = createPromiseClient(TestService, transport);
await client.foo(new FooRequest({ name: "test" }));
try {
  await client.foo(new FooRequest({ name: "throw" }));
} catch {}

This will end up printing ABORTED twice, once for the first request when it returns normally and once for the second when it throws

lourd avatar Jun 26 '24 06:06 lourd

Seeing the call to abort here https://github.com/connectrpc/connect-es/blob/1d71dc9042572687ed28f7c05928f6fb96ade5aa/packages/connect/src/protocol-connect/handler-factory.ts#L271-L273 added by @timostamm in https://github.com/connectrpc/connect-es/pull/632

lourd avatar Jun 26 '24 07:06 lourd

Hey! This was intentional, but you can use the reason property on signal to differentiate whether it was a normal abort or not.

srikrsna-buf avatar Jun 26 '24 11:06 srikrsna-buf

Are you open to changing that? That's really confusing and seems like a big pitfall. The case that caused this error for me isn't even aware that it's in a Connect service, it was a couple layers deeper than that and all the sudden is being called after having migrated to Connect from a REST framework. The reason property is also any typed, so comparing that against anything is more of a wish than a guarantee of correctness.

This behavior also isn't documented anywhere. Thinking about how that would be communicated kind of shows how the inherent issue - "The abort signal will be aborted regardless of what happens in your code. It's up to you to check the reason if it was actually aborted or not". Really begs the question of why the library code calling abort() in the first place couldn't do the same thing and not trigger the abort event.

Reading the MDN docs on reason, I really don't think this is proper use. If you don't give a reason when calling abort(), the reason is automatically set to a DOMException. So we would have to check that the reason isn't a DOM exception? This is the behavior in the browser and Node.js

Screenshot 2024-06-26 at 9 17 48 AM

Screenshot 2024-06-26 at 9 58 58 AM

Can you please change this behavior? It appears that the motivation was simply to clear the setTimeout in the deadlineSignal.

https://github.com/connectrpc/connect-es/blob/1d71dc9042572687ed28f7c05928f6fb96ade5aa/packages/connect/src/protocol/signals.ts#L85

Can we change it back to doing that separately? Perhaps making a cleanup method on context that maps to calling deadline.cleanup, and only call that every time instead of abort?

lourd avatar Jun 26 '24 13:06 lourd

The internal cleanup is not the reason for doing this. The signal can be passed to other services and is guaranteed to abort/complete when the rpc is complete (with or without an error). This makes it convenient to pass this signal around to other services. This is particularly useful in fanout scenarios.

Changing this now will certainly be a breaking change, as it changes the behavior. So we can revisit this in v2 (being worked on now). For now, I'd suggest relying on the using keyword added in TS 5.2 (tc39 Stage 3):

const transport = createRouterTransport((router) => {
  router.service(TestService, {
    async foo(req, ctx) {
      using _ = onAbort(signal, () => {
        console.log("ABORTED");
      });    
      if (req.name === "throw") {
        throw new Error("thrown!");
      }
      return new FooResponse();
    },
  });
});

function onAbort(signal: AbortSignal, onAbort: NonNullable<AbortSignal["onabort"]>): Disposable {
  signal.addEventListener("abort", onAbort)
  return {
    [Symbol.dispose]: () =>
      signal.removeEventListener("abort", onAbort)
  }
}

srikrsna-buf avatar Jun 27 '24 16:06 srikrsna-buf

Thanks for the thorough response @srikrsna-buf, I appreciate it. That's a good point about the fanout case, having a completion signal that can be used to cancel in process work when one of the workers has finished responding.

Sounds like there's two different cases this one signal is supporting then - aborting and completing. I think it'd be a good idea to separate those into two separate signals. Like I've said, having the abort signal be responsible for both abortion and completion is a pitfall for any code that only expects to be called when things are actually aborted.

The fan-out case can also be enabled by the user making their own abort controller/completion signal and calling that in their own finally, and linking that to the given abort signal. I imagine that's what most people would do these days since there's no documentation about the given abort signal also being called on completion every time. I think that'd be less complicated than example code you've given about using new TS/JS features to remember to remove an abort event listener.

lourd avatar Jun 27 '24 17:06 lourd

Louis, you're right that HandlerContext.signal takes on two different concerns with cancellation and completion.

Now that AbortSignal.any() has been added to the standard, it has become a bit more intuitive to link signals, and we're going to revisit the behavior for v2.

timostamm avatar Sep 09 '24 12:09 timostamm

Sounds good, thanks for letting me know @timostamm. What's the plan for v2? Are changes going into main right now on a v1 track or v2?

lourd avatar Sep 09 '24 13:09 lourd

V2 is mainly for compatibility with protobuf-es v2, but we can make other sensible breaking changes. It lives in the branch v2, and is available as a pre-release.

Once v2 ships, main will be renamed to v1, and v2 will be become main.

timostamm avatar Sep 09 '24 15:09 timostamm

To split up the concerns, we could add a separate signal to the handler for completion, and/or a separate combined signal (basically renaming the current HandlerContext.signal).

For example, with a separate done signal:

router.service(ElizaService, {
  async *introduce(req: IntroduceRequest, ctx: HandlerContext) {
    const cancelledOrHandlerDone = AbortSignal.any([ctx.signal, ctx.done]);
    longRunning(cancelledOrHandlerDone);
    yield { sentence: `Hi ${req.name}, I'm eliza` };
  },
});

It would also be possible to add a promise, for example settled:

router.service(ElizaService, {
  async *introduce(req: IntroduceRequest, ctx: HandlerContext) {
    const controller = new AbortController();
    void ctx.settled.then(() => controller.abort(), (reason) => controller.abort(reason));
    longRunning(controller.signal);
    yield { sentence: `Hi ${req.name}, I'm eliza` };
  },
});

But IMO it's most readable to let users use try...finally with their own AbortController:

router.service(ElizaService, {
  async *introduce(req: IntroduceRequest, ctx: HandlerContext) {
    const done = new AbortController();
    try {
      const cancelledOrHandlerDone = AbortSignal.any([ctx.signal, done.signal]);
      longRunning(cancelledOrHandlerDone);
      yield { sentence: `Hi ${req.name}, I'm eliza` };
    } finally {
      done.abort();
    }
  },
});

@srikrsna-buf, what do you think?

timostamm avatar Sep 16 '24 09:09 timostamm

I like the third option, changing the signal to only abort on an error and letting the user deal with completion looks readable to me too.

srikrsna-buf avatar Sep 16 '24 09:09 srikrsna-buf

Re-visiting this for https://github.com/connectrpc/connect-es/issues/1253:

My concern is that without a mechanism for detecting doneness in all cases, resource cleanup will be very hard for developers, and they'd have to make their own try/catch wrappers...

A valid concern. @lourd, you originally wrote:

As it stands now you have to remember to remove every abort listener you may have added to avoid accidental downstream aborting, which is really tedious.

Can you clarify in which situation you wouldn't want to abort a downstream operation if the RPC is complete?

timostamm avatar Oct 17 '24 12:10 timostamm

My use case is that the RPC starts up a heavy weight long-running process on the machine. While the process is starting, if the request is aborted, the process should be killed. But once the process has started and the response is sent, it should not be aborted when the RPC completes.

FWIW it was easy enough to implement that behavior, unsubscribing my abort listener before returning, once I understood that ctx.signal will always be aborted.

If you make the call to only have one signal I would request that the name and documentation reflects that behavior clearly, and there is still some way to differentiate whether a request was aborted or simply completed. Right now I'm able to do that in my logger interceptor by checking ctx.signal.aborted in the catch/finally blocks of calling next(request), before returning. I'm assuming that would still be possible in v2?

lourd avatar Oct 17 '24 15:10 lourd

I don't understand the assertion in #1253

AFAICT, this implies that at present the server has no universal built-in way to detect when an RPC completes, regardless of reason. Thus resource cleanup cannot occur.

It seems to me like using a finally block like you posted in your earlier possible approaches is a universal built-in way to detect when the RPC completes?

But... I suppose the case that it doesn't is interceptors responding early? https://github.com/grpc/grpc-node/issues/2771#issuecomment-2148637924

can also happen if a server interceptor ends a call on its own

lourd avatar Oct 17 '24 15:10 lourd

Thanks for the input. I wish there was a variant of AbortSignal that isn't tied to cancellation. We're reverting to include completion in the signal in https://github.com/connectrpc/connect-es/pull/1282.

You can still exclude completion:

async say(req: SayRequest, ctx) {
  const handle = () => {
    const err = ConnectError.from(ctx.signal.reason);
    // DeadLineExceeded if the deadline passed.
    // Canceled if the client closed the stream (H2).
    err.code; 
  };
  ctx.signal.addEventListener("abort", handle);
  try {
    // ...
    return {
      sentence: "hi",
    };
  } finally {
    ctx.signal.removeEventListener("abort", handle);
  }
}

timostamm avatar Oct 18 '24 11:10 timostamm