comlink icon indicating copy to clipboard operation
comlink copied to clipboard

intermittent callback/channel issues in Safari 16

Open twmillett opened this issue 1 year ago • 10 comments

I apologize upfront for the somewhat low quality of this bug report. It is a very hard bug to reproduce outside of our particular app (which is a big, heavy, app). And, even then, it's non-deterministic, so a reduced repro isn't going to be of much value. The bug isn't in comlink, but I think there is a potential mitigation within comlink. Also, having this issue documented may help others.

On Safari 16.0, the basic callback pattern described here will sometimes silently stop working.

https://github.com/GoogleChromeLabs/comlink/tree/main/docs/examples/02-callback-example

Meaning, the callback may work for awhile, but at some point the callback function stops being called on the Main thread. This happens for callbacks with and without parameters passed in. Once it stops working, it never starts working again.

I traced into comlink and found that the postMessage from the Worker is successfully executed, but the addEventListener('message', ...) associated with the MessageChannel callback port is never invoked on the Main thread. There's no messageerror callback on main, either.

I believe what's happening is that Safari is inappropriately garbage collecting the MessageChannel under certain conditions. Things like memory pressure appear to play a role. Retaining the Endpoint created by the proxyTransferHandler and subscribed to in expose by pushing it into a module scoped data structure thus far appears to completely mitigate the problem for us.

It looks like Safari has had some issue in this space reported before: https://bugs.webkit.org/show_bug.cgi?id=193184 https://bugs.webkit.org/show_bug.cgi?id=184502

twmillett avatar Oct 26 '22 18:10 twmillett

@twmillett, do you have a somewhat reduced test case you could share?

youennf avatar Oct 28 '22 07:10 youennf

@twmillett, do you have a somewhat reduced test case you could share?

Unfortunately, a reduced repro is very unlikely to display the issue. I added you to a private github repo with a somewhat reliable repro.

https://github.com/twmillett/safari-messagechannel-repro/issues/1

@youennf , please let me know if there's more I can do to help.

twmillett avatar Oct 28 '22 18:10 twmillett

@twmillett we are having the exact same issue in our application and I'd like to understand more about the mitigation that you have employed. We ended up using a straight postMessage/onMessage communication pattern to replace our callbacks that stopped working. We suspected we needed to strengthen the referential integrity of something but failed to identify the right object to operate with (tried the callback and the proxy itself).

Is the workaround you used to replace the comlink internal proxyTransferHandler in the transferhandlers collection with one that can store the "port1" reference created by MessageChannel()? Thanks in advance for any advice you can give.

dmarini avatar Nov 03 '22 13:11 dmarini

@dmarini, if you're willing to fork comlink, the simplest workaround in comlink.ts is to create a Map of endpoints and simulate an "addRef" on the endpoint in expose and "releaseRef" on the endpoint when it's released. Something like

    Promise.resolve(returnValue)
      .catch((value) => {
        return { value, [throwMarker]: 0 };
      })
      .then((returnValue) => {
        const [wireValue, transferables] = toWireValue(returnValue);
        ep.postMessage({ ...wireValue, id }, transferables);
        if (type === MessageType.RELEASE) {
          // detach and deactive after sending release response above.
          ep.removeEventListener("message", callback as any);
          closeEndPoint(ep);

          // releaseRef on the port
          const refs = (retainedPorts.get(ep) || 0) - 1;
          retainedPorts.set(ep, refs);
          if (refs < 1) {
            retainedPorts.delete(ep);
          }
        }
      });
  } as any);


  if (ep.start) {
    ep.start();
  }

  // addref on the port
  const refs = (retainedPorts.get(ep) || 0) + 1;
  retainedPorts.set(ep, refs);

You can also patch the MessagePort prototype to hook addEventListener/removeEventListener to get something similar, something like:

let isPatched = false;
const portListeners: WeakMap<MessagePort, Set<EventListenerOrEventListenerObject>> = new WeakMap();
const retainedPorts: Map<MessagePort, Set<EventListenerOrEventListenerObject>> = new Map();

export function patchComlink() {
    if (!isPatched) {
        isPatched = true;
        const pAddEventListener = MessagePort.prototype.addEventListener;
        const pRemoveEventListener = MessagePort.prototype.removeEventListener;

        MessagePort.prototype.addEventListener = function (
            type: string,
            listener: EventListenerOrEventListenerObject,
            options?: boolean | AddEventListenerOptions
        ) {
            if (type === 'message') {
                const listeners = getListeners(this);
                listeners.add(listener);
            }

            pAddEventListener.call(this, type, listener, options);
        };
        MessagePort.prototype.removeEventListener = function (
            type: string,
            listener: EventListenerOrEventListenerObject,
            options?: boolean | AddEventListenerOptions
        ) {
            if (type === 'message') {
                const listeners = getListeners(this);
                listeners.delete(listener);
                if (listeners.size === 0) {
                    retainedPorts.delete(this);
                }
            }

            pRemoveEventListener.call(this, type, listener, options);
        };
    }
}

function getListeners(port: MessagePort): Set<EventListenerOrEventListenerObject> {
    let rv = retainedPorts.get(port);
    if (!rv) {
        rv = portListeners.get(port);
        if (!rv) {
            rv = new Set();
            portListeners.set(port, rv);
        }
        retainedPorts.set(port, rv);
    }

    return rv;
}

twmillett avatar Nov 03 '22 14:11 twmillett

@twmillett this is great thank you, I'll look into this if our current fix candidate continues to demonstrate the issue!

dmarini avatar Nov 03 '22 19:11 dmarini

https://github.com/WebKit/WebKit/pull/6099

A fix seems to have been merged on November 3rd.

ivancuric avatar Jan 18 '23 12:01 ivancuric

Can anyone confirm that this was fixed in recent Safari / Safari TP builds?

benjamind avatar Jan 24 '23 22:01 benjamind

i dunno if it's relevant but i found a crazy workaround where i console.log'd the "Worker" instance before using. i was amazed to see this fix it. just pontificating but could maybe be explained by #6099 if the console.log prevented gc (this was a couple months ago not with comlink, can't confirm on latest currently. also affected things like gnome web which is my safari-alike on linux)

cmdcolin avatar Mar 20 '23 07:03 cmdcolin

@twmillett thank you for the example of the fix. Could you please explain why do you use WeakMap in the second example?

quolpr avatar Mar 21 '23 13:03 quolpr

@twmillett thank you for the example of the fix. Could you please explain why do you use WeakMap in the second example?

It's an optimization to avoid having to (potentially) create more than one Set object for the same MessagePort instance's lifetime:

Suppose a MessagePort has one listener and, in a block of code, that listener is removed and another one is added.

If we just maintained a mapping from the port to its listeners, we'd see its listener count drop to zero and remove the port->listeners entry from retainedPorts. When the next listener is added, we'd end up creating a new entry with a new listeners Set.

By using a separate WeakMap mapping, we'd be able to reuse the original listener Set as long as the WeakMap retains the MessagePort, which should be as long as any code has a reference to that port (and, code can't add a listener if it doesn't have such a reference).

An alternative, maybe, would be to stamp each MessagePort's listener set on port's instance, itself.

In any case, you could get rid of it and just recreate listener Set each time the listener count goes from 0 to 1.

twmillett avatar Mar 21 '23 19:03 twmillett