workerd
workerd copied to clipboard
🐛 Bug Report — Logging broken in hibernatable WebSocket event handlers
We have one DurableObject that uses the hibernation API. It implements webSocketMessage() in which we use console.log() to log stuff.
On miniflare (wrangler dev) that works perfectly fine, the logs appear in the terminal, as they should.
When deployed to Cloudflare though, nothing is logged from within the webSocketMessage() handler. The handler method with the log statement is executed and works fine.
This happens both with wrangler tail, as well as our log push job to an R2 bucket. All other log calls work fine.
I added a workaround where for that DO we additionally send every log message to another DO and log it there. That works.
I observe identical behavior
@MellowYarker @jasnell @kentonv Sorry to ping you this way but could someone please triage this?
It's not a blocker for us since we have the work-around, and I can't look at it again for a few weeks anyways, but now that someone else ran into it maybe it's worth taking a look?
Huh, not sure of anything that would have caused this to break. Was it always broken or did you notice a regression? Will take a look next week.
We only started using hibernatable sockets a few months ago, so I can't tell if it was broken from the beginning.
Haven't forgotten about this, just been a bit hectic this week! Still on my todo list.
@codefrau sorry for the delay. I just tested this and I seems to work for me.
My steps were:
npm create cloudflare@latest- Pasted the hibernation example
- Added a
console.log()to thewebSocketMessage()handler, so now it looks like:
async webSocketMessage(ws, message) {
console.log(`RECEIVED NEW MESSAGE: ${message}`)
// Upon receiving a message from the client, reply with the same message,
// but will prefix the message with "[Durable Object]: " and return the
// total number of connections.
ws.send(`[Durable Object] message: ${message}, connections: ${this.ctx.getWebSockets().length}`);
}
npx wrangler deployafter setting up the binding inwrangler.toml- Ran
npx wrangler tailto get logs - Created a client connection from my browser that did
ws.send("Hello from the browser");, and then saw the following in my console:
Successfully created tail, expires at 2024-06-21T19:08:10Z
Connected to logging, waiting for logs...
Unknown Event - Ok @ 6/21/2024, 9:10:02 AM
(log) RECEIVED NEW MESSAGE: Hello from the browser
Is there any way you could help narrow the behavior down?
Hi @MellowYarker - thank you for getting back to this. I'm keeping the engines running here while @codefrau is out of the office.
Using your basic example, I initially saw the same response as you. After a while I went back to our application and re-deployed to test out one theory - which proved unhelpful... but it did reveal that the basic logging was initially working, but stopped working after a few minutes.
To test this, I added
console.log(`REQUEST: ${request.url}`);
to the fetch method of your sample DO, and set up the following code in a web page accessing it:
const sock = new WebSocket("wss://ael-expt2.synq.workers.dev/websocket");
sock.onmessage = (event) => console.log(event.data);
setInterval(() => {
fetch("https://ael-expt2.synq.workers.dev/", { mode: "no-cors" }); // should trigger a GET log
sock.send(new Date().toISOString()); // should trigger an Unknown Event log
}, 20000);
Below is a screenshot of the npx wrangler tail next to the console of the browser. As you can see, for two minutes the logs show both the fetch and the socket message, but after that the socket-message-triggered logging disappears (and seems never to resume).
@aranlunzer thanks for looking into this some more! I ran the same experiment as you and saw logs from fetch and the websocket for an hour, after which wrangler tail seems to have silently quit.
Which browser are you using? Also, would you mind adding a log to the webSocketClose() handler as well?
@MellowYarker After some further hours of experiments, my main finding is that the results are wildly inconsistent, with no readily discernable pattern.
To maximise your chances of reproduction, here is what I am using.
wrangler.toml
#:schema node_modules/wrangler/config-schema.json
name = "ael-expt2"
main = "src/index.ts"
logpush = true
compatibility_date = "2024-06-20"
compatibility_flags = ["nodejs_compat"]
[[durable_objects.bindings]]
name = "WEBSOCKET_HIBERNATION_SERVER"
class_name = "WebSocketHibernationServer"
[[migrations]]
tag = "v1"
new_classes = ["WebSocketHibernationServer"]
index.ts (note removal of async on the websocket functions, which appears to have made no difference)
import { DurableObject } from "cloudflare:workers";
export interface Env {
WEBSOCKET_HIBERNATION_SERVER: DurableObjectNamespace<WebSocketHibernationServer>;
}
// Worker
export default {
async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
console.log(`REQUEST: ${request.url}`);
if (request.url.endsWith("/websocket")) {
const upgradeHeader = request.headers.get('Upgrade');
if (!upgradeHeader || upgradeHeader !== 'websocket') {
return new Response('Durable Object expected Upgrade: websocket', { status: 426 });
}
let id = env.WEBSOCKET_HIBERNATION_SERVER.idFromName("foo");
let stub = env.WEBSOCKET_HIBERNATION_SERVER.get(id);
return stub.fetch(request);
}
return new Response("hello");
}
};
export class WebSocketHibernationServer extends DurableObject {
constructor(state: DurableObjectState, env: Env) {
super(state, env);
console.log("woke up");
}
async fetch(request: Request): Promise<Response> {
const webSocketPair = new WebSocketPair();
const [client, server] = Object.values(webSocketPair);
this.ctx.acceptWebSocket(server);
return new Response(null, {
status: 101,
webSocket: client,
});
}
// async
webSocketMessage(ws: WebSocket, message: ArrayBuffer | string) {
console.log(`RECEIVED NEW MESSAGE: ${message}`);
ws.send(`[Durable Object] message: ${message}, connections: ${this.ctx.getWebSockets().length}`);
}
// async
webSocketClose(ws: WebSocket, code: number, reason: string, wasClean: boolean) {
console.log(`CLOSE: ${code} ${reason} wasClean: ${wasClean}`);
ws.close(code, "Durable Object is closing WebSocket");
}
}
index.html
<html>
<body>
<script type="text/javascript">
try {
const sock = new WebSocket("wss://ael-expt2.synq.workers.dev/websocket")
sock.onmessage = event => console.log(event.data);
setInterval(() => {
fetch("https://ael-expt2.synq.workers.dev/", { mode: "no-cors" }); // should trigger a GET log
sock.send(new Date().toISOString()); // should trigger an Unknown Event log
}, 10000);
} catch(e) {
console.log(e)
}
</script>
</body>
</html>
This evening I have re-deployed around 50 times. Sequence:
npx wrangler deploy- on the Cloudflare page for this worker, select Logs tab and ask to begin streaming (accepting the need to restart DOs)
- start a new tab with
index.html, accessing the worker - record how long we continue to see the expected two log entries on each request (every 10s) before it drops to a single log each time, because the websocket logging is now failing
Once the change is observed (or after around 5 minutes it it's still going)
- close the browser tab
- suspend log stream
- switch away from Logs tab (so streaming will have to be begun from scratch next time)
- re-deploy and repeat
The time taken for logging from the websocket request to start failing is extremely variable. Sometimes after just 20s; most often somewhere in the range 100-200s; occasionally it's still going even after 300s (so I reset and try again).
[edit] Forgot to mention: I'm using Chrome to load the web page, and make sure to keep the tab visible so the timers continue to work.