Use Worker to serialize Notebook
Related to #214040
Hi @amunger, I tried running the JSON.stringify call on a Worker thread. It works on my machine and seems to improve the situation, but it's hard to say if the issue is completely resolved. Any thoughts?
@microsoft-github-policy-service agree company=G-Research
nice! I should be able to give this a try at the end of the week. We would also want to put this behind a setting so we can roll it out with an experiment if it works as expected.
@amunger, I'm not entirely sure about the risks of using a worker for this in Node or how stable it is. It likely depends on whether this practice is already being used elsewhere in the code. If you feel a setting is necessary, I’m fine with it. However, it might be safe enough to proceed without one.
@amunger, I've added a new setting. If you have any suggestions for a better name or description, just let me know. All set.
here's a sample of the perf diff on a 400mb notebook - it reduces the time that the EH is locked up by ~24%, that big block goes from 2.93s -> 2.23s.
It seems like more of the work could be pushed to the worker to reduce this further, sortObjectPropertiesRecursively would be pretty straight forward, and createJupyterCellFromNotebookCell would be trickier since it's using types from vscode.
If possible, we should also follow the pattern of Worker usage from the other built-in extensions and avoid the node imports e.g. the css extension Otherwise, we need to ensure that we only use this in desktop, since a web based EH doesn't have access to node.
I'm fine with merging in a change that doesn't do all the possible optimizations to get this going, but we at least need to take care of the node dependency first. I can try to look into doing that next week if you don't get around to it.
also, make sure to get the latest since we just had a big change go in to switch over to ESM imports
Thanks a bunch for this feedback. Will circle back to this on Monday. Enjoy the weekend, cheers!
Hi @amunger
If possible, we should also follow the pattern of Worker usage from the other built-in extensions and avoid the node imports e.g. the css extension
I'm a little unsure how to proceed there, the client/server thing seems like overkill compared to the Worker thread. I'm checking for Node right now and using dynamic imports. Let me know if that works for you or if you would prefer the client/server approach.
It seems like more of the work could be pushed to the worker to reduce this further, sortObjectPropertiesRecursively would be pretty straight forward,
I actually tried that but got:
Failed to save 'smaller.ipynb': Cannot find module 'vscode' Require stack: - c:\Users\nojaf\Projects\vscode\extensions\ipynb\out\serializers.js - c:\Users\nojaf\Projects\vscode\extensions\ipynb\out\notebookSerializerWorker.js
the JavaScript is something like
"use strict";
/*---------------------------------------------------------------------------------------------
* Copyright (c) Microsoft Corporation. All rights reserved.
* Licensed under the MIT License. See License.txt in the project root for license information.
*--------------------------------------------------------------------------------------------*/
Object.defineProperty(exports, "__esModule", { value: true });
const node_worker_threads_1 = require("node:worker_threads");
const serializers_1 = require("./serializers");
if (node_worker_threads_1.parentPort) {
const { notebookContent, indentAmount } = node_worker_threads_1.workerData;
const json = JSON.stringify((0, serializers_1.sortObjectPropertiesRecursively)(notebookContent), undefined, indentAmount) + '\n';
node_worker_threads_1.parentPort.postMessage(json);
}
//# sourceMappingURL=notebookSerializerWorker.js.map
and I think the require("./serializers") loads the entire file which somewhere contains vscode which is not known in the worker perhaps?
Failed to save 'smaller.ipynb': Cannot find module 'vscode' Require stack
Yeah that's what I would expect - the imports get pretty deep into the vscode API for that function, so we can just leave that out for now.
Limiting to node is fine for now, we'll just have to see if the CI tests have any issues with importing like that.
there are some conflicting files to resolve at this point though.
Alright, I believe this is ready for review @amunger.
Thanks for the reviews @amunger and @rebornix!
@amunger @rebornix , please can you review this.
Hello @rebornix, would you mind reviewing this PR as well? Is this good to go? Let me know if this needs any more tweaks, happy to address these.
It looks good to me. Before we go ahead and merge, @DonJayamanne did you get a chance to run a build on CI so we can verify if it works as expected?
did you get a chance to run a build on CI so we can verify if it works as expected?
Yes & it works as expected