vscode icon indicating copy to clipboard operation
vscode copied to clipboard

Use Worker to serialize Notebook

Open nojaf opened this issue 1 year ago • 4 comments

Related to #214040

Hi @amunger, I tried running the JSON.stringify call on a Worker thread. It works on my machine and seems to improve the situation, but it's hard to say if the issue is completely resolved. Any thoughts?

nojaf avatar Aug 26 '24 15:08 nojaf

@microsoft-github-policy-service agree company=G-Research

nojaf avatar Aug 27 '24 06:08 nojaf

nice! I should be able to give this a try at the end of the week. We would also want to put this behind a setting so we can roll it out with an experiment if it works as expected.

amunger avatar Aug 27 '24 16:08 amunger

@amunger, I'm not entirely sure about the risks of using a worker for this in Node or how stable it is. It likely depends on whether this practice is already being used elsewhere in the code. If you feel a setting is necessary, I’m fine with it. However, it might be safe enough to proceed without one.

nojaf avatar Aug 27 '24 16:08 nojaf

@amunger, I've added a new setting. If you have any suggestions for a better name or description, just let me know. All set.

nojaf avatar Aug 28 '24 09:08 nojaf

here's a sample of the perf diff on a 400mb notebook - it reduces the time that the EH is locked up by ~24%, that big block goes from 2.93s -> 2.23s. image

It seems like more of the work could be pushed to the worker to reduce this further, sortObjectPropertiesRecursively would be pretty straight forward, and createJupyterCellFromNotebookCell would be trickier since it's using types from vscode.

If possible, we should also follow the pattern of Worker usage from the other built-in extensions and avoid the node imports e.g. the css extension Otherwise, we need to ensure that we only use this in desktop, since a web based EH doesn't have access to node.

I'm fine with merging in a change that doesn't do all the possible optimizations to get this going, but we at least need to take care of the node dependency first. I can try to look into doing that next week if you don't get around to it.

amunger avatar Aug 30 '24 17:08 amunger

also, make sure to get the latest since we just had a big change go in to switch over to ESM imports

amunger avatar Aug 30 '24 17:08 amunger

Thanks a bunch for this feedback. Will circle back to this on Monday. Enjoy the weekend, cheers!

nojaf avatar Aug 30 '24 17:08 nojaf

Hi @amunger

If possible, we should also follow the pattern of Worker usage from the other built-in extensions and avoid the node imports e.g. the css extension

I'm a little unsure how to proceed there, the client/server thing seems like overkill compared to the Worker thread. I'm checking for Node right now and using dynamic imports. Let me know if that works for you or if you would prefer the client/server approach.

It seems like more of the work could be pushed to the worker to reduce this further, sortObjectPropertiesRecursively would be pretty straight forward,

I actually tried that but got:

Failed to save 'smaller.ipynb': Cannot find module 'vscode' Require stack: - c:\Users\nojaf\Projects\vscode\extensions\ipynb\out\serializers.js - c:\Users\nojaf\Projects\vscode\extensions\ipynb\out\notebookSerializerWorker.js

the JavaScript is something like

"use strict";
/*---------------------------------------------------------------------------------------------
 *  Copyright (c) Microsoft Corporation. All rights reserved.
 *  Licensed under the MIT License. See License.txt in the project root for license information.
 *--------------------------------------------------------------------------------------------*/
Object.defineProperty(exports, "__esModule", { value: true });
const node_worker_threads_1 = require("node:worker_threads");
const serializers_1 = require("./serializers");
if (node_worker_threads_1.parentPort) {
    const { notebookContent, indentAmount } = node_worker_threads_1.workerData;
    const json = JSON.stringify((0, serializers_1.sortObjectPropertiesRecursively)(notebookContent), undefined, indentAmount) + '\n';
    node_worker_threads_1.parentPort.postMessage(json);
}
//# sourceMappingURL=notebookSerializerWorker.js.map

and I think the require("./serializers") loads the entire file which somewhere contains vscode which is not known in the worker perhaps?

nojaf avatar Sep 02 '24 10:09 nojaf

Failed to save 'smaller.ipynb': Cannot find module 'vscode' Require stack

Yeah that's what I would expect - the imports get pretty deep into the vscode API for that function, so we can just leave that out for now.

Limiting to node is fine for now, we'll just have to see if the CI tests have any issues with importing like that.

there are some conflicting files to resolve at this point though.

amunger avatar Sep 03 '24 17:09 amunger

Alright, I believe this is ready for review @amunger.

nojaf avatar Sep 04 '24 07:09 nojaf

Thanks for the reviews @amunger and @rebornix!

nojaf avatar Sep 05 '24 08:09 nojaf

@amunger @rebornix , please can you review this.

DonJayamanne avatar Sep 09 '24 14:09 DonJayamanne

Hello @rebornix, would you mind reviewing this PR as well? Is this good to go? Let me know if this needs any more tweaks, happy to address these.

nojaf avatar Sep 11 '24 09:09 nojaf

It looks good to me. Before we go ahead and merge, @DonJayamanne did you get a chance to run a build on CI so we can verify if it works as expected?

rebornix avatar Sep 11 '24 18:09 rebornix

did you get a chance to run a build on CI so we can verify if it works as expected?

Yes & it works as expected

DonJayamanne avatar Sep 11 '24 20:09 DonJayamanne