threads icon indicating copy to clipboard operation
threads copied to clipboard

Can we avoid data segment fill at every instantiation?

Open juj opened this issue 8 years ago • 43 comments

Reading https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#initializing-memory-only-once, with

The data segments are always copied into linear memory, even if the same module 
is instantiated again in another agent. One way to ensure that linear memory is only 
initialized once is to place all data segments in a separate module that is only 
instantiated once, then share the linear memory with other modules.

if I understand this correctly, this will prevent Emscripten and other compilers from being able to place data and code sections in the same .wasm file, but must always use two separate files for them, similar to how .mem files are being used in asm.js.

It sounds like it would be easy to add a Module.instantiate(..., noDataSegment: true); type of machinery to avoid this? Given that there's an explicit paragraph about this in the document, I presume this approach has been considered already before, and was turned down? I'm curious, what was the conversation about it back then?

juj avatar Aug 25 '17 12:08 juj

It's good of you to bring that up again. The previous discussion started here: https://github.com/WebAssembly/threads/issues/16#issuecomment-304600033. There was IMO no consensus around the matter, but the matter was closed (by @binji) for reasons that are not clear to me.

lars-t-hansen avatar Aug 25 '17 14:08 lars-t-hansen

The drive to have one .wasm file that contains both the code and the data sections is commonly driven by requests from the community, i.e. some Emscripten users prefer to have as few files in the output as possible, to the point that even embedding the .wasm into a .html file or a .js file has been asked.

Technically we can definitely make do with two files, one .wasm with a .mem file for data section, which I think we do already support for wasm as well, though I'm sure people will ask if they can optimize away the .mem file when it suddenly becomes two files again. It can be a bit easier to manage caching if there is only one file instead of two, and one has less risk of loading incompatible versions of the two.

Another technical note is that since we'll need to keep the Module in memory throughout the lifetime of the wasm application in order to be able to accommodate pthread_create of new threads, it will be more efficient to separate the two files, so the memory initializer/data segment can be dropped at startup after it has been applied, and only the wasm module with the code still kept around. (I'm sure that will not stop people from asking to have only one file instead of two)

Even with the this optimization aspect noted, if it's all the same, I think it would be useful to have a flag for this, especially considering if it is simple to implement in practice. CC @kripken

juj avatar Aug 25 '17 17:08 juj

I suggest designing the details of this feature and presenting this design to the CG.

jfbastien avatar Aug 25 '17 17:08 jfbastien

Sorry about that, I thought there was discussion in the CG meeting about this, but it appears there wasn't. I'm happy to write up a design for this unless you want to @juj.

binji avatar Aug 27 '17 19:08 binji

Thanks Ben, that would be much appreciated!

juj avatar Aug 28 '17 12:08 juj

Agreed it would be nice to solve this problem directly. (I like allowing single-.wasm outputs too.)

One question is whether it's sufficient to add a dynamic option to disable all segment initialization or whether it'll ever be the case that we'll want to run some, but not all. @juj ?

Separately, it seems like we have the options of: flags passed to instantiate et al or adding new init_expr fields to the module (in a backwards-compatible manner) which, if evaluated to zero, disable initialization. I think I prefer the latter given that we already use init_exprs to parameterize segment initialization.

Another technical note is that since we'll need to keep the Module in memory throughout the lifetime of the wasm application in order to be able to accommodate pthread_create of new threads, it will be more efficient to separate the two files, so the memory initializer/data segment can be dropped at startup after it has been applied, and only the wasm module with the code still kept around.

This varies by browser, I believe, but at least in FF, the bytecode is only kept alive by the Module object, not the Instance object, so assuming references are dropped to the Module after instantiation, the bytecode will be released at the next GC. (Machine code and runtime metadata are stored in a hidden third object, shared by a Module and all its Instances.) So this needn't be a technical reason to use two files.

lukewagner avatar Aug 28 '17 23:08 lukewagner

It's also worth considering per-thread init versus global init.

jfbastien avatar Aug 28 '17 23:08 jfbastien

Separately, it seems like we have the options of: flags passed to instantiate et al or adding new init_expr fields to the module (in a backwards-compatible manner) which, if evaluated to zero, disable initialization. I think I prefer the latter given that we already use init_exprs to parameterize segment initialization.

Interesting. It looks like we can add this by hijacking what we currently call the memory index, which fortunately must always be 0.

I like this idea but how would this work exactly? We want to make sure that it initializes memory the first time, but not subsequent times -- doesn't this fact need to be passed into the instantiate function somehow? Or were you thinking that it could query a global or something...?

binji avatar Aug 28 '17 23:08 binji

It's also worth considering per-thread init versus global init.

I think that's what this would allow ("global init" would mean you disable the segment after the first instantiation, "per-thread init" means you wouldn't disable, just pass in different offsets); is there something else you had in mind?

Interesting. It looks like we can add this by hijacking what we currently call the memory index, which fortunately must always be 0.

Haha, I had exactly the same devious thought :) We could rename it to "flags" and then "has non-default table/memory index" would just be another flag at some point in the future.

I like this idea but how would this work exactly? We want to make sure that it initializes memory the first time, but not subsequent times -- doesn't this fact need to be passed into the instantiate function somehow? Or were you thinking that it could query a global or something...?

The init_expr would be a get_global and so ultimately this would be controlled at each thread's instantiation site via the value of the imported global value. So the first thread would pass a 1 (to initialize) and subsequent threads would pass 0 (to not initialize).

lukewagner avatar Aug 29 '17 00:08 lukewagner

The init_expr would be a get_global and so ultimately this would be controlled at each thread's instantiation site via the value of the imported global value. So the first thread would pass a 1 (to initialize) and subsequent threads would pass 0 (to not initialize).

I see -- you're suggesting that we provide a second init_expr here. This is actually kinda neat, because then we can more precisely control data segment initialization. (Probably saying what everyone else already figured out, feeling a bit slow today ... :slightly_smiling_face:)

binji avatar Aug 29 '17 00:08 binji

I think that's what this would allow ("global init" would mean you disable the segment after the first instantiation, "per-thread init" means you wouldn't disable, just pass in different offsets); is there something else you had in mind?

Yeah, all I'm saying is I'd like to see per-thread init as a possible expansion in the proposed design. Doesn't need to be spec'd yet, but I want to know it's doable and what it looks like. One takeaway is that it's a property of a segment, not of the module or instance, whether initialization is global or per-thread.

jfbastien avatar Aug 29 '17 03:08 jfbastien

@jfbastien, maybe you can write more about what specifically you expect out of "per-thread init" and what the use cases look like, so that we can move this issue along.

The init-guard design ("Solution 2" in the doc) allows differing init behavior for different threads by manipulating globals referenced from the init-guard and the init-address expressions. In a situation (as in a browser) where we use non-shared-memory workers for threads it also allows multiple threads with different init behaviors to be initialized concurrently because the globals are private to the workers. Host code must set up the globals but this does not seem problematic.

The current design has per-segment init-guards and certainly allows host code to choose different segments, or place them at different addresses, on a per-instance basis, by manipulating the globals.

(It's hard for me to guess what future non-worker-based threads will look like; it seems likely that we'll not instantiate a module multiple times in that scenario, and that we need a very different type of initialization logic. The flags field of the init-guard design allows for the addition of more init-expressions, or indeed allows the init-guard to be interpreted differently.)

lars-t-hansen avatar Nov 02 '17 13:11 lars-t-hansen

The Nov meeting notes record this exchange:

BS: Additional proposal that came up last time about initialization - turning that off and on JF: Dynamic linking? BS: Being able to say turn off initialization on this thread - not really about dynamic linking BN: Is this something we want to poll now? Or wait? ??: Let’s put it in now and see if anyone objects

That is not hyper-informative, but separately Luke informs me that "binji agreed to add conditional init_exprs to data/elem segments". That sounds like Solution 2 to me. @binji, can you confirm that?

lars-t-hansen avatar Nov 07 '17 10:11 lars-t-hansen

Sorry, I should have gone through these issues and updated them with info from the CG.

Yes, the consensus as I understood it was to move forward with solution 2.

binji avatar Nov 07 '17 11:11 binji

Still not a big fan of this feature, since conditional initialisation seems too ad-hoc and doesn't give you anything you cannot already express with suitable modularization.

In my mind the underlying problem is that our design of data and element sections as "active" initialisers was wrong. What we should have done is make them "passive" data sources, and add explicit instructions that "apply" a segment, e.g. inside the start function -- or somewhere else. For example,

init_memory $data_seg_index
init_table $elem_seg_index

This would be much more flexible than what we have now. For example, besides conditional initialisation, it would allow initialising some time else than upon instantiation, it would allow reinitialising/resetting again later, it would allow applying a single segment many times -- especially if we also moved memory/table index and offset to those instructions.

Maybe it's not too late to extend the segment model in that direction?

rossberg avatar Nov 07 '17 11:11 rossberg

@rossberg, generalizing it further, it is basically a memcpy from a static source outside the heap, is it not?

If we're going down that road, a flag in the segment could mean "no address field present, never copy automatically".

lars-t-hansen avatar Nov 07 '17 12:11 lars-t-hansen

@lars-t-hansen, yes, at least in the case of memories (not sure if that accurately describes table segments, though). Repurposing the current zero byte as a flag distinguishing passive from active is what I was thinking.

rossberg avatar Nov 07 '17 12:11 rossberg

That (flag + apply operation) seems feasible.

jfbastien avatar Nov 07 '17 15:11 jfbastien

OK, I'll spend some time writing up this as an alternative soon.

binji avatar Nov 07 '17 16:11 binji

One thing to consider when designing an alternative: the useful thing about "active" initializers is that, at the end of module instantiation, the bytecode can be released; the instance doesn't have any residual dependencies. If initialization is a dynamic operation that can happen at any time, at least the data and elem sections would have to be kept around forever ultimately leading to two copies of the data.

To avoid this problem we could impose a dynamic requirement that init_memory/init_table can only execute in an instance while the instance is executing its start function (trapping otherwise); this would allow bytecode to be released when it is today.

lukewagner avatar Nov 07 '17 16:11 lukewagner

@lukewagner, by bytecode, do you mean the binary?

Nothing prevents you from defining a module that explicitly calls the start function again later, so I'm not sure such a restriction would help. And it would actually be a feature if you could reuse segments from other functions.

Would this necessarily mean that data sections have to be kept alive indefinitely? In the engine I would assume that you'd keep segments alive only by having references to them from functions that use respective instructions. Once they become garbage (like the start function could), so do the segments.

rossberg avatar Nov 07 '17 17:11 rossberg

@rossberg Yes, the binary. Also, my suggestion was to only allow init_memory/init_table during the instantiation phase that executed the start function, not literally any time the start function was active on the stack.

What you're suggesting about having functions keeping references to sections doesn't really work when the entire module is a single GC'able unit. At least in SM (I think others), we don't GC individual functions.

lukewagner avatar Nov 08 '17 06:11 lukewagner

@lukewagner, I see. Are you talking about a runtime error, then? For example. the init instructions would trap if executed later?

How difficult would it be for an implementation to make a special case for the start function? That is, find out whether the start function is referenced anywhere else (via export or call or elem), and if not, keep its code separate from the rest of the module so that it can be thrown away immediately?

rossberg avatar Nov 08 '17 10:11 rossberg

Yes, a runtime trap.

I think the problem is that we'd have a very implicit/silent cliff: probably we don't want to be using callgraph analysis (we don't build that graph for any other purpose) so this would be an optimization only if the init ops were literally inside the start function (and it wasn't exported, elem'd, etc). And even if we did build a callgraph, a simple indirect function call (maybe introduced by virtual function call) would throw it all off.

lukewagner avatar Nov 08 '17 18:11 lukewagner

We discussed this at the Nov 28 CG meeting. Some notes:

  • We agreed that allowing init_memory and init_table after instantiation is useful
  • We agreed that we don't want to require VMs to do global analysis to determine whether a segment is used after initialization
  • Some preference for @rossberg's proposal of having an additional bit on the segment to specify whether it is allowed to be used after instantiation

binji avatar Nov 28 '17 18:11 binji

Why not leave the current Data section and behaviors as they are, and make a new section for a runtime accessible source?

RyanLamansky avatar Nov 28 '17 18:11 RyanLamansky

@RyanLamansky That's an interesting point. In particular, when one has the ability to freely copy any dynamic [begin, end) subrange range into linear memory, there seems to be little need for segments.

lukewagner avatar Nov 29 '17 04:11 lukewagner

@lukewagner, multiple segments would still be needed to support merging of modules.

rossberg avatar Nov 29 '17 09:11 rossberg

@rossberg Ah, great point.

lukewagner avatar Nov 29 '17 17:11 lukewagner

@RyanLamansky I like this idea, it makes things much simpler.

Oh, I hadn't really thought about it, but do we want to add the init_table instruction before we have set_table (i.e. set element in the table)? Also, I assume we'll want a way to put an empty element in the segment. Or should that only be possible with a clear_table instruction?

binji avatar Nov 29 '17 19:11 binji