Can the same sandbox instance be shared by the same extension (such as a Filter)?
Take the 'proxy_get_buffer' API as an example:
Filter A (context ID: 1) and Filter B (context ID: 2) are XxxFilter extension instances. Filter A is handling OnHttpRequestHeader and Filter B is handling OnHttpResponseBody. Both Filter instances invoke the 'proxy_get_buffer' method to get the HTTP header, but they don't pass a contextId. On the host side, how do you know which HTTP request header to return correctly?
If the contextId is passed to the host, the host will correctly identify the Filter instance and return the HTTP header.
According to my understanding, each instance of a WASM module should be equivalent to an isolation sandbox, and sharing the same instance should save resources. If my understanding is incorrect, please correct me.
@zonghaishang the answer here is "it depends on the host implementation".
But take Envoy for example: yes, of course. Envoy creates Wasm VMs (each corresponding to a plugin) per thread, which makes it easy to identify the different headers/bodies/trailers/metadata for each context created in the Wasm VMs (here, a Wasm VM is the sandboxed instance you mention).
Note that contexts (each corresponding to a request) are not "sandboxed" against each other since, as I said, multiple contexts are created in the per-thread VMs and they share the underlying Wasm VM's resources, including its linear memory.
If the contextId is passed to the host, the host will correctly identify the Filter instance and return the HTTP header.
And this is unnecessary, because Wasm VMs are always executed by the host implementation, and the host should be able to know which context is currently being executed regardless of its implementation.
@mathetake thank you for your reply. It sounds like, because Envoy uses a shared single-threaded WASM VM instance, when the plugin calls the host, the host can obtain the WASM VM instance's context from the current thread.
If the host is the Go runtime (the MOSN sidecar), thread-local storage is not supported, and this seems problematic.
Filter A (context ID: 1) and Filter B (context ID: 2) are XxxFilter extension instances. Filter A is handling OnHttpRequestHeader and Filter B is handling OnHttpResponseBody. Both Filter instances invoke the 'proxy_get_buffer' method to get the HTTP header, but they don't pass a contextId. On the host side, how do you know which HTTP request header to return correctly?
"Context" is extremely overloaded in the current implementation, so let me be a little more explicit:
Filter A (plugin_id: 1) and Filter B (plugin_id: 2) are plugins that define code and configuration, and HTTP and TCP events aren't called on them, but on a specific HTTP or TCP context, e.g. for an incoming HTTP request, you'll create a context for that HTTP request for Filter A (context_id: 3, plugin_id: 1) and for Filter B (context_id: 4, plugin_id: 2).
Now, when HTTP headers are read on the host side, it's going to call proxy_on_http_request_headers(context_id=3, ...) (notice it's the HTTP request's context ID, not the plugin's context ID), and since the WasmVM is single-threaded, the host side knows that it's currently handling context_id=3, so all host functions called from within the WasmVM are assumed to be requested for context_id=3 (unless it's changed using proxy_set_effective_context), so when the WasmVM calls proxy_get_buffer(...), the host knows to return the buffer for the HTTP request matching context_id=3.
Does it make sense?
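To make that concrete, here's a minimal Go sketch of the implicit tracking described above; the WasmVM type, its fields, and the call into the guest are all hypothetical, not any real host's API:

```go
package host

// Hypothetical host-side state for a single-threaded WasmVM; per-context
// HTTP buffers are keyed by context_id.
type WasmVM struct {
	currentContextID int32            // context the VM is executing right now
	buffers          map[int32][]byte // per-context HTTP request buffers
}

// The host records the active context *before* calling the guest export,
// e.g. proxy_on_http_request_headers(context_id=3, ...).
func (vm *WasmVM) OnHTTPRequestHeaders(contextID int32) {
	vm.currentContextID = contextID
	// vm.guest.Call("proxy_on_http_request_headers", contextID, ...) // hypothetical call into Wasm
}

// Host function imported by the guest. Note there is no context_id
// parameter: the host resolves it from currentContextID (which
// proxy_set_effective_context may have changed).
func (vm *WasmVM) ProxyGetBuffer() []byte {
	return vm.buffers[vm.currentContextID]
}
```

Since the VM is single-threaded, a plain field is enough here; the MOSN discussion below is about what happens when that assumption no longer holds.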
If the contextId is passed to the host, the host will correctly identify the Filter instance and return the HTTP header.
Do you mean that we should explicitly include context_id in the host calls? e.g. call proxy_get_buffer(context_id=3, ...) instead of proxy_get_buffer(...) and have the host side track the current context_id?
I think that's a good idea, and I suggested it a while ago myself, but I didn't get buy-in from other people working on the project at the time. Maybe it's a good time to revisit this.
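For comparison, the two shapes as hypothetical Go host-function signatures (parameter lists simplified; not the real proxy_get_buffer declaration):

```go
package abi

// Current shape: context is implicit; the host must track which
// context_id the VM is currently executing.
type ProxyGetBufferImplicit func(bufferType, start, maxSize int32) ([]byte, error)

// Proposed shape: the guest passes context_id explicitly, so a host that
// can't rely on "one VM, one active context" needs no current-context state.
type ProxyGetBufferExplicit func(contextID, bufferType, start, maxSize int32) ([]byte, error)
```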
According to my understanding, each instance of a WASM module should be equivalent to an isolation sandbox, and sharing the same instance should save resources. If my understanding is incorrect, please correct me.
Multiple plugins are already supported in the same WasmVM, regardless of the context_id being included or not.
I think the thing you were missing is that current host implementations (e.g. Envoy) automatically track the current context_id.
@mathetake thank you for your reply. It sounds like, because Envoy uses a shared single-threaded WASM VM instance, when the plugin calls the host, the host can obtain the WASM VM instance's context from the current thread.
If the host is the Go runtime (the MOSN sidecar), thread-local storage is not supported, and this seems problematic.
The WasmVM that MOSN uses is still single-threaded, right? So there is always only one context_id that's being executed.
Currently, each request corresponds to a coroutine (not a single thread). If the host needs to follow the ABI specification and maintain the current context_id on the host side, we need to lock the VM instance to ensure that request processing is serial (Go does not support thread-local storage).
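A minimal sketch of what that serialization would look like in Go, assuming a hypothetical single VM instance shared by per-request goroutines (a mutex stands in for the thread-local storage Go lacks):

```go
package host

import "sync"

type WasmVM struct {
	mu               sync.Mutex
	currentContextID int32
}

// Each request runs in its own goroutine, but every call into the VM is
// serialized under mu, so currentContextID is always the context of the
// one call currently inside the VM.
func (vm *WasmVM) DispatchHTTPRequestHeaders(contextID int32) {
	vm.mu.Lock()
	defer vm.mu.Unlock()
	vm.currentContextID = contextID
	// vm.guest.Call("proxy_on_http_request_headers", contextID, ...) // hypothetical call into Wasm
}
```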
Your description is very detailed, and my understanding is consistent with yours.
Do you mean that we should explicitly include context_id in the host calls? e.g. call proxy_get_buffer(context_id=3, ...) instead of proxy_get_buffer(...) and have the host side track the current context_id?
I think that's a good idea, and I suggested it a while ago myself, but I didn't get buy-in from other people working on the project at the time. Maybe it's a good time to revisit this.
If the contextId is passed on calls into the host, I think it provides great flexibility to host implementations (non-Envoy ones).
Currently, each request corresponds to a coroutine (not a single thread). If the host needs to follow the ABI specification and maintain the current context_id on the host side, we need to lock the VM instance to ensure that request processing is serial (Go does not support thread-local storage).
Right, each HTTP request corresponds to a coroutine on the host side, but you only have a single WasmVM instance (that's effectively single-threaded) and you don't create a WasmVM for each coroutine, right? If so, you should be able to track which context_id is being executed within that WasmVM.
If the contextId is passed on calls into the host, I think it provides great flexibility to host implementations (non-Envoy ones).
Agreed. I'm redesigning the ABI right now, and I'll take this into consideration.
If so, you should be able to track which context_id is being executed within that WasmVM.
Yes. To reduce lock contention, multiple WASM VM instances will also be considered; at least for now, that's not the best solution.
Agreed. I'm redesigning the ABI right now, and I'll take this into consideration.
For me, this is good news.
Yes. To reduce lock contention, multiple WASM VM instances will also be considered; at least for now, that's not the best solution.
You'll end up using too much memory with multiple WasmVM instances. We already have issues with Envoy consuming too much memory when using a single WasmVM per loaded bytecode per CPU.