langchain
langchain copied to clipboard
Replace exec with wasm_exec
This replaces the usage of the insecure exec
function with a more secure library called wasm_exec TL;DR: Use chroot
jails, wasmtime
and a standalone Python3.11 WASM interpreter to safely execute arbitrary code
Some questions before I officially submit this pr:
-
How critical is it that we support the execution of code containing 3rd-party dependencies? The team behind the wasm interpreter I vendor with the library is working on porting
numpy
along with some of the c libs needed to enable broader package support.-
If it is critical, then how critical is it to support Python versions below 3.11? Right now the interpreter does support the use of an existing venv for accessing installed 3rd party deps, but the interpreter is only distributed in 3.11 which means I would either a) have to only support 3.11 b) have a 3rd split in the PythonRepl code base (base, AST, Wasm) or b) try to write some hacky and unpythonic code to try to dynamically create a new 3.11-based venv off of the currently active Python environment.
-
If it is not critical then is it okay if I refactor the test cases to avoid the tests containing
numpy
andpandas
code? Should we also update the docs to reflect that limitation?
-
I also strongly welcome feedback on the wasm_exec library if there are implementation ideas that make it more functional.
Fixes #1026 Fixes #5294 Fixes #5388 Fixes https://nvd.nist.gov/vuln/detail/CVE-2023-29374
Who can review?
- @hwchase17
- @vowelparrot
Just my opinions here. I've not commit bit & am relatively new to langchain.
This replaces the usage of the insecure
exec
function with a more secure library called wasm_exec TL;DR: Usechroot
jails,wasmtime
and a standalone Python3.11 WASM interpreter to safely execute arbitrary code
100% needed, and thanks!
Some questions before I officially submit this pr:
- How critical is it that we support the execution of code containing 3rd-party dependencies? The team behind the wasm interpreter I vendor with the library is working on porting
numpy
along with some of the c libs needed to enable broader package support.
It's better to get this in place first, and then worry about 3rd party deps later. The security concern massively outweighs feature depth.
- If it is critical, then how critical is it to support Python versions below 3.11? Right now the interpreter does support the use of an existing venv for accessing installed 3rd party deps, but the interpreter is only distributed in 3.11 which means I would either a) have to only support 3.11 b) have a 3rd split in the PythonRepl code base (base, AST, Wasm) or b) try to write some hacky and unpythonic code to try to dynamically create a new 3.11-based venv off of the currently active Python environment.
I also think it's not worth a lot of effort to support power Python versions. (b) & (c) feel like maintenance burdens, and simplicity is important in security-minded components.
@hwchase17 @vowelparrot would love some feedback on this PR. I recognize some of the tests are still failing, but I think that resolution is dependent upon answers to the questions I posed above. Thanks!
@hwchase17 @vowelparrot bumping again, hoping to get your thoughts...
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Ignored Deployment
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
langchain | ⬜️ Ignored (Inspect) | Visit Preview | Nov 1, 2023 11:18pm |
@hwchase17 can you review the responses to the comments and let me know how you want to proceed?
any update on this feature/PR? Thanks.
sorry for the delay here. high level thoughts:
- We almost certainly don't want to introduce this as a dependency for the main project
- We want to support as flexible of python running for prototyping purposes as possible
- We do not want people to be running this in production
I would suggest the following changes:
- We will move the current python tools and anything that depends on them into
langchain.experimental
and make that a separate package. This should remove CVEs from core langchain, while also hopefullying making it clear its experimental, and also allowing for prototyping - I would encourage the creation of a separate
langchain_python
or the like respository where that uses wasm by default. we'd be happy to support this in a variety of ways (help setting up, help publicizing, help maintaining), but i think we'd prefer it to be a separate repository
thoughts?
Responded more in #8092
As far as this goes, I'll probably take these changes and implement as a separate package. How would that jive with the story of langchain.experimental
though? Seems confusing to me to have an unsafe and safe way of doing things, both as external packages and possibly publicized by LangChain.
@Jflick58 Hi , could you, please, resolve the merging issues? After that ping me and I push this PR for the review. Thanks!
@leo-gan I thought @hwchase17 decided to move the repl functionality to langchain-experimental to avoid including this security issue in the main langchain install? Happy to work on this PR to include the wasm-exec
functionality in the main package, but wanted to confirm. Thanks!
@Jflick58 If this change suits experimental better, is it possible to start this change again as a new PR in experimental? If you want to do this, please, close this PR.
Would it have to be a new PR? Given that langchain
and langchain-experimental
are both from this repository, could this PR be modified accordingly?
It is up to you. This PR could be OK :+1:
Hey guys, any updates for this PR?
@RuanAzevedo This is still on my to-do list. I'm traveling for work this week but hoping to pick it back up next week.
hi @Jflick58, any update on this PR (to resolve this CVE)?
Is there any update on this?