envd
proposal(ir): state based implementation
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kemingy
How to merge different stages like llb.Merge?
Also I think bramble's syntax can be an option
> How to merge different stages like llb.Merge?
Will add Merge, Diff, File later.
> Also I think bramble's syntax can be an option
Will take a look.
Some examples from bramble https://github.com/maxmcd/bramble/blob/eea4aee51e6ad881166412d61190012fb0d97c56/internal/project/testdata/project/default.bramble
I think we should have an idea of what the llb graph will look like once the user declares more dependencies (such as gcc and PyPI packages), and of how it can still use caches as much as possible.
Simple things should be simple, complex things should be possible.
I think bramble's example there looks complex.
https://github.com/maxmcd/bramble/blob/eea4aee51e6ad881166412d61190012fb0d97c56/internal/project/testdata/project/default.bramble
It declares the input arguments explicitly. Personally, I prefer the func chain.
> I think we should have an idea of what the llb graph will look like once the user declares more dependencies (such as gcc and PyPI packages), and of how it can still use caches as much as possible.
> Simple things should be simple, complex things should be possible.
Agree. What we have now should be simple. Others like parallelism should be possible.
Method chaining should be enough for sequential commands. Each function should return a state. (ExecState should be hidden by introducing more parameters.)
Some questions:
- Is it possible to auto-merge the two chains? Merge/diff should be advanced statements.
- How to integrate it with the envdlib?
> Is it possible to auto-merge the two chains? Merge/diff should be advanced statements.
I think it's hard. (correct me if I'm wrong)
Starlark doesn't support operator overloading like conda_state + vscode_state. But I think we can introduce a new method like conda_state.merge([vscode_state]) if it's helpful.
Besides, we need to explicitly call root.state() to get a copy if another branch is not built from scratch. Otherwise, we don't know where the branches diverge.
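To make the point about root.state() concrete, here is a minimal Python sketch (Starlark would look nearly identical). The State class, its run, state, and merge methods, and the command strings are all hypothetical stand-ins for illustration, not the proposal's actual API. Chained calls mutate and return the same object, so a branch has to start from an explicit copy, and merge is the method form of the unsupported conda_state + vscode_state.

```python
class State:
    """Hypothetical build state: an ordered list of steps plus merged branches."""

    def __init__(self, steps=None, merged=None):
        self.steps = list(steps or [])
        self.merged = list(merged or [])

    def run(self, cmd):
        # Chaining mutates this state and returns it, so calls can be strung together.
        self.steps.append(cmd)
        return self

    def state(self):
        # Explicit copy: marks the point where a new branch diverges.
        return State(self.steps, self.merged)

    def merge(self, others):
        # Method form of the unsupported `a + b` operator.
        self.merged.extend(others)
        return self


root = State().run("apt-get update")
conda_state = root.state().run("install conda")
vscode_state = root.state().run("install vscode extensions")
final = conda_state.merge([vscode_state])
```

Without the two state() calls, both branches would append to root itself and there would be no way to tell them apart.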
> How to integrate it with the envdlib?
Some ideas:
- envdlib can provide functions built from scratch or from a user-provided state.
  - from scratch: root.merge([envdlib.compile_rust_serving()])
  - from a state: root = envdlib.tensorboard(conda_state) or root.apply(envdlib.tensorboard, host_port=9000) so users can continue chaining
- We should provide Source.envd_python() which is equivalent to base(os="ubuntu20.04", language="python3").
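A rough Python sketch of how the apply form could keep a chain going. Everything here is an assumption made for illustration: the State class, the stand-in tensorboard function (not the real envdlib one), its default port, and the command strings.

```python
class State:
    """Hypothetical immutable build state holding an ordered tuple of steps."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def run(self, cmd):
        # Return a new state with one more step appended.
        return State(self.steps + (cmd,))

    def apply(self, fn, **kwargs):
        # Hand this state to a library function and return the state it builds,
        # so the caller can keep chaining afterwards.
        return fn(self, **kwargs)


def tensorboard(state, host_port=6006):
    # Stand-in for a library function that extends a user-provided state.
    return state.run(f"tensorboard --port {host_port}")


root = State().run("install conda")
root = root.apply(tensorboard, host_port=9000).run("pip install numpy")
```

The function only needs to accept a state as its first argument and return a state, which is the same shape as the root = envdlib.tensorboard(conda_state) form.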
Then could the new language syntax be compatible with the existing design?
Or is it a total breaking change?
BTW, could you please provide the example for python-basic with the new design?
> Then could the new language syntax be compatible with the existing design?
> Or is it a total breaking change?
I think it will be a breaking change.
> BTW, could you please provide the example for python-basic with the new design?
It has already been added to the proposal. PTAL.
@VoVAllen WDYT
I have no opinion on it. Let's start by researching whether Starlark supports it.
I'm a bit concerned about the current proposal. The current design is detail-oriented, which makes it more complicated than the original design. It also looks similar to llb, so we could consider exposing llb-like primitives directly. Explicit dependency declaration is an advanced feature; llb primitives would be easier for us to maintain and would ensure that "complex things are possible".
Some personal thoughts:
Explicitly define two or three kinds of stages: base stages, envd-managed stages (install.python_packages etc.), and user-managed stages (run(XXX)).
The difference between them is:
- base stages can be overwritten by custom images, and managed by envd if not specified
- envd-managed stages will parallelize and use cache as much as possible to accelerate the process, thus no dependency can be set here.
- user-managed stages can be fully customized, with explicit dependency.
Other ideas:
All functions provided by envd could accept a new argument, for example one called state.
In user stages, the user can do:
state = stage('user')
state1 = install.apt_packages(["g++"], state=state)
new_state = install.python_packages(["package_needs_g++"], state=state1)
and to define it as a custom function:
def install_inhouse_package():
    state = stage('user')
    state1 = install.apt_packages(["g++"], state=state)
    new_state = install.python_packages(["package_needs_g++"], state=state1)
    # envd_output is a builtin variable; add merges the state into the final output
    envd_output.add(new_state)
    return new_state
To use it:
def build():
    install.python_packages(['torch'])
    install_inhouse_package()
WDYT
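For reference, a small runnable Python mock of the state-argument idea. All names here (Stage, stage, apt_packages, python_packages, and envd_output as a plain list) are hypothetical stand-ins rather than envd's real builtins. Calls without a state are treated as envd-managed and registered automatically; calls with a state form an explicit dependency chain.

```python
class Stage:
    """Hypothetical node in the build graph; parent expresses the dependency."""

    def __init__(self, name, parent=None, action=None):
        self.name = name
        self.parent = parent
        self.action = action

    def chain(self):
        # Walk up the parents and list actions in dependency order.
        node, actions = self, []
        while node is not None:
            if node.action is not None:
                actions.append(node.action)
            node = node.parent
        return list(reversed(actions))


envd_output = []  # stand-in for the builtin final output


def stage(name):
    return Stage(name)


def _install(kind, names, state):
    node = Stage(kind, parent=state, action=(kind, tuple(names)))
    if state is None:
        # No explicit dependency: envd-managed, free to parallelize.
        envd_output.append(node)
    return node


def apt_packages(names, state=None):
    return _install("apt", names, state)


def python_packages(names, state=None):
    return _install("pip", names, state)


def install_inhouse_package():
    state = stage("user")
    state1 = apt_packages(["g++"], state=state)
    new_state = python_packages(["package_needs_g++"], state=state1)
    envd_output.append(new_state)  # merge into the final output
    return new_state


def build():
    python_packages(["torch"])
    return install_inhouse_package()


final = build()
```

In this mock, final.chain() lists the g++ install before the Python package that needs it, while the torch install stays independent of both.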
This is similar to the current implementation and this proposal. We do have different stages, it's just not explicit.
We can provide the install.conda_python() function. So users who start with the custom images can use it to install the python environment.
Defining the dependencies with an extra state argument is acceptable but not very user-friendly.
The LLB-like syntax is only complex when you need to use diff and merge. Otherwise, the method chaining should be a simple solution.
One more thing, this is incompatible with config.envd.