feasibility-research(lang): Refactor frontend language
Description
Label: priority/high
Maybe we can support on-the-fly builds like this:
```python
import midi
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

midi.pip_package(name=["tensorflow", "numpy"])

"""
## Prepare the data
"""

# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)
```
There are two papers/projects that @VoVAllen recommends reading:
- https://github.com/dyweb/papers-notebook/issues/290
- https://github.com/dyweb/papers-notebook/issues/289
Autodetect:
- https://github.com/replit/upm
- https://buildpacks.io/
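As a rough, hedged sketch of what such autodetection could look like on our side (this is not how upm or buildpacks actually work; the regex, the alias table, and the `train.py` path are made up for illustration), a naive Go scanner could collect top-level Python imports and map them to pip packages:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// importRe matches `import foo` and `from foo import bar` at the start of a line.
// This is a deliberately naive heuristic, not a real Python parser.
var importRe = regexp.MustCompile(`^\s*(?:import|from)\s+([A-Za-z_][A-Za-z0-9_]*)`)

// knownAliases maps import names to pip package names; the entries here are
// illustrative, a real table would be much larger (or derived from an index).
var knownAliases = map[string]string{
	"numpy":      "numpy",
	"tensorflow": "tensorflow",
	"keras":      "keras",
}

// detectPipPackages scans a Python file and returns the pip packages it appears to use.
func detectPipPackages(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	seen := map[string]bool{}
	var pkgs []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		m := importRe.FindStringSubmatch(scanner.Text())
		if m == nil {
			continue
		}
		if pkg, ok := knownAliases[m[1]]; ok && !seen[pkg] {
			seen[pkg] = true
			pkgs = append(pkgs, pkg)
		}
	}
	return pkgs, scanner.Err()
}

func main() {
	pkgs, err := detectPipPackages("train.py")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("detected pip packages:", pkgs)
}
```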
https://github.com/maxmcd/bramble: a purely functional build system and package manager based on Nix.
We can refer to this project to implement some minimal primitives in Go, then write Starlark logic to provide the built-in functions. Right now we write too many Go functions, and users have to write Go to extend the envd language.
```go
// registerenvdRules registers built-in envd rules into the global namespace.
func registerenvdRules() {
	starlark.Universe[ruleBase] = starlark.NewBuiltin(ruleBase, ruleFuncBase)
	starlark.Universe[rulePyPIPackage] = starlark.NewBuiltin(
		rulePyPIPackage, ruleFuncPyPIPackage)
	starlark.Universe[ruleSystemPackage] = starlark.NewBuiltin(
		ruleSystemPackage, ruleFuncSystemPackage)
	starlark.Universe[ruleCUDA] = starlark.NewBuiltin(ruleCUDA, ruleFuncCUDA)
	starlark.Universe[ruleVSCode] = starlark.NewBuiltin(ruleVSCode, ruleFuncVSCode)
	starlark.Universe[ruleUbuntuAPT] = starlark.NewBuiltin(ruleUbuntuAPT, ruleFuncUbuntuAPT)
	starlark.Universe[rulePyPIMirror] = starlark.NewBuiltin(rulePyPIMirror, ruleFuncPyPIMirror)
	starlark.Universe[ruleShell] = starlark.NewBuiltin(ruleShell, ruleFuncShell)
	starlark.Universe[ruleJupyter] = starlark.NewBuiltin(ruleJupyter, ruleFuncJupyter)
	starlark.Universe[ruleRun] = starlark.NewBuiltin(ruleRun, ruleFuncRun)
}
```
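A hedged sketch of the "minimal primitives in Go, logic in Starlark" direction, using go.starlark.net (the `run` primitive, the prelude contents, and the file names are hypothetical): only a few low-level builtins stay in Go, and higher-level rules like `apt_install` are defined in a Starlark prelude loaded before the user's build file.

```go
package main

import (
	"fmt"
	"log"

	"go.starlark.net/starlark"
)

// runPrimitive is a hypothetical low-level builtin; real envd rule funcs
// use the same starlark-go builtin signature.
func runPrimitive(t *starlark.Thread, b *starlark.Builtin,
	args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
	var command string
	if err := starlark.UnpackArgs(b.Name(), args, kwargs, "command", &command); err != nil {
		return nil, err
	}
	fmt.Println("would emit an LLB exec op for:", command)
	return starlark.None, nil
}

func main() {
	// Only the minimal primitives live in Go.
	predeclared := starlark.StringDict{
		"run": starlark.NewBuiltin("run", runPrimitive),
	}

	// Higher-level rules are plain Starlark built on top of the primitives;
	// in envd this could live in an embedded prelude file.
	prelude := `
def apt_install(*pkgs):
    run(command="apt-get install -y " + " ".join(pkgs))
`
	thread := &starlark.Thread{Name: "prelude"}
	globals, err := starlark.ExecFile(thread, "prelude.star", prelude, predeclared)
	if err != nil {
		log.Fatal(err)
	}

	// The user's build file sees both the primitives and the prelude rules.
	userEnv := starlark.StringDict{}
	for k, v := range predeclared {
		userEnv[k] = v
	}
	for k, v := range globals {
		userEnv[k] = v
	}
	if _, err := starlark.ExecFile(thread, "build.envd",
		`apt_install("htop", "make")`, userEnv); err != nil {
		log.Fatal(err)
	}
}
```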
Runtime Lang Design
Explicitly call the runtime to run programs:
```python
def jupyter(port):
    # Will be added as a subprocess of PID 1
    cmd("jupyter --port={}".format(port)).redirect(
        stdout=file("stdout.log"),
        stderr=endpoint("http://api.tensorchord.ai/record_err"))

def launch_ssh():
    # Launch ssh
    cmd("/var/midi-ssh")

def run_train():
    # PID 1 will monitor this process and exit when it fails
    cmd("python train.py").main_program()

def run():
    jupyter(port=8888)
    launch_ssh()
    run_train()
```
Then `envd run` will run the `run` function.
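A minimal sketch of what the PID 1 supervisor behind `envd run` could do, assuming the semantics above (the process list is hard-coded here, and a real PID 1 would also reap orphaned children and forward signals): start every `cmd` as a subprocess, but only wait on the one marked with `.main_program()` and exit with its status.

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// proc describes one process managed by the supervisor.
// The `main` flag mirrors the .main_program() marker in the sketch above.
type proc struct {
	args []string
	main bool
}

func main() {
	procs := []proc{
		{args: []string{"jupyter", "--port=8888"}},
		{args: []string{"/var/midi-ssh"}},
		{args: []string{"python", "train.py"}, main: true},
	}

	var mainCmd *exec.Cmd
	for _, p := range procs {
		cmd := exec.Command(p.args[0], p.args[1:]...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Start(); err != nil {
			log.Fatalf("failed to start %v: %v", p.args, err)
		}
		if p.main {
			mainCmd = cmd
		}
	}

	// Only the main program decides the exit status of PID 1;
	// the side services (jupyter, ssh) keep running until then.
	if mainCmd == nil {
		log.Fatal("no main program configured")
	}
	if err := mainCmd.Wait(); err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			os.Exit(exitErr.ExitCode())
		}
		log.Fatal(err)
	}
}
```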
We may provide some primitives for working with BuildKit in Go, then use these primitives to write the logic in Starlark. For example:
```python
def base(os, language):
    _base_img = Image("ubuntu:20.04")
    # Execute a command on top of the base image
    install_htop = _base_img.run("sudo apt install htop")
    # Like llb.Merge: create a new state as the new base image
    _base_img = _base_img.merge([install_htop, ...])
    # Later steps will use _base_img as the base
```
LLB primitives we use in the current `ir` package: `llb.Mkfile`, `llb.User`, `llb.Mkdir`, `llb.WithCustomName`, `llb.Shlex`, `llb.Copy`, `llb.Local`, `llb.Merge`, `llb.WithUIDGID`, `llb.Diff`, `llb.Scratch`, `llb.Image`.
LLB state operations: `state.Run`, `state.AddMount`, `state.File`.
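For reference, a sketch of how the `base()`/`merge()` idea above maps onto BuildKit's `client/llb` package in Go (assuming a BuildKit version that has `llb.Merge`/`llb.Diff`; the image, command, and local context name are placeholders):

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/moby/buildkit/client/llb"
)

func main() {
	// Base state, equivalent to Image("ubuntu:20.04") in the Starlark sketch.
	base := llb.Image("ubuntu:20.04")

	// Execute a command on top of the base, like _base_img.run(...).
	installHtop := base.Run(
		llb.Shlex("apt-get install -y htop"),
		llb.WithCustomName("install htop"),
	).Root()

	// Merge the diff back onto the base, like the merge([...]) step above.
	merged := llb.Merge([]llb.State{base, llb.Diff(base, installHtop)})

	// Copy a file from the local build context, similar to system.copy(...).
	final := merged.File(llb.Copy(llb.Local("context"), "example.ipynb", "/root/example.ipynb"))

	// Marshal into the definition that gets sent to buildkitd.
	def, err := final.Marshal(context.TODO(), llb.LinuxAmd64)
	if err != nil {
		log.Fatal(err)
	}
	if err := llb.WriteTo(def, os.Stdout); err != nil {
		log.Fatal(err)
	}
}
```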
Some random thoughts about the frontend language:
```python
def build():
    base(os="ubuntu20.04", language="julia")
    config.julia_pkg_server(url="https://mirrors.tuna.tsinghua.edu.cn/julia")
    install.julia_packages([
        "Example"
    ])
    service.jupyter()
    service.ssh()
    service.new(name="tensorboard", command="tensorboard --logdir=logs")
    system.copy(src="example.ipynb", dst="notebooks/example.ipynb")
    vcs.git(remote="https://github.com/tensorchord/envd", branch="master", path="envd")
    data.dvc(remote="https://remote.dvc.org", path="data")
    data.s3(bucket="tensorchord-data", path="data")
```
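One way to expose namespaces like `install.`, `config.`, and `service.` without registering one Go function per rule is go.starlark.net's `starlarkstruct.Module`. A hedged sketch (the rule bodies are stubs that only print; a real implementation would emit IR):

```go
package main

import (
	"fmt"
	"log"

	"go.starlark.net/starlark"
	"go.starlark.net/starlarkstruct"
)

// stubRule reports which rule was called; a real implementation would emit IR.
func stubRule(name string) *starlark.Builtin {
	return starlark.NewBuiltin(name, func(t *starlark.Thread, b *starlark.Builtin,
		args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
		fmt.Printf("rule %s called with %d args, %d kwargs\n", b.Name(), len(args), len(kwargs))
		return starlark.None, nil
	})
}

func main() {
	// Group rules under namespaces instead of flooding the global Universe.
	predeclared := starlark.StringDict{
		"install": &starlarkstruct.Module{
			Name: "install",
			Members: starlark.StringDict{
				"julia_packages": stubRule("install.julia_packages"),
			},
		},
		"service": &starlarkstruct.Module{
			Name: "service",
			Members: starlark.StringDict{
				"jupyter": stubRule("service.jupyter"),
				"ssh":     stubRule("service.ssh"),
			},
		},
	}

	src := `
def build():
    install.julia_packages(["Example"])
    service.jupyter()
    service.ssh()

build()
`
	thread := &starlark.Thread{Name: "build"}
	if _, err := starlark.ExecFile(thread, "build.envd", src, predeclared); err != nil {
		log.Fatal(err)
	}
}
```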
The dev config (ssh, cwd mount) can be wrapped into a preset to make it easy to support non-tensorchord base images:
```python
def build():
    base(os="ubuntu20.04", language="julia")
    preset()

def serving():
    base(os="ubuntu20.04", language="python", image="python:3.8")
    ...

def preset():
    service.jupyter()
    service.ssh()
    system.mount(cwd, "/home/envd")
```
The design of State in BuildKit looks suitable for method chaining.
Pros
- commands run in order
- users can control which parts are run in parallel
Cons
- needs a lot of refactoring; right now we build a static graph instead of a DAG
The interface will look like:
Image("ubuntu20.04").apt_install("make").with_python("3.9").with_cuda()
Scratch().git_repo("xxx").copy_to(img, src="/opt", dest="/")
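A hedged sketch of how such a chainable `Image` value could be backed by Go via starlark-go's `HasAttrs` interface (the type, its method names, and the recorded steps are illustrative; in envd the value would wrap an `llb.State` rather than a string list):

```go
package main

import (
	"fmt"
	"log"

	"go.starlark.net/starlark"
)

// image is a chainable Starlark value; each method call returns a new image.
type image struct {
	steps []string
}

var _ starlark.HasAttrs = (*image)(nil)

func (i *image) String() string        { return fmt.Sprintf("Image(%v)", i.steps) }
func (i *image) Type() string          { return "image" }
func (i *image) Freeze()               {}
func (i *image) Truth() starlark.Bool  { return starlark.True }
func (i *image) Hash() (uint32, error) { return 0, fmt.Errorf("unhashable: image") }

// Attr exposes the chainable methods; each one records a step and returns a
// new image, so calls compose left to right.
func (i *image) Attr(name string) (starlark.Value, error) {
	valid := false
	for _, n := range i.AttrNames() {
		if n == name {
			valid = true
			break
		}
	}
	if !valid {
		return nil, nil // no such attribute
	}
	return starlark.NewBuiltin(name, func(t *starlark.Thread, b *starlark.Builtin,
		args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
		step := fmt.Sprintf("%s(%v)", b.Name(), args)
		return &image{steps: append(append([]string{}, i.steps...), step)}, nil
	}), nil
}

func (i *image) AttrNames() []string {
	return []string{"apt_install", "with_python", "with_cuda"}
}

func main() {
	predeclared := starlark.StringDict{
		"Image": starlark.NewBuiltin("Image", func(t *starlark.Thread, b *starlark.Builtin,
			args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
			return &image{steps: []string{fmt.Sprintf("base(%v)", args)}}, nil
		}),
	}
	src := `img = Image("ubuntu20.04").apt_install("make").with_python("3.9").with_cuda()
print(img)`
	thread := &starlark.Thread{Name: "chain", Print: func(_ *starlark.Thread, msg string) { fmt.Println(msg) }}
	if _, err := starlark.ExecFile(thread, "chain.envd", src, predeclared); err != nil {
		log.Fatal(err)
	}
}
```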