feasibility-research(lang): Refactor frontend language
Description
Label: priority/high
Maybe we can support on-the-fly builds like this:
```python
import midi
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

midi.pip_package(name=["tensorflow", "numpy"])

"""
## Prepare the data
"""

# Model / data parameters
num_classes = 10
input_shape = (28, 28, 1)
```
There are two papers/projects that @VoVAllen recommends reading:
- https://github.com/dyweb/papers-notebook/issues/290
- https://github.com/dyweb/papers-notebook/issues/289
Autodetect:
- https://github.com/replit/upm
- https://buildpacks.io/
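As a rough, hedged sketch of what such autodetection could look like on our side (this is not how upm or buildpacks actually work; the regex, the alias table, and the `train.py` path are made up for illustration), a naive Go scanner could collect top-level Python imports and map them to pip packages:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// importRe matches `import foo` and `from foo import bar` at the start of a line.
// This is a deliberately naive heuristic, not a real Python parser.
var importRe = regexp.MustCompile(`^\s*(?:import|from)\s+([A-Za-z_][A-Za-z0-9_]*)`)

// knownAliases maps import names to pip package names; the entries here are
// illustrative, a real table would be much larger (or derived from an index).
var knownAliases = map[string]string{
	"numpy":      "numpy",
	"tensorflow": "tensorflow",
	"keras":      "keras",
}

// detectPipPackages scans a Python file and returns the pip packages it appears to use.
func detectPipPackages(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	seen := map[string]bool{}
	var pkgs []string
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		m := importRe.FindStringSubmatch(scanner.Text())
		if m == nil {
			continue
		}
		if pkg, ok := knownAliases[m[1]]; ok && !seen[pkg] {
			seen[pkg] = true
			pkgs = append(pkgs, pkg)
		}
	}
	return pkgs, scanner.Err()
}

func main() {
	pkgs, err := detectPipPackages("train.py")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("detected pip packages:", pkgs)
}
```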
https://github.com/maxmcd/bramble: a purely functional build system and package manager based on Nix.
We can refer to this project to implement some minimal primitives in Go, then write Starlark logic to provide the built-in functions. Right now we write too many Go functions, and users have to write Go to extend the envd language.
```go
// registerenvdRules registers built-in envd rules into the global namespace.
func registerenvdRules() {
	starlark.Universe[ruleBase] = starlark.NewBuiltin(ruleBase, ruleFuncBase)
	starlark.Universe[rulePyPIPackage] = starlark.NewBuiltin(
		rulePyPIPackage, ruleFuncPyPIPackage)
	starlark.Universe[ruleSystemPackage] = starlark.NewBuiltin(
		ruleSystemPackage, ruleFuncSystemPackage)
	starlark.Universe[ruleCUDA] = starlark.NewBuiltin(ruleCUDA, ruleFuncCUDA)
	starlark.Universe[ruleVSCode] = starlark.NewBuiltin(ruleVSCode, ruleFuncVSCode)
	starlark.Universe[ruleUbuntuAPT] = starlark.NewBuiltin(ruleUbuntuAPT, ruleFuncUbuntuAPT)
	starlark.Universe[rulePyPIMirror] = starlark.NewBuiltin(rulePyPIMirror, ruleFuncPyPIMirror)
	starlark.Universe[ruleShell] = starlark.NewBuiltin(ruleShell, ruleFuncShell)
	starlark.Universe[ruleJupyter] = starlark.NewBuiltin(ruleJupyter, ruleFuncJupyter)
	starlark.Universe[ruleRun] = starlark.NewBuiltin(ruleRun, ruleFuncRun)
}
```
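A hedged sketch of the "minimal primitives in Go, logic in Starlark" direction, using go.starlark.net (the `run` primitive, the prelude contents, and the file names are hypothetical): only a few low-level builtins stay in Go, and higher-level rules like `apt_install` are defined in a Starlark prelude loaded before the user's build file.

```go
package main

import (
	"fmt"
	"log"

	"go.starlark.net/starlark"
)

// runPrimitive is a hypothetical low-level builtin; real envd rule funcs
// use the same starlark-go builtin signature.
func runPrimitive(t *starlark.Thread, b *starlark.Builtin,
	args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
	var command string
	if err := starlark.UnpackArgs(b.Name(), args, kwargs, "command", &command); err != nil {
		return nil, err
	}
	fmt.Println("would emit an LLB exec op for:", command)
	return starlark.None, nil
}

func main() {
	// Only the minimal primitives live in Go.
	predeclared := starlark.StringDict{
		"run": starlark.NewBuiltin("run", runPrimitive),
	}

	// Higher-level rules are plain Starlark built on top of the primitives;
	// in envd this could live in an embedded prelude file.
	prelude := `
def apt_install(*pkgs):
    run(command="apt-get install -y " + " ".join(pkgs))
`
	thread := &starlark.Thread{Name: "prelude"}
	globals, err := starlark.ExecFile(thread, "prelude.star", prelude, predeclared)
	if err != nil {
		log.Fatal(err)
	}

	// The user's build file sees both the primitives and the prelude rules.
	userEnv := starlark.StringDict{}
	for k, v := range predeclared {
		userEnv[k] = v
	}
	for k, v := range globals {
		userEnv[k] = v
	}
	if _, err := starlark.ExecFile(thread, "build.envd",
		`apt_install("htop", "make")`, userEnv); err != nil {
		log.Fatal(err)
	}
}
```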
Runtime Lang Design
Explicitly call the runtime to run programs:
```python
def jupyter(port):
    # Will be added as a subprocess of PID 1
    cmd("jupyter --port={}".format(port)).redirect(
        stdout=file("stdout.log"),
        stderr=endpoint("http://api.tensorchord.ai/record_err"))

def launch_ssh():
    # Launch ssh
    cmd("/var/midi-ssh")

def run_train():
    # PID 1 will monitor this process and exit when it fails
    cmd("python train.py").main_program()

def run():
    jupyter(port=8888)
    launch_ssh()
    run_train()
```
Then `envd run` will run the `run` function.
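A minimal sketch of what the PID 1 supervisor behind `envd run` could do, assuming the semantics above (the process list is hard-coded here, and a real PID 1 would also reap orphaned children and forward signals): start every `cmd` as a subprocess, but only wait on the one marked with `.main_program()` and exit with its status.

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// proc describes one process managed by the supervisor.
// The `main` flag mirrors the .main_program() marker in the sketch above.
type proc struct {
	args []string
	main bool
}

func main() {
	procs := []proc{
		{args: []string{"jupyter", "--port=8888"}},
		{args: []string{"/var/midi-ssh"}},
		{args: []string{"python", "train.py"}, main: true},
	}

	var mainCmd *exec.Cmd
	for _, p := range procs {
		cmd := exec.Command(p.args[0], p.args[1:]...)
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Start(); err != nil {
			log.Fatalf("failed to start %v: %v", p.args, err)
		}
		if p.main {
			mainCmd = cmd
		}
	}

	// Only the main program decides the exit status of PID 1;
	// the side services (jupyter, ssh) keep running until then.
	if mainCmd == nil {
		log.Fatal("no main program configured")
	}
	if err := mainCmd.Wait(); err != nil {
		if exitErr, ok := err.(*exec.ExitError); ok {
			os.Exit(exitErr.ExitCode())
		}
		log.Fatal(err)
	}
}
```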
We may provide some primitives for working with BuildKit in Go, then use these primitives to write the logic in Starlark. For example:
```python
def base(os, language):
    _base_img = Image("ubuntu:20.04")
    # Execute a command on top of the base image
    install_htop = _base_img.run("sudo apt install htop")
    # Like llb.Merge: create a new state as the new base image
    _base_img = _base_img.merge([install_htop, ...])
    # Later steps will use _base_img as the base
```
LLB primitives we use in the current `ir` package: `llb.Mkfile`, `llb.User`, `llb.Mkdir`, `llb.WithCustomName`, `llb.Shlex`, `llb.Copy`, `llb.Local`, `llb.Merge`, `llb.WithUIDGID`, `llb.Diff`, `llb.Scratch`, `llb.Image`.
LLB state operations: `state.Run`, `state.AddMount`, `state.File`.
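For reference, a sketch of how the `base()`/`merge()` idea above maps onto BuildKit's `client/llb` package in Go (assuming a BuildKit version that has `llb.Merge`/`llb.Diff`; the image, command, and local context name are placeholders):

```go
package main

import (
	"context"
	"log"
	"os"

	"github.com/moby/buildkit/client/llb"
)

func main() {
	// Base state, equivalent to Image("ubuntu:20.04") in the Starlark sketch.
	base := llb.Image("ubuntu:20.04")

	// Execute a command on top of the base, like _base_img.run(...).
	installHtop := base.Run(
		llb.Shlex("apt-get install -y htop"),
		llb.WithCustomName("install htop"),
	).Root()

	// Merge the diff back onto the base, like the merge([...]) step above.
	merged := llb.Merge([]llb.State{base, llb.Diff(base, installHtop)})

	// Copy a file from the local build context, similar to system.copy(...).
	final := merged.File(llb.Copy(llb.Local("context"), "example.ipynb", "/root/example.ipynb"))

	// Marshal into the definition that gets sent to buildkitd.
	def, err := final.Marshal(context.TODO(), llb.LinuxAmd64)
	if err != nil {
		log.Fatal(err)
	}
	if err := llb.WriteTo(def, os.Stdout); err != nil {
		log.Fatal(err)
	}
}
```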
Some random thoughts about the frontend language:
```python
def build():
    base(os="ubuntu20.04", language="julia")
    config.julia_pkg_server(url="https://mirrors.tuna.tsinghua.edu.cn/julia")
    install.julia_packages([
        "Example"
    ])
    service.jupyter()
    service.ssh()
    service.new(name="tensorboard", command="tensorboard --logdir=logs")
    system.copy(src="example.ipynb", dst="notebooks/example.ipynb")
    vcs.git(remote="https://github.com/tensorchord/envd", branch="master", path="envd")
    data.dvc(remote="https://remote.dvc.org", path="data")
    data.s3(bucket="tensorchord-data", path="data")
```
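One way to expose namespaces like `install.`, `config.`, and `service.` without registering one Go function per rule is go.starlark.net's `starlarkstruct.Module`. A hedged sketch (the rule bodies are stubs that only print; a real implementation would emit IR):

```go
package main

import (
	"fmt"
	"log"

	"go.starlark.net/starlark"
	"go.starlark.net/starlarkstruct"
)

// stubRule reports which rule was called; a real implementation would emit IR.
func stubRule(name string) *starlark.Builtin {
	return starlark.NewBuiltin(name, func(t *starlark.Thread, b *starlark.Builtin,
		args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
		fmt.Printf("rule %s called with %d args, %d kwargs\n", b.Name(), len(args), len(kwargs))
		return starlark.None, nil
	})
}

func main() {
	// Group rules under namespaces instead of flooding the global Universe.
	predeclared := starlark.StringDict{
		"install": &starlarkstruct.Module{
			Name: "install",
			Members: starlark.StringDict{
				"julia_packages": stubRule("install.julia_packages"),
			},
		},
		"service": &starlarkstruct.Module{
			Name: "service",
			Members: starlark.StringDict{
				"jupyter": stubRule("service.jupyter"),
				"ssh":     stubRule("service.ssh"),
			},
		},
	}

	src := `
def build():
    install.julia_packages(["Example"])
    service.jupyter()
    service.ssh()

build()
`
	thread := &starlark.Thread{Name: "build"}
	if _, err := starlark.ExecFile(thread, "build.envd", src, predeclared); err != nil {
		log.Fatal(err)
	}
}
```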
The dev config (ssh, cwd mount) can be wrapped into a preset to make it easy to support non-tensorchord base images:
```python
def build():
    base(os="ubuntu20.04", language="julia")
    preset()

def serving():
    base(os="ubuntu20.04", language="python", image="python:3.8")
    ...

def preset():
    service.jupyter()
    service.ssh()
    system.mount(cwd, "/home/envd")
```
The design of State in BuildKit looks suitable for method chaining.
Pros
- commands run in order
- users can control which parts are run in parallel
Cons
- needs a lot of refactoring; right now we build a static graph instead of a DAG
The interface will look like:
Image("ubuntu20.04").apt_install("make").with_python("3.9").with_cuda()
Scratch().git_repo("xxx").copy_to(img, src="/opt", dest="/")
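A hedged sketch of how such a chainable `Image` value could be backed by Go via starlark-go's `HasAttrs` interface (the type, its method names, and the recorded steps are illustrative; in envd the value would wrap an `llb.State` rather than a string list):

```go
package main

import (
	"fmt"
	"log"

	"go.starlark.net/starlark"
)

// image is a chainable Starlark value; each method call returns a new image.
type image struct {
	steps []string
}

var _ starlark.HasAttrs = (*image)(nil)

func (i *image) String() string        { return fmt.Sprintf("Image(%v)", i.steps) }
func (i *image) Type() string          { return "image" }
func (i *image) Freeze()               {}
func (i *image) Truth() starlark.Bool  { return starlark.True }
func (i *image) Hash() (uint32, error) { return 0, fmt.Errorf("unhashable: image") }

// Attr exposes the chainable methods; each one records a step and returns a
// new image, so calls compose left to right.
func (i *image) Attr(name string) (starlark.Value, error) {
	valid := false
	for _, n := range i.AttrNames() {
		if n == name {
			valid = true
			break
		}
	}
	if !valid {
		return nil, nil // no such attribute
	}
	return starlark.NewBuiltin(name, func(t *starlark.Thread, b *starlark.Builtin,
		args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
		step := fmt.Sprintf("%s(%v)", b.Name(), args)
		return &image{steps: append(append([]string{}, i.steps...), step)}, nil
	}), nil
}

func (i *image) AttrNames() []string {
	return []string{"apt_install", "with_python", "with_cuda"}
}

func main() {
	predeclared := starlark.StringDict{
		"Image": starlark.NewBuiltin("Image", func(t *starlark.Thread, b *starlark.Builtin,
			args starlark.Tuple, kwargs []starlark.Tuple) (starlark.Value, error) {
			return &image{steps: []string{fmt.Sprintf("base(%v)", args)}}, nil
		}),
	}
	src := `img = Image("ubuntu20.04").apt_install("make").with_python("3.9").with_cuda()
print(img)`
	thread := &starlark.Thread{Name: "chain", Print: func(_ *starlark.Thread, msg string) { fmt.Println(msg) }}
	if _, err := starlark.ExecFile(thread, "chain.envd", src, predeclared); err != nil {
		log.Fatal(err)
	}
}
```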