envd feat: Add Kubernetes runtime proposal

Ref #179

Proposal preview: https://github.com/gaocegege/envd/blob/proposal/docs/proposals/20220603-kubernetes-vendor.md

Jun 13 '22 12:06 gaocegege

/cc @Xuanwo @hezhizhen @knight42

Jun 29 '22 08:06 gaocegege

Other options for file syncing:

reverse sshfs https://github.com/lima-vm/sshocker
ksync https://github.com/ksync/ksync

Jun 29 '22 12:06 VoVAllen

One thing I'm concerned about is whether syncing is needed for the MVP. I think the code should live on something PV-like, and people usually work with git.

My 2 cents: syncing might be needed eventually, but perhaps in a different form. Say we are working with git and push some new commits to a branch; I think it would be better if envd could pull the new commits automatically to simulate the local development experience.

Jun 29 '22 12:06 knight42

My 2 cents: syncing might be needed eventually, but perhaps in a different form. Say we are working with git and push some new commits to a branch; I think it would be better if envd could pull the new commits automatically to simulate the local development experience.

Thanks for the advice! Automatic push/pull looks like magic to me, and it is complex. If the container crashes, we may also lose any commits that have not been pushed.

Jun 29 '22 13:06 gaocegege

As discussed with some infra engineers interested in envd, port-forwarding may consume many API server CPUs. Also, tools like virtual kubelet do not support port forwarding.

Jun 29 '22 14:06 gaocegege

port-forwarding may consume many API server CPUs.

Would you mind elaborating? AFAIK, if there is not much traffic, port-forwarding should not consume much CPU, as it is simply a SPDY connection under the hood.

virtual kubelet does not support port forwarding

Indeed. But what are we going to do to access the services inside the container without port-forwarding?

Jun 29 '22 14:06 knight42

Thanks for the advice! Automatic push/pull looks like magic to me, and it is complex. If the container crashes, we may also lose any commits that have not been pushed.

Or we can drop syncing entirely. We request users to develop on the remote container, instead of the host. The build.envd may look like this:

def build():
    base(os="ubuntu20.04", language="python3")
    install.vscode_extensions([
        "ms-python.python",
    ])
    #config.pip_index(url = "https://pypi.tuna.tsinghua.edu.cn/simple")
    install.python_packages([
        "tensorflow",
        "numpy",
    ])
    shell("zsh")
    config.jupyter(password="", port=8888)
    # new line proposed in this comment:
    config.working_dir(local=".", remote="https://github.com/tensorchord/envd.git")

envd mounts the local directory when using the Docker runner and downloads the repo when using the Kubernetes runner.

Jun 29 '22 14:06 gaocegege

Would you mind elaborating? AFAIK, if there is not much traffic, port-forwarding should not consume much CPU, as it is simply a SPDY connection under the hood.

There should not be huge traffic by design. But algorithm engineers may use it to copy data:

scp <10G-file> container:~

Jun 29 '22 14:06 gaocegege

But what are we going to do to access the services inside the container without port-forwarding?

They may use service and ingress to achieve this. Thus we may need a mechanism to support customization here.

Maybe, like the device plugin design, we can provide an interface between envd and a CLI shim. The shim handles the critical logic, such as port forwarding; envd just communicates with the shim and shows information to users.

Port forwarding can be used in our default shim, while users can write their own shim to customize, e.g. using service and ingress.

Jun 29 '22 14:06 gaocegege

But what are we going to do to access the services inside the container without port-forwarding?

They may use service and ingress to achieve this. Thus we may need a mechanism to support customization here.

Maybe, like the device plugin design, we can provide an interface between envd and a CLI shim. The shim handles the critical logic, such as port forwarding; envd just communicates with the shim and shows information to users.

Port forwarding can be used in our default shim, while users can write their own shim to customize, e.g. using service and ingress.

The device plugin is not a great comparison. The kubectl plugin mechanism is a better one: a shim maintained by users can be integrated into envd.
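
A rough sketch of what kubectl-style shim discovery could look like. kubectl finds plugins as executables named kubectl-<name> on PATH; the sketch below borrows that idea. Everything specific here is an assumption for illustration and not part of envd: the envd- prefix, the JSON-over-stdin/stdout exchange, and the request fields.

```python
import json
import shutil
import subprocess
from typing import Optional

# Hypothetical naming convention, modeled on kubectl-<name> plugins.
PLUGIN_PREFIX = "envd-"


def find_shim(name: str) -> Optional[str]:
    """Return the path of an executable called envd-<name> on PATH, or None."""
    return shutil.which(PLUGIN_PREFIX + name)


def call_shim(path: str, request: dict) -> dict:
    """Send a JSON request on stdin and parse a JSON reply from stdout.

    The wire format is an assumption; any protocol agreed on between
    envd and the shim would do.
    """
    result = subprocess.run(
        [path],
        input=json.dumps(request),
        capture_output=True,
        text=True,
        check=True,  # raise if the shim exits with a non-zero status
    )
    return json.loads(result.stdout)


def expose(env_name: str, port: int, shim: str = "port-forward") -> dict:
    """Ask the selected shim to expose a port of the environment."""
    path = find_shim(shim)
    if path is None:
        raise FileNotFoundError(f"no shim named {PLUGIN_PREFIX}{shim} on PATH")
    return call_shim(path, {"action": "expose", "env": env_name, "port": port})
```

With this shape, the default shim would wrap port forwarding, and a user who prefers service and ingress would drop their own executable on PATH and select it by name, without forking envd.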

Jun 29 '22 14:06 gaocegege

config.working_dir(local=".", remote="https://github.com/tensorchord/envd.git")

My concern is that the working dir looks like a command-line argument to me. Putting it in build.envd might prevent reusing the same build.envd in a different working dir, which is the same reason we don't specify the build context in a Dockerfile. Besides, if we need to specify the repo address, do we need to specify the branch as well?

algorithm engineers may use it to copy data:

Got it 👌 If we need to transfer such a huge file via port-forwarding without rate limiting, the apiserver might be affected.
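
For reference, a minimal sketch of the kind of rate limiting meant here, using a generic token bucket. This is not how envd or the apiserver behaves; it only assumes the throttling would sit on the client side of the tunnel, and all names are made up for illustration.

```python
import time


class TokenBucket:
    """Token bucket: `rate` bytes per second, bursts up to `capacity` bytes."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def delay_for(self, nbytes: int) -> float:
        """Charge nbytes and return how long the caller should sleep first."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # The bucket is in debt; wait until the debt is refilled.
        return -self.tokens / self.rate


def throttled_copy(src, dst, bucket, chunk_size=64 * 1024, sleep=time.sleep):
    """Copy a binary stream in chunks, sleeping to stay near bucket.rate."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        sleep(bucket.delay_for(len(chunk)))
        dst.write(chunk)
        total += len(chunk)
```

The clock and sleep functions are injectable only so the behavior can be checked without real waiting.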

while users can write their own shim to customize, e.g. using service and ingress.

I think it makes sense 👍

Jun 29 '22 16:06 knight42

My concern is that the working dir looks like a command-line argument to me. Putting it in build.envd might prevent reusing the same build.envd in a different working dir, which is the same reason we don't specify the build context in a Dockerfile. Besides, if we need to specify the repo address, do we need to specify the branch as well?

Sounds reasonable. It should be a runtime argument instead of a build-time one.

@VoVAllen Do you have an opinion on it?

Jun 30 '22 01:06 gaocegege

Agreed, it is better as a runtime option. If using git, users should handle syncing themselves (clone the git repo and git pull/push). Therefore config.working_dir might not be needed.

Jun 30 '22 04:06 VoVAllen

Things to be decided:

[ ] How to support code repositories: sync or git
[ ] How to expose services to end users: port-forward, nodePort, or ingress/service
[ ] How to let users customize the logic without maintaining a fork of envd (plugin system)

Jul 05 '22 12:07 gaocegege

Found one new syncing tool: https://github.com/mutagen-io/mutagen

Jul 06 '22 12:07 VoVAllen

How is it going now?

Jul 11 '22 16:07 aseaday

I am still working on #261. Currently no bandwidth for it.

Jul 12 '22 00:07 gaocegege

I did some experiments on K8s sync. Maybe I could help, or we could discuss it in more detail if you want.

Jul 12 '22 02:07 aseaday

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaocegege

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [gaocegege]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sep 15 '22 00:09 muniu-bot[bot]

I will update the Kubernetes design proposal soon. It should be elegant and fancy.

Sep 16 '22 06:09 gaocegege

We request users to develop on the remote container, instead of the host.

We should not assume where the user works.

In my experience, the usual scenarios are working in a remote Jupyter notebook, working in local VS Code connected to a remote Jupyter kernel, or working in local VS Code with the VS Code remote development kit.

On the other hand, a lot of algorithm engineers do not use git for source code management. They just write the code and produce some summary output, and then the work is done; the source code does not need to be managed.

File sync by envd can be one option for the user, but it cannot be the only one.

How about letting users choose how to set up their own work style?

config.jupyter(password="", port=8888)
# if they want to work on a jupyter notebook
config.vscodeserver()
# or if they want to work on a VS code
config.sync(local='.', runtime='/workdir')
# or we sync their source code and datasets
# under the hood this could be a volume (if the runtime is a local container),
# or port-forward, nodePort, ingress, etc. if the runtime is Kubernetes

# or maybe 
envd up --sync
# but `--sync` should be the default behavior
# so, 
envd up -d --no-sync
# can be more practical
envd context ls
# then just print the context information
# if the runtime is a remote one
# tells users that VS code remote development kit can help with their work
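
To make the "choice under the hood" concrete, here is a small sketch of how a mechanism could be picked from the runtime and the sync flag. It assumes sync is on by default, as suggested above. The function, the runtime names, and the returned labels are all hypothetical and not part of envd.

```python
from dataclasses import dataclass

# Hypothetical identifiers, for illustration only.
LOCAL_RUNTIMES = {"docker"}
K8S_METHODS = {"port-forward", "nodePort", "ingress"}


@dataclass
class SyncPlan:
    enabled: bool
    mechanism: str  # "volume", one of K8S_METHODS, or "none"


def plan_sync(runtime: str, sync: bool = True, k8s_method: str = "port-forward") -> SyncPlan:
    """Pick a mechanism: sync defaults to on and can be switched off."""
    if not sync:
        return SyncPlan(False, "none")
    if runtime in LOCAL_RUNTIMES:
        # A local container can simply mount the directory as a volume.
        return SyncPlan(True, "volume")
    if runtime == "kubernetes":
        if k8s_method not in K8S_METHODS:
            raise ValueError(f"unknown method: {k8s_method}")
        return SyncPlan(True, k8s_method)
    raise ValueError(f"unknown runtime: {runtime}")
```

For example, plan_sync("docker") picks a volume, while plan_sync("kubernetes", sync=False) corresponds to the --no-sync case.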

Sep 17 '22 18:09 TaylorHere

The proposal is updated with significant changes, PTAL.

Sep 21 '22 03:09 gaocegege

PTAL

Sep 30 '22 00:09 gaocegege

@Xuanwo Thanks for your fix!

Sep 30 '22 02:09 gaocegege

@Xuanwo Thanks for your fix!

Probably you should try Grammarly. There are still some syntax errors.

Sep 30 '22 02:09 kemingy

@Xuanwo Thanks for your fix!

Probably you should try Grammarly. There are still some syntax errors.

It should be fixed. PTAL.

Sep 30 '22 05:09 gaocegege

I am merging it since we are already starting the development of envd-server.

Oct 11 '22 04:10 gaocegege
