feat: Add Kubernetes runtime proposal

Open gaocegege opened this issue 2 years ago • 22 comments

Ref #179

Proposal preview: https://github.com/gaocegege/envd/blob/proposal/docs/proposals/20220603-kubernetes-vendor.md

gaocegege avatar Jun 13 '22 12:06 gaocegege

/cc @Xuanwo @hezhizhen @knight42

gaocegege avatar Jun 29 '22 08:06 gaocegege

Other options for file syncing:

  • reverse sshfs https://github.com/lima-vm/sshocker
  • ksync https://github.com/ksync/ksync

VoVAllen avatar Jun 29 '22 12:06 VoVAllen

One thing I'm concerned about is whether syncing is needed for the MVP. I think code should live on a PV-like volume, and people usually work with git.

My 2 cents: syncing might be needed eventually, but perhaps in a different form. Say we are working with git and push some new commits to a branch; I think it would be better if envd could pull the new commits automatically to simulate the local development experience.
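
A minimal sketch of what "pull the new commits automatically" could mean inside the container. The function names are hypothetical and nothing here is part of envd; it only assumes a `git` binary and a remote called `origin`. The git call is injected as `run` so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List

# Hypothetical helper names; envd does not ship these functions.
Runner = Callable[[List[str]], str]


def run_git(args: List[str], cwd: str = ".") -> str:
    """Run a git command in `cwd` and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def pull_if_behind(branch: str, run: Runner = run_git) -> bool:
    """Fast-forward the checked-out branch when origin has new commits.

    Returns True if new commits were merged, False if nothing was pulled.
    """
    run(["fetch", "origin", branch])
    # Count commits reachable from origin/<branch> but not from HEAD.
    behind = int(run(["rev-list", "--count", f"HEAD..origin/{branch}"]))
    if behind == 0:
        return False
    # --ff-only makes git refuse the merge when the local branch has
    # diverged, so local commits are never rewritten by the sync.
    run(["merge", "--ff-only", f"origin/{branch}"])
    return True
```

Something like this could run on a timer in the environment. It only covers the pull direction; pushing work out of the container would still be up to the user.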

knight42 avatar Jun 29 '22 12:06 knight42

My 2 cents: syncing might be needed eventually, but perhaps in a different form. Say we are working with git and push some new commits to a branch; I think it would be better if envd could pull the new commits automatically to simulate the local development experience.

Thanks for the advice! Automatic push/pull looks like magic to me, and it is complex. If the container crashes, we may also lose commits that have not been pushed.

gaocegege avatar Jun 29 '22 13:06 gaocegege

As discussed with some infra engineers interested in envd, port-forwarding may consume a lot of API server CPU, and tools like virtual kubelet do not support port forwarding.

gaocegege avatar Jun 29 '22 14:06 gaocegege

port-forwarding may consume a lot of API server CPU.

Would you mind elaborating? AFAIK, if there is not much traffic, port-forwarding should not consume much CPU, as it is simply a SPDY connection under the hood.

virtual kubelet does not support port forwarding

Indeed. But what are we going to do to access the services inside the container without port-forwarding?

knight42 avatar Jun 29 '22 14:06 knight42

Thanks for the advice! Automatic push/pull looks like magic to me, and it is complex. If the container crashes, we may also lose commits that have not been pushed.

Or, we can forget about syncing. We request users to develop on the remote container, instead of the host. The build.envd may look like this:

def build():
    base(os="ubuntu20.04", language="python3")
    install.vscode_extensions([
        "ms-python.python",
    ])
    #config.pip_index(url = "https://pypi.tuna.tsinghua.edu.cn/simple")
    install.python_packages([
        "tensorflow",
        "numpy",
    ])
    shell("zsh")
    config.jupyter(password="", port=8888)
+    config.working_dir(local=".", remote="https://github.com/tensorchord/envd.git")

envd mounts the local directory when using the Docker runner and downloads the repo when using the Kubernetes runner.

gaocegege avatar Jun 29 '22 14:06 gaocegege

Would you mind elaborating? AFAIK, if there is not much traffic, port-forwarding should not consume much CPU, as it is simply a SPDY connection under the hood.

There should not be huge traffic by design. But algorithm engineers may use it to copy data:

scp <10G-file> container:~

gaocegege avatar Jun 29 '22 14:06 gaocegege

But what are we going to do to access the services inside the container without port-forwarding?

They may use service and ingress to achieve this. Thus we may need a mechanism to support customization here.

Maybe, like the device plugin design, we could provide an interface between envd and a CLI shim. The shim handles the critical logic such as port forwarding; envd just communicates with the shim and shows information to users.

Port forwarding can be used in our default shim, while users can write their own shim to customize, e.g. using service and ingress.
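
A rough sketch of what such an interface could look like. Every name here (`Shim`, `Endpoint`, `expose`, the two shim classes) is hypothetical and not envd's real API; the shims only report where a service would be reachable and do not talk to a cluster.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Endpoint:
    """Where the user can reach a service running in the environment."""
    host: str
    port: int


class Shim(ABC):
    """Hypothetical contract between envd and a shim (not envd's real API)."""

    @abstractmethod
    def expose(self, env_name: str, container_port: int) -> Endpoint:
        """Make `container_port` reachable and say where it ended up."""


class PortForwardShim(Shim):
    """Default behaviour: the service shows up on the same local port."""

    def expose(self, env_name: str, container_port: int) -> Endpoint:
        # A real shim would start a port-forward here; this sketch only
        # reports the endpoint the user would connect to.
        return Endpoint("127.0.0.1", container_port)


class IngressShim(Shim):
    """User-written alternative: one hostname per environment."""

    def __init__(self, domain: str) -> None:
        self.domain = domain

    def expose(self, env_name: str, container_port: int) -> Endpoint:
        # A real shim would create Service and Ingress objects here.
        return Endpoint(f"{env_name}.{self.domain}", 443)


def describe(shim: Shim, env_name: str, container_port: int) -> str:
    """What envd itself would do: ask the shim, then show the result."""
    endpoint = shim.expose(env_name, container_port)
    return f"{env_name}:{container_port} -> {endpoint.host}:{endpoint.port}"
```

With this split, envd never needs to know whether port forwarding or an ingress is in use; it only prints what the shim returns.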

gaocegege avatar Jun 29 '22 14:06 gaocegege

But what are we going to do to access the services inside the container without port-forwarding?

They may use service and ingress to achieve this. Thus we may need a mechanism to support customization here.

Maybe, like the device plugin design, we could provide an interface between envd and a CLI shim. The shim handles the critical logic such as port forwarding; envd just communicates with the shim and shows information to users.

Port forwarding can be used in our default shim, while users can write their own shim to customize, e.g. using service and ingress.

The device plugin is not a great comparison. The kubectl plugin mechanism is a better one: a shim maintained by users can be integrated into envd.
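
For reference, kubectl treats any executable on PATH whose name starts with `kubectl-` as a plugin. A minimal sketch of the same discovery rule applied to envd follows; the `envd-` prefix and the function name are assumptions for illustration, not an existing envd feature.

```python
import os
from typing import Dict, Optional

# Assumption: mirrors kubectl's "kubectl-" prefix; envd defines no such prefix.
PLUGIN_PREFIX = "envd-"


def discover_plugins(path: Optional[str] = None) -> Dict[str, str]:
    """Map plugin name to executable path; the first match on PATH wins."""
    if path is None:
        path = os.environ.get("PATH", "")
    plugins: Dict[str, str] = {}
    for directory in path.split(os.pathsep):
        if not os.path.isdir(directory):
            continue
        for entry in sorted(os.listdir(directory)):
            full = os.path.join(directory, entry)
            is_plugin = (
                entry.startswith(PLUGIN_PREFIX)
                and os.path.isfile(full)
                and os.access(full, os.X_OK)
            )
            if is_plugin:
                # setdefault keeps the entry from the earliest PATH directory.
                plugins.setdefault(entry[len(PLUGIN_PREFIX):], full)
    return plugins
```

A user-maintained shim installed as `envd-ingress` would then be found under the name `ingress` without any change to envd itself.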

gaocegege avatar Jun 29 '22 14:06 gaocegege

  • config.working_dir(local=".", remote="https://github.com/tensorchord/envd.git")

My concern is that the working dir seems like a command-line argument to me; otherwise it might prevent reusing build.envd in different working dirs, just as we don't specify the build context in a Dockerfile. Besides, if we need to specify the repo address, do we need to specify the branch as well?

algorithm engineers may use it to copy data:

Got it 👌 If we transfer such a huge file via port-forwarding without rate limiting, the apiserver might be affected.

while users can write their own shim to customize, e.g. using service and ingress.

I think it makes sense 👍
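
On the rate-limiting point above, a minimal sketch of a client-side throttle that caps the average transfer rate of a copy. It is a plain average-rate limiter, not something envd or kubectl provides; the function name and parameters are hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without waiting.

```python
import time
from typing import BinaryIO, Callable


def throttled_copy(
    src: BinaryIO,
    dst: BinaryIO,
    max_bytes_per_sec: float,
    chunk_size: int = 64 * 1024,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Copy src to dst, sleeping so the average rate stays under the cap.

    Returns the number of bytes copied.
    """
    start = clock()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return copied
        dst.write(chunk)
        copied += len(chunk)
        # Earliest moment at which `copied` bytes may have been sent.
        earliest = start + copied / max_bytes_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
```

A cap like this bounds the load a single large copy can put on the forwarded connection, at the cost of a slower transfer.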

knight42 avatar Jun 29 '22 16:06 knight42

My concern is that the working dir seems like a command-line argument to me; otherwise it might prevent reusing build.envd in different working dirs, just as we don't specify the build context in a Dockerfile. Besides, if we need to specify the repo address, do we need to specify the branch as well?

Sounds reasonable. It should be a runtime argument instead of a build-time one.

@VoVAllen Do you have an opinion on it?

gaocegege avatar Jun 30 '22 01:06 gaocegege

Agreed, it is better as a runtime option. If using git, users should handle syncing themselves (clone the repo, then git pull/push). Therefore config.working_dir might not be needed.

VoVAllen avatar Jun 30 '22 04:06 VoVAllen

Things to be decided:

  • [ ] How to support code repositories: sync or git
  • [ ] How to expose services to end users: port-forward, nodePort, or ingress/service
  • [ ] How to let users customize the logic without maintaining a fork of envd (plugin system)

gaocegege avatar Jul 05 '22 12:07 gaocegege

Found one new syncing tool: https://github.com/mutagen-io/mutagen

VoVAllen avatar Jul 06 '22 12:07 VoVAllen

How is it going now?

aseaday avatar Jul 11 '22 16:07 aseaday

I am still working on #261. Currently no bandwidth for it.

gaocegege avatar Jul 12 '22 00:07 gaocegege

I did some exp on K8s sync. Maybe I could help you, or we could discuss it in more detail if you want.

aseaday avatar Jul 12 '22 02:07 aseaday

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaocegege

muniu-bot[bot] avatar Sep 15 '22 00:09 muniu-bot[bot]

I will update the Kubernetes design proposal soon. It should be elegant and fancy.

gaocegege avatar Sep 16 '22 06:09 gaocegege

We request users to develop on the remote container, instead of the host.

We should not assume where the user works.

In my experience, the usual scenarios are working in a remote Jupyter notebook, working in local VS Code connected to a remote Jupyter kernel, or working in local VS Code with the VS Code remote development kit.

On the other hand, many algorithm engineers do not use git for source code management. They just write the code and produce some summary output, and then the work is over; the source code does not need to be managed.

File sync by envd can be one choice for the user, but it cannot be the only one.

How about letting users choose how to set up their work style?

config.jupyter(password="", port=8888)
# if they want to work in a Jupyter notebook
config.vscodeserver()
# or if they want to work in VS Code
config.sync(local='.', runtime='/workdir')
# or we sync their source code and datasets
# under the hood this could be a volume (if the runtime is a local container),
# or port-forward, nodePort, ingress, etc. (if the runtime is Kubernetes)
# or maybe
envd up --sync
# but `--sync` should be the default behavior
# so
envd up -d --no-sync
# can be more practical
envd context ls
# then just print the context information
# if the runtime is a remote one,
# tell users that the VS Code remote development kit can help with their work

TaylorHere avatar Sep 17 '22 18:09 TaylorHere

The proposal is updated with significant changes, PTAL.

gaocegege avatar Sep 21 '22 03:09 gaocegege

PTAL

gaocegege avatar Sep 30 '22 00:09 gaocegege

@Xuanwo Thanks for your fix!

gaocegege avatar Sep 30 '22 02:09 gaocegege

@Xuanwo Thanks for your fix!

Probably you should try Grammarly. There are still some syntax errors.

kemingy avatar Sep 30 '22 02:09 kemingy

@Xuanwo Thanks for your fix!

Probably you should try Grammarly. There are still some syntax errors.

It should be fixed. PTAL.

gaocegege avatar Sep 30 '22 05:09 gaocegege

I am merging it since we have already started developing envd-server.

gaocegege avatar Oct 11 '22 04:10 gaocegege