Description

Currently, envd-server clones the repo using the information in the image label. We should also support syncing with the user's local folder.

Current logic at https://github.com/tensorchord/envd-server/blob/main/pkg/server/environment_create.go#L175-L201

Reference:

https://www.okteto.com/docs/reference/file-synchronization/
ksync
Syncthing

Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

Dec 06 '22 06:12 VoVAllen

Other discussions: https://github.com/tensorchord/envd/issues/530

Dec 06 '22 06:12 VoVAllen

Implementation notes (Ignore)

Envd needs to:
- Install the syncthing binary locally
- Call the envd-server endpoint to set up and start the sync

envd-server needs to:
- Expose endpoints to interact with syncthing
- Perform the configuration (connect the local directory to the container directory)

TODO:

Determine which configuration needs to happen and where it happens (setting up connections, matching devices, etc.)

Syncthing

REST API, Kubernetes

Ksync Implementation

ksync architecture

Okteto Implementation

Interacts with syncthing via the binary CLI
Interacts with syncthing via the syncthing API

Questions

Will multiple users connect to the same container? (i.e., will it always be a 1-to-1 sync rather than 1-to-many?)
- This should be OK because syncthing supports multi-device sync; we just need to figure out how to manage device IDs.

Nonessential Features

Progress bar
- Okteto implementation: repeatedly polls the syncthing server for progress and shows it in the terminal UI.

Dec 21 '22 00:12 AlexXi19

Proof of Concept

For a proof of concept, I added a syncthing container to environment_create.go here, and manually configured the file sync connection between the source folder and the folder in the docker container using the GUI.

Demo

In the demo, I click the sync button manually, but the syncthing sync interval can be adjusted instead.

https://user-images.githubusercontent.com/68758451/209451801-938b18ae-0e5e-49b3-bbcb-ce4d77e21e35.mov

Next steps

Functionality

Local setup

Install the syncthing binary on the user's computer
Run the syncthing binary on the user's computer

Implement the manual sync steps in code (I don't know how to do this yet):

Syncthing uses XML for its configuration (link)

Working with device IDs (use the syncthing REST API, e.g. https://docs.syncthing.net/rest/system-status-get.html)
- Host sends an "add device" request to the container, and the container accepts it
- Host creates the sync folder and shares it with the container

Other features

Sync Logging
Error handling for sync (might be tricky)
Edge cases
- What happens to the sync for GitHub repos? Which source takes priority?
Configurations
- Syncthing sync interval
- Public/private keys
- HTTPS certificates

Questions

When should I install syncthing? On bootstrap? On run?
How should we link the source directory with the target directory? I think when run sends its request to the server, it can include the local path so the server knows which directory to sync.
About device discovery and connection
- Need to learn more about the syncthing discovery server to decide how best to connect devices. Syncthing also offers a global discovery service; we should be careful NOT to use it.
- Ksync uses arbitrary device IDs and uses the API to modify and update the configuration.
- For device discovery, it opens a tunnel between the two devices and only allows discovery through the tunnel; it has a service that fetches the device IDs and connects them.
- Okteto uses deterministic device IDs in its configuration.
  - For service discovery: TBD. I don't know yet and need to look into it more, but there is probably some kind of connection, maybe port-forwarding the discovery service so the devices can find each other. Need to find out how this is done.

Dec 24 '22 22:12 AlexXi19

@VoVAllen

Dec 24 '22 22:12 AlexXi19

When should I install syncthing? On bootstrap? On run?

I think we can add this to envd attach now. When the user attaches to an environment, it will do ssh + port-forwarding + file sync.

Local setup

We can also set up a syncthing Docker container on the user's side as the client. That makes delivering the binary easier.

How should we link the source directory with the target directory? I think when run sends its request to the server, it can include the local path so the server knows which directory to sync.

You can assume the working directory is the source directory for now. We may provide more detailed configuration later.

About device discovery and connection

We can generate a random ID directly and use ssh port-forwarding to connect to the pod's syncthing ports. I don't think we need service discovery at all, since the source and target are deterministic here.

Dec 25 '22 13:12 VoVAllen

When should I install syncthing? On bootstrap? On run?

If you mean installing the binary, bootstrap will be better. Otherwise we need to add complex logic to attach.

Dec 26 '22 01:12 gaocegege

And I am not sure if we should use syncthing or https://github.com/rclone/rclone

XML config looks weird to me.

Dec 26 '22 01:12 gaocegege

And I am not sure if we should use syncthing or https://github.com/rclone/rclone

XML config looks weird to me.

You don't really have to work with the XML other than writing the default. If you want to make changes to the configuration, you can also use the Go struct (https://pkg.go.dev/github.com/syncthing/syncthing/lib/config#Configuration) so you don't have to work with the XML directly. Ksync and okteto both use syncthing, but I can look into rclone a bit more.

Dec 26 '22 01:12 AlexXi19

Design Document

Envd-server file sync functionality

Description

When a pod/environment is provisioned by envd up --image <image-name>, the user's project directory and its files are synced into the container in the development pod. When files are modified either locally or within the container, the changes are synced so the project files stay consistent.

Functional Requirements

Syncs files between the project directory and the remote container

Non-functional Requirements

TBD (Sync interval, latency requirements, security etc.)

Implementation

Syncing

The core sync functionality is implemented by using syncthing, which can sync files between two devices.

Syncthing on Local

The syncthing binary for the user's OS and architecture is downloaded and installed during envd bootstrap and executed on envd up. Before syncthing is executed, we write a config.xml file to the syncthing home directory so that syncthing reads the configuration on startup. When the binary is executed, we set the home (config) directory to .config/envd/syncthing so as not to interfere with the user's own syncthing configuration. Once the binary is running, a local syncthing instance is up and we can connect it to the remote instance.

Syncthing on Kubernetes

For syncthing in the Kubernetes pod, we use a syncthing image here to run it as a container. The starting configuration (config.xml) is passed in via a Kubernetes ConfigMap. One caveat is that ConfigMap volumes are read-only; we work around this by mounting the ConfigMap to a temporary directory and using a container lifecycle hook on container start (postStart) to copy the file into the correct directory, which for this syncthing image is /config.

Working with Syncthing

To make changes to or get information from the syncthing application (add devices, add folders, check status, etc.), we use the syncthing REST API. To communicate with the syncthing instance on Kubernetes, the appropriate port needs to be forwarded.

The two syncthing instances also need to be discoverable by each other. This can be done through ssh tunnel port forwarding. (I've only tested discovery with the Kubernetes cluster and the local syncthing instance on the same network/computer, so I'm not sure whether it will differ when the cluster runs on another machine.)

Waiting for Events

Since most interactions with syncthing are asynchronous, we need to wait for operations to complete before proceeding. For example, after the binary is executed, we have to wait for syncthing to start up before calling the REST API; and when configuration is applied via the REST API, syncthing responds immediately, so we have to wait for the changes to actually take effect. Other asynchronous operations include scanning files and syncing folders.

Therefore, for asynchronous operations that need to be awaited, there are Wait functions that query the syncthing REST API for the status of the operation.

Design Choices

Syncthing configurations

The other Kubernetes file sync implementations I referenced (okteto, ksync) both use XML files to initialize the configuration. However, I chose not to work with XML files, to make the code more readable, maintainable, and consistent, and to keep the configuration closer to the application code. In other words, I moved the complexity from the build code to the application code (configuration via XML files in a Dockerfile vs. via structs in Go).

Connecting Two Devices

Syncthing device IDs are generated deterministically from the private/public keys. Okteto's implementation hard-codes the device IDs and keys. In my implementation, however, I let syncthing autogenerate the device ID and keys, then use the API to query the device ID and configure the file sync. Hopefully this will prove more flexible and extensible in the future.

Dec 29 '22 08:12 AlexXi19

Will there be a syncthing process in the local host?

Dec 29 '22 10:12 gaocegege

Will there be a syncthing process in the local host?

Yes. The process is started with cmd := exec.Command(GetSyncthingBinPath(), "-no-browser", "-no-restart", "-home", s.HomeDirectory) and the cmd object is kept in memory in the Syncthing struct so we can use it to manage the process later.

Dec 29 '22 18:12 AlexXi19

Cool. Then when will the process terminate? I think we can terminate it when users stop the ssh connection.

Dec 30 '22 01:12 gaocegege

For local development, I'm not sure if it's better to use mount instead of sync.

Some corner cases:

file size limitation (users may accidentally add a model file to the folder)
follow the soft link or not?

Dec 30 '22 02:12 kemingy

Cool. Then when will the process terminate? I think we can terminate it when users stop the ssh connection.

Sure, or when the environment is destroyed. I'm almost done with the functionality but haven't wired the code into the CLI commands yet. We can discuss the details once the core sync functionality is finished!

Dec 30 '22 05:12 AlexXi19

For local development, I'm not sure if it's better to use mount instead of sync.

Some corner cases:

file size limitation (users may accidentally add a model file to the folder)

follow the soft link or not?

There could be an option to configure ignored files/folders, but accidentally dropping a large file into the folder can definitely be a problem. Possible solutions are halting if a large file is detected, or ignoring large files. What are you suggesting with mount?

Dec 30 '22 05:12 AlexXi19

What are you suggesting with mount?

By default, we will mount the current working directory. I don't think it's necessary to sync the files in this directory.

Dec 31 '22 02:12 kemingy

envd-server

feat: Support sync code with local folders