envd-server
feat: Support sync code with local folders
Description
Currently, envd-server clones the repo using the information in the image label. We should also support syncing with the user's local folder.
Current logic at https://github.com/tensorchord/envd-server/blob/main/pkg/server/environment_create.go#L175-L201
Reference:
- https://www.okteto.com/docs/reference/file-synchronization/
- ksync
- Syncthing
Message from the maintainers:
Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.
Other discussions: https://github.com/tensorchord/envd/issues/530
Implementation notes (Ignore)
- Envd needs to:
- Install syncthing (bin) locally
- Communicate with an envd-server endpoint to set up and start the sync
- Envd server needs to:
- Have endpoints to interact with syncthing
- Perform configurations (connect local directory to container directory)
TODO:
- Determine which configurations need to happen and where they happen (setting up connections, matching devices, etc.)
Syncthing
- Ksync implementation: interacts with syncthing via the binary CLI
- Okteto implementation: interacts with syncthing via the syncthing API
Questions
- Will multiple users connect to the same container? (i.e. will it always be a 1-1 sync and not 1-many)
- This should be ok because syncthing supports multi-device sync, just need to figure out how to manage device IDs.
Nonessential Features
- Progress bar
- Okteto implementation: repeatedly ping the syncthing server for progress and show in terminal ui.
Proof of Concept
For a proof of concept, I added a syncthing container to environment_create.go here, and manually configured the file sync connection between the source folder and the folder in the docker container using the GUI.
Demo
In the demo, I'm manually clicking the sync button, but we can adjust the syncthing sync interval.
https://user-images.githubusercontent.com/68758451/209451801-938b18ae-0e5e-49b3-bbcb-ce4d77e21e35.mov
Next steps
Functionality
Local setup
- Install syncthing binary on user's computer
- Run the syncthing binary on the user's computer
Implement the manual sync steps with code: (I don't know how to do this yet)
Syncthing uses xml to set up configurations (link)
- Working with DeviceIDs (use the syncthing REST API, for example https://docs.syncthing.net/rest/system-status-get.html)
- Host sends "add device" request to container and container accepts request
- Host creates sync folder and shares the sync with the container
Other features
- Sync Logging
- Error handling with sync (might be tricky)
- Edge cases
- What happens with the sync for github repos? Which do we prioritize?
- Configurations
- Syncthing sync interval
- public/private keys
- https certificates
Questions
- When should I install syncthing? On bootstrap? On run?
- How to link the source directory with the target directory? I think when we send the request to the server from run, we can send the local path with the request to know which directory to sync.
- About device discovery and connection
- Need to learn more about the syncthing discovery server to decide on the best choice for connecting devices. Syncthing also offers a global discovery service; we should be careful NOT to use that.
- Ksync uses arbitrary deviceIDs and uses the API to modify and update the configurations
- For device discovery, it opens a tunnel between the two devices and only allows discovery through the tunnel. It has a service that gets device IDs and connects them.
- Okteto uses deterministic deviceIDs in configurations
- For service discovery, I don't know yet how devices find each other (TBD). There should be some kind of connection, maybe port forwarding of the discovery service; I need to look into how this is done.
@VoVAllen
When should I install syncthing? On bootstrap? On run?
I think we can add this to envd attach now. When user attaches to the envs, it will do ssh + port-forwarding + file sync.
Local setup
We can also set up a syncthing Docker container on the user's side as the client. It makes the binary delivery easier.
How to link the source directory with the target directory? I think when we send the request to the server from run, we can send the local path with the request to know which directory to sync.
You can assume the working directory as the source directory now. We may provide more detailed configuration later.
About device discovery and connection
We can generate a random ID directly and use ssh port-forwarding to connect to the pod's syncthing ports. I don't think we need service discovery at all, since the target and source are deterministic here.
When should I install syncthing? On bootstrap? On run?
If you mean the binary install, bootstrap will be better. Otherwise we need to add complex logic in attach.
And I am not sure if we should use syncthing or https://github.com/rclone/rclone
XML config looks weird to me.
And I am not sure if we should use syncthing or https://github.com/rclone/rclone
XML config looks weird to me.
You don't really have to work with the XML other than writing up the default. If you want to make changes to the configuration, you can also use the Go struct (https://pkg.go.dev/github.com/syncthing/syncthing/lib/config#Configuration) so you don't have to work with the XML directly. Ksync and okteto both use syncthing, but I can look into rclone a bit more.
Design Document
Envd-server file sync functionality
Description
When a pod/environment is provisioned after envd up --image <image-name>, the user's project directory and its files are synced into the container in the development pod. When the files are modified either locally or within the container, the changes are synced to keep the project files consistent.
Functional Requirements
- Syncs file between project directory and remote container
Non-functional Requirements
- TBD (Sync interval, latency requirements, security etc.)
Implementation
Syncing
The core sync functionality is implemented by using syncthing, which can sync files between two devices.
Syncthing on Local
The syncthing binary is downloaded based on the user's OS and architecture. It is installed on envd bootstrap and executed on envd up. Before syncthing is executed, we write a config.xml file to the syncthing home directory so that syncthing reads the configuration on startup. We set the home (config) directory to .config/envd/syncthing so as not to interfere with the user's own syncthing configuration. Once the binary is executed, a local instance of syncthing starts running and we can start connecting it to the remote instance.
Syncthing on Kubernetes
For syncthing in the Kubernetes pod, we use an image of syncthing here to start it up as a container. For the starting configuration, we pass in the config.xml file via a Kubernetes ConfigMap. One caveat is that ConfigMap mounts are read-only; we work around this by mounting the ConfigMap to a temporary directory and using a container lifecycle event on container start to copy the file into the correct directory, which for this syncthing image is /config.
Working with Syncthing
In order to make changes or get information from the syncthing application (to add devices, add folders, check on status, etc.), we use the syncthing rest api. To communicate with the syncthing instance on kubernetes, the appropriate port needs to be forwarded.
The two syncthing instances also need to be discoverable by each other. This can be done through ssh tunnel port forwarding. (I've only tested discovery with the kubernetes cluster and local syncthing instance on the same network/computer so I'm not sure if it'll be different if the kubernetes cluster is on another instance)
Waiting for Events
Since most interactions with syncthing are asynchronous, we need to wait for operations to complete before proceeding. For example, after the binary is executed, we need to wait for the syncthing application to start up before calling the REST API. When configurations are applied via the REST API, syncthing returns a response immediately and we have to wait for the changes to actually be applied. Other asynchronous operations include scanning files and syncing folders.
Therefore, for asynchronous operations that need to be awaited, there are Wait functions that query the syncthing REST API for the status of the operation.
Design Choices
Syncthing configurations
The other Kubernetes file sync implementations that I referenced (okteto, ksync) both use XML files to initialize the configuration. However, I chose not to work with XML files, to make the code more readable, maintainable, and consistent, and to keep the configurations closer to the application code. I chose to move the complexity from the build code to the application code (config logic with XML files via Dockerfile vs. with structs in Go).
Connecting Two Devices
Syncthing's deviceIDs are generated deterministically from the priv/pub keys. For okteto's implementation, the deviceIDs and priv/pub keys are hard coded. However, for my implementation, I let syncthing autogenerate the deviceID and priv/pub keys and use the API to query the deviceID, and configure the file sync. Hopefully this will prove to be more flexible and extendable in the future.
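Querying the autogenerated deviceID could be sketched as below. This assumes Syncthing's `GET /rest/system/status` endpoint, whose JSON response includes a `myID` field. `parseMyID` and `FetchDeviceID` are hypothetical helper names, not the functions used in the actual implementation.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// parseMyID extracts the device ID from a /rest/system/status response body.
func parseMyID(body []byte) (string, error) {
	var status struct {
		MyID string `json:"myID"`
	}
	if err := json.Unmarshal(body, &status); err != nil {
		return "", err
	}
	if status.MyID == "" {
		return "", errors.New("status response has no myID")
	}
	return status.MyID, nil
}

// FetchDeviceID asks a running syncthing instance for its own device ID.
func FetchDeviceID(client *http.Client, baseURL, apiKey string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/rest/system/status", nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("X-API-Key", apiKey)
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("system status: unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return parseMyID(body)
}
```

Each side's ID, fetched this way, is what the other side would need when it adds the device to its configuration.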
Will there be a syncthing process in the local host?
Will there be a syncthing process in the local host?
Yes. The process is started with cmd := exec.Command(GetSyncthingBinPath(), "-no-browser", "-no-restart", "-home", s.HomeDirectory) and the cmd object is kept in memory in the Syncthing struct so we can use it to manage the process later.
Cool. Then when will the process terminate? I think we can terminate it when users stop the ssh connection.
For local development, I'm not sure if it's better to use mount instead of sync.
Some corner cases:
- file size limitation (users may accidentally add a model file to the folder)
- follow the soft link or not?
Cool. Then when will the process terminate? I think we can terminate it when users stop the ssh connection.
Sure, or also when the environment is destroyed. I'm almost done with functionalities but haven't actually put the code in the cli commands yet. We can discuss more details after the core sync functionalities are finished!
For local development, I'm not sure if it's better to use mount instead of sync.
Some corner cases:
- file size limitation (users may accidentally add a model file to the folder)
- follow the soft link or not?
There can be an option to configure ignored files/folders, but accidentally dropping a large file into the folder can definitely be problematic. Potential solutions could be to halt if a large file is detected, or to ignore large files. What are you suggesting with mount?
What are you suggesting with mount?
By default, we will mount the current working directory. I think it's not necessary to sync the files in this dir.