Feature: Ability to update pipelines provisioned by config files

Open nickchomey opened this issue 1 year ago • 8 comments

Feature description

It is much easier and faster to define a pipeline in config files than via the Web UI (too many clicks, and various bugs like buttons sporadically not working) or the HTTP API (which would require setting up scripts to run all the HTTP requests and their varied payloads, etc.). However, file configurations are immutable, so you have to restart the Conduit server in order to change them.

It would be great if we could update the config file and the changes would get reflected immediately.

I don't know what limitations there are with this compared to the HTTP API, but perhaps fsnotify could be used to monitor the pipelines directory, and maybe even the connectors directory?

nickchomey avatar Oct 02 '24 18:10 nickchomey

Also, I've just learned that the web UI was deprecated. All the more reason to allow for something like this!

nickchomey avatar Oct 07 '24 13:10 nickchomey

This feature would be great to have, but we currently have higher-priority items on our roadmap. That said, if you want to add this feature, we are happy to assist you with a review!

lovromazgon avatar Oct 07 '24 15:10 lovromazgon

Right now it's more of an annoyance than anything. I'll let you know if it becomes a sufficiently large pain point for me to try to implement this feature!

nickchomey avatar Oct 07 '24 19:10 nickchomey

My thought was that it would just watch the config files and when a change is detected, reload the config file as if it was a fresh startup of Conduit - no diffing necessary.

Surely this can be done without restarting Conduit altogether. It could probably leverage whatever the API uses to stop, start, etc.

What do you figure diffing would be needed for?

nickchomey avatar Oct 11 '24 22:10 nickchomey

Again, while a diff might be standard, is it necessary? Do you see any downsides to what I've proposed above? It seems quite easy to implement.

Moreover, it's not evident that zero-downtime diffing is even possible within Conduit. It may very well just need a restart of the pipeline, as I've proposed. In that case, is there any reason to do some complicated diff rather than a full reload of the pipeline config?

nickchomey avatar Oct 16 '24 14:10 nickchomey

The way we treat this issue when Conduit starts is to do a diff between the pipeline Conduit finds in its store (e.g. badger DB) and the pipeline in the config file. The assumption is that the config file is the source of truth: if an entity (source, destination, processor, pipeline) can't be found in the file, we delete it; if a new entity is found, we create it; if an existing one is changed, we update it. We match entities by their IDs. So, for example, changing the ID of a source in the pipeline config file will result in the existing source being deleted and a new one being created instead. Consequently, the pipeline will start from scratch, because the position state is removed together with the source (as explained here).
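
To make the matching rule concrete, here is a minimal Go sketch of an ID-based reconciliation like the one described above. It is not Conduit's actual code: `Entity`, `Plan`, and `Reconcile` are hypothetical names, and the config is reduced to a single string for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// Entity is a hypothetical stand-in for a source, destination,
// processor, or pipeline. Only the ID is used for matching.
type Entity struct {
	ID     string
	Config string
}

// Plan lists the entity IDs to create, update, and delete.
type Plan struct {
	Create, Update, Delete []string
}

// Reconcile treats fromFile as the source of truth and compares it,
// by ID, against what is currently in the store.
func Reconcile(stored, fromFile map[string]Entity) Plan {
	var p Plan
	for id, want := range fromFile {
		have, ok := stored[id]
		switch {
		case !ok:
			p.Create = append(p.Create, id) // only in the file
		case have != want:
			p.Update = append(p.Update, id) // same ID, changed config
		}
	}
	for id := range stored {
		if _, ok := fromFile[id]; !ok {
			p.Delete = append(p.Delete, id) // no longer in the file
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(p.Create)
	sort.Strings(p.Update)
	sort.Strings(p.Delete)
	return p
}

func main() {
	stored := map[string]Entity{
		"src-old": {ID: "src-old", Config: "table=users"},
		"dest":    {ID: "dest", Config: "topic=a"},
	}
	// The source was only renamed, but because matching is by ID it
	// shows up as one delete plus one create, not as an update.
	fromFile := map[string]Entity{
		"src-new": {ID: "src-new", Config: "table=users"},
		"dest":    {ID: "dest", Config: "topic=b"},
	}
	fmt.Printf("%+v\n", Reconcile(stored, fromFile))
}
```

In this sketch the renamed source lands in both `Delete` and `Create`, which mirrors why a changed source ID makes the pipeline start from scratch.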

That's the same behavior I'd expect to see if we implement hot-reloads of pipeline config files. In my opinion, Conduit shouldn't bother with a diff or user-guided reconciliation. That sounds like something that can be handled separately by establishing proper deploy procedures, using git and separate deploy environments to ensure a config works correctly before it hits production. We need to see Conduit as the low-level tool that it is, focused on moving data. So I'm more in favor of detecting file changes and treating them as the new source of truth: Conduit then stops the existing pipeline, applies the changes, and restarts the pipeline.
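
As a rough sketch of that stop, apply, restart sequence in Go: the `Pipeline` interface, `Reload`, and `fakePipeline` below are hypothetical and do not correspond to Conduit's real types. The sketch also assumes that a failed apply leaves the pipeline stopped, which is a design choice and not something decided in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// Pipeline is a hypothetical interface with the three operations
// a hot-reload would need.
type Pipeline interface {
	Stop() error
	Apply(cfg string) error
	Start() error
}

// Reload stops the pipeline, applies the new config, and starts it
// again. If any step fails, it returns early and does not attempt
// the remaining steps.
func Reload(p Pipeline, cfg string) error {
	if err := p.Stop(); err != nil {
		return fmt.Errorf("stop: %w", err)
	}
	if err := p.Apply(cfg); err != nil {
		return fmt.Errorf("apply: %w", err)
	}
	if err := p.Start(); err != nil {
		return fmt.Errorf("start: %w", err)
	}
	return nil
}

// fakePipeline is an in-memory implementation for demonstration.
type fakePipeline struct {
	cfg     string
	running bool
	calls   []string
}

func (f *fakePipeline) Stop() error {
	f.calls = append(f.calls, "stop")
	f.running = false
	return nil
}

func (f *fakePipeline) Apply(cfg string) error {
	f.calls = append(f.calls, "apply")
	if cfg == "" {
		return errors.New("empty config")
	}
	f.cfg = cfg
	return nil
}

func (f *fakePipeline) Start() error {
	f.calls = append(f.calls, "start")
	f.running = true
	return nil
}

func main() {
	p := &fakePipeline{cfg: "v1", running: true}
	if err := Reload(p, "v2"); err != nil {
		fmt.Println("reload failed:", err)
		return
	}
	fmt.Println(p.calls, p.cfg, p.running)
}
```

A file watcher (for example fsnotify, as suggested earlier in the thread) would call something like `Reload` whenever a pipeline config file changes.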

lovromazgon avatar Oct 17 '24 11:10 lovromazgon