Feature: Implement ImportPipeline and ExportPipeline functions in the API
Feature description
The API's ImportPipeline and ExportPipeline functions appear not to be implemented, yet there appear to be import and export mechanisms in the provisioning package.
Is this a bug or an oversight, or is there a reason why this wasn't implemented? Either way, could it be implemented?
This is not an oversight; it's an unimplemented feature that we scoped out at a certain point because we had painted ourselves a bit into a corner.
At first, Conduit only had an API and no pipeline configuration files. In the API, IDs are created automatically and are globally unique (as you pointed out in that other issue). When we added pipeline configuration files, we decided to "enrich" the IDs in the file before provisioning the pipeline. By "enrich" I mean prepending the parent ID to the actual ID in the file (for instance, a connector with the ID connID on a pipeline with the ID pipID internally gets the ID pipID:connID). This way we ensure globally unique IDs, so users don't have to worry about keeping IDs distinct across separate pipeline configuration files.
However, this creates a discrepancy between pipelines created through the API and ones provisioned through pipeline config files. If we exported a pipeline provisioned via the API and loaded it into Conduit again through a config file, Conduit would enrich the config and prefix the IDs, creating objects different from the ones in the original pipeline. Repeat the process a couple of times and you end up with really long IDs.
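To make the problem concrete, here is a minimal sketch of the enrichment idea and of why repeated export/import round trips keep growing the IDs. The enrich helper is purely illustrative and is not Conduit's actual provisioning code:

```go
package main

import "fmt"

// enrich mimics the ID enrichment applied when a pipeline config file is
// provisioned: the parent (pipeline) ID is prepended to the child
// (connector or processor) ID. Hypothetical helper, for illustration only.
func enrich(parentID, childID string) string {
	return parentID + ":" + childID
}

func main() {
	connID := "connID"

	// First provisioning from a config file: the connector "connID" on
	// pipeline "pipID" is stored internally as "pipID:connID".
	connID = enrich("pipID", connID)
	fmt.Println(connID) // pipID:connID

	// Export the pipeline as-is and import it again: the already-enriched ID
	// is enriched once more, so the ID grows with every round trip.
	connID = enrich("pipID", connID)
	fmt.Println(connID) // pipID:pipID:connID
}
```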
All of this stems from the decision that pipelines, connectors and processors need globally unique IDs. To get around it, we want to refactor the internals and the API so they match the pipeline configuration files. The idea is to nest the endpoints to mirror the config file structure (e.g. /v1/pipelines/{id}/connectors/{id}) and use composite keys internally for the sub-resources (e.g. a connector is then identified by the pipeline ID plus the connector ID, and similarly for processors). That way IDs no longer have to be globally unique; for instance, two pipelines could each have a connector with the same ID, since the key is composed of the pipeline ID and the connector ID.
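As a rough sketch of what composite keys for sub-resources could look like, assuming a key type made up of the pipeline ID and the connector ID (names are assumptions, not Conduit's actual types):

```go
package main

import "fmt"

// ConnectorKey identifies a connector by the pipeline it belongs to plus its
// own ID, mirroring a nested endpoint like /v1/pipelines/{id}/connectors/{id}.
// Hypothetical type for illustration; not Conduit's actual internals.
type ConnectorKey struct {
	PipelineID  string
	ConnectorID string
}

func main() {
	// Two different pipelines can each have a connector called "source",
	// because the map key is the composite (pipeline ID, connector ID).
	connectors := map[ConnectorKey]string{
		{PipelineID: "pipeline-a", ConnectorID: "source"}: "source connector of pipeline-a",
		{PipelineID: "pipeline-b", ConnectorID: "source"}: "source connector of pipeline-b",
	}
	for k, v := range connectors {
		fmt.Printf("/v1/pipelines/%s/connectors/%s -> %s\n", k.PipelineID, k.ConnectorID, v)
	}
}
```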
Having this structure would also allow us to export and import pipelines regardless of how they were provisioned.
This is, again, a feature that has been on our minds for some time, but our limited capacity hasn't allowed us to address it yet.
Thanks for the explanation! I like the idea of v1/pipelines/{id}/connectors/{id}.
I suppose what I had in mind here was being able to import and export an entire pipeline config (including connectors and processors) via the API, probably as JSON.
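For illustration only, a self-contained pipeline export along those lines might be shaped something like the following; the struct and field names are purely hypothetical and do not correspond to an existing Conduit API:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Hypothetical shape of an exported pipeline, carrying its connectors and
// processors in a single JSON document. Field names are assumptions.
type ExportedPipeline struct {
	ID         string              `json:"id"`
	Connectors []ExportedResource  `json:"connectors"`
	Processors []ExportedResource  `json:"processors"`
}

type ExportedResource struct {
	ID       string            `json:"id"`
	Plugin   string            `json:"plugin"`
	Settings map[string]string `json:"settings"`
}

func main() {
	p := ExportedPipeline{
		ID: "my-pipeline",
		Connectors: []ExportedResource{
			{ID: "source", Plugin: "example-source", Settings: map[string]string{"path": "/tmp/in.txt"}},
			{ID: "destination", Plugin: "example-destination", Settings: map[string]string{"path": "/tmp/out.txt"}},
		},
	}
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(out))
}
```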
Though this gave me an idea to look into for the NATS API I'm working on (which now seems to be fully working for all of the existing endpoints): perhaps there's a hacky way to specify a YAML config without reading from the filesystem (which is what I'm trying to avoid most of all). For example, receive a NATS (or HTTP, or any other) message that contains a YAML string and feed it into the mechanism Conduit normally uses to load a pipeline YAML file. I'll explore a bit to see whether that seems feasible.
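A minimal sketch of that idea using the nats.go client; conduitProvisionFromYAML is a stand-in for whatever entry point Conduit's provisioning code would expose, and the subject name is made up:

```go
package main

import (
	"context"
	"log"

	"github.com/nats-io/nats.go"
)

// conduitProvisionFromYAML stands in for the mechanism Conduit uses to load a
// pipeline from a YAML config; a real integration would call into the
// provisioning package instead. Hypothetical, for illustration only.
func conduitProvisionFromYAML(ctx context.Context, yamlConfig []byte) error {
	log.Printf("would provision a pipeline from %d bytes of YAML", len(yamlConfig))
	return nil
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Each message on this subject carries a pipeline config as a YAML string,
	// so nothing ever has to be written to or read from the filesystem.
	_, err = nc.Subscribe("conduit.pipelines.provision", func(m *nats.Msg) {
		if err := conduitProvisionFromYAML(context.Background(), m.Data); err != nil {
			log.Printf("provisioning failed: %v", err)
		}
	})
	if err != nil {
		log.Fatal(err)
	}

	select {} // block and keep handling messages
}
```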
Edit: I laid the groundwork for such a thing in #2236 - I'm successfully using it to send YAML strings in via NATS that then hot-reload. It could also work with a file watcher on the YAML pipeline config files. But perhaps that would become obsolete if you end up implementing the changes described above for import/export, IDs, etc. - I assume a unification like that would be better than keeping the YAML and API somewhat distinct/incompatible.