kafka-connect-arangodb
Kafka Connect ArangoDB fails to create collection if not present in target database
I have the configuration below:
{
"name": "development-arangodb-connector",
"connector.class": "io.github.jaredpetersen.kafkaconnectarangodb.sink.ArangoDbSinkConnector",
"tasks.max": "1",
"topics": "customers",
"arangodb.host": "192.168.56.1",
"arangodb.port": 8529,
"arangodb.user": "root",
"arangodb.password": "admin",
"transforms": "cdc",
"transforms.cdc.type": "io.github.jaredpetersen.kafkaconnectarangodb.sink.transforms.Cdc",
"arangodb.database.name": "development",
"arangodb.batch.size":"100",
"arangodb.max.retries":"10",
"arangodb.writer.impl":"kafka.connect.arangodb.ArangoDBWriter"
}
Issue: the sink connector does not create the collection in the target database if it is not already present.
Yup, this is correct behavior. Kafka Connect ArangoDB does not create collections or databases at all -- it's your responsibility to do so beforehand. While it's theoretically possible to figure out what kind of collection you will need (edge collections are the ones with _to and _from fields so look for those fields), there's no way to predict what kind of indices you're going to need. Indices are super important to having a performant database and auto-creating collections enables users to forget about them.
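For anyone landing here, below is a minimal sketch of creating the collection and an index up front through ArangoDB's HTTP API, using only the Python standard library. It is an illustration under assumptions, not something shipped with the connector: the helper names are hypothetical, the collection is assumed to be named `customers` to match the topic in the config above, and the `email` index field mentioned in the usage note is invented for the example. The `collection_type` helper applies the `_from`/`_to` rule described above.

```python
import base64
import json
import urllib.request

DOCUMENT_COLLECTION = 2  # ArangoDB type code for a document collection
EDGE_COLLECTION = 3      # ArangoDB type code for an edge collection


def collection_type(sample_record: dict) -> int:
    """Guess the collection type: edge documents carry both _from and _to."""
    if "_from" in sample_record and "_to" in sample_record:
        return EDGE_COLLECTION
    return DOCUMENT_COLLECTION


def build_request(base_url, database, path, body, user, password):
    """Build (but do not send) an authenticated POST to the ArangoDB HTTP API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/_db/{database}{path}",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_collection_request(base_url, database, name, sample_record, user, password):
    """Request that creates a collection whose type matches the sample record."""
    body = {"name": name, "type": collection_type(sample_record)}
    return build_request(base_url, database, "/_api/collection", body, user, password)


def create_index_request(base_url, database, collection, fields, user, password):
    """Request that creates a persistent index on the given fields."""
    body = {"type": "persistent", "fields": fields}
    path = f"/_api/index?collection={collection}"
    return build_request(base_url, database, path, body, user, password)
```

To use it, build a request with something like `create_collection_request("http://192.168.56.1:8529", "development", "customers", {"name": "Jane"}, "root", "admin")` and pass the result to `urllib.request.urlopen`. Do the same with `create_index_request(..., "customers", ["email"], ...)` for whichever fields your queries actually filter on. Run this once before starting the connector.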
It looks like this isn't documented explicitly like I thought it was. Even the development docs hide the database and collection creation away from you. I'll keep this issue open as a reminder to document this.
Thanks for bringing it up!
Thanks for the timely reply. Is there any plan to write a source Kafka connector? :)
That's definitely one of the next things I'd like to do with this. I've been working on converting the existing Docker Compose development setup to use a clustered ArangoDB via Kubernetes first. The clustered form is much more difficult to use as a source system due to its architecture, but starting with it helps us avoid coding ourselves into a corner.