Feedback for “Syncing External Tables”
We work across databases in Clickhouse and have different clickpipes for different dbs. Based on how db pull currently works it creates all the tables in the specified database in the Moose config file. It looks like there is not a way to specify the database / the db pull command should probably just use the database from the --connection-string. Or It may be helpful if there could be an additional parameter, like --db-name where this is the db you want the external tables to end up in.
Actually, the more I think about this the more I think this might be a bug? If you have an externally managed table(s) from a ClickPipe that is getting loaded into a particular database. And then you establish in your connection string the database, I think it is safe to say we would want that database added to the auto-generated code from the db pull command? Definitely up to hear a counter argument!
hey Tony,
I was looking into this and think I got it, but I am not sure, so I need confirmation that I understand it correctly.
In your setup, are you trying to manage all the Clickhouse Cloud databases with one moose? And is it 1 Cluster with multiple DBs or 1 DB / cluster with multiple clusters?
If you are trying to manage all of those under 1 Moose application, that means:
- You are running
moose pullagainst different Dbs and pulling all the tables in the same codebase - The Table object doesn't currently keep track of databases because we currently use the database field in the config for managing all the tables.
- As such, the tables are not appropriately managed in prod because we are not able to have multiple dbs at the same time?
Or
Is that you are doing 1 moose app/database but when you run moose pull and then moose dev locally then because they are externally managed they don't get provisioned?
Not sure if one of those is it or if I don't get it at all.
I am also happy to get on call to see first hand what you are facing and figure out how to help
Thank you for giving us feedback - that's awesome for us
Let me clarify and give some additional context.
I am looking at using the Moose Stack in order to build some in-app analytics for our application and I am currently in the process of building out a POC for it. We are using Clickhouse Cloud with 1 cluster that has multiple DBs. We have multiple services (1 cluster - multi-db) setup in Clickhouse Cloud, 1 for each environment (prod, stg). Further, we have a couple of clickpipes. These clickpipes basically copy over the databases we need into clickhouse. Each database has a bunch of different tables. Our analytics we are building requires combining data from tables across each of the databases and generating views, materialized views, and tables inside a separate database ( call this database analytics or something like that). Currently, this SQL lives in Clickhouse Cloud and that's not great for a number of reasons (i.e. Version Control).
I am attempting to run moose db pull and set the --file-path for each database's auto-generated schema separately. My assumption would be running moose db pull for a given database would auto-generate the file for the given database. So if my database is called foo and it has a table called bar that I am pulling, the moose app would reference that as foo.bar. However, when you look at the local clickhouse after running moose dev, foo.bar doesn't exist but in my case it is local.bar (because "local" is what is set as my database in the moose.config.toml file).
My hope would be to manage my entire clickhouse environment for all databases in one codebase. This would allow me to model tables, views, build advanced analytics, etc. It would replace the need to use tools like "dbt" or "SQLMesh" to do transformations. It would be our one stop shop. Combining TypeScript, what we use regularly, and SQL. That would be great.
Thanks!
-Tony
That's super clear now, thank you for the additional details.
Like you said, we don't currently track the database for each table; we track them in batches via the config. However, adding the database field to each table to enable that config should not be too hard.
When do you need that by, ideally?
TL;DR : I don't really have a timeline.
I am currently in POC mode and I think I might have a workaround. If you adjust the auto-generated code (even though it says not too) and add the database prefix to the table name, it will create the table in appropriate database. For example, bar table would be:
export const BarTable = new OlapTable<bar>("foo.bar", {
orderByFields: ["id"],
engine: ClickHouseEngines.ReplacingMergeTree,
ver: "XYZ",
settings: { index_granularity: "8192" },
lifeCycle: LifeCycle.EXTERNALLY_MANAGED,
});
Only issue would be in the future if I go to do another db pull I would need to make sure I adjust these prefixes again.
I have started work on it here: https://github.com/514-labs/moosestack/pull/2876/files
Hey @tonytrill , I'm one of the devs at 514. We've also been working on migration flows and would love your input with regards to managing multiple databases. The pr is https://github.com/514-labs/moosestack/pull/2872, there are docs as well to give more context.
The high level flow could be:
- db pull
- make moose changes
- create migration plan
- run migration
The migration command takes a clickhouse url, which includes the db name (e.g. for db whose name is local clickhouse://panda:pandapass@localhost:18123/local).
What would your ideal dx be? Would you want to run each migration command with the different db names? Or would you want the migration to handle all dbs at once?
I would assume running db pull would generate the models for the given database with some sort of database indicator. Then I could run db pull for each of the externally managed databases to generate their schema .ts files. Within the schema files I would assume the OlapTable() class would look something like this:
export const MyFooTable = new OlapTable<my_foo>(
"my_foo",
{
database: "bar",
orderByFields: ["id"],
engine: ClickHouseEngines.ReplacingMergeTree,
ver: "_peerdb_version",
settings: { index_granularity: "8192" },
lifeCycle: LifeCycle.EXTERNALLY_MANAGED,
}
);
I would expect creating and running a migration plan would execute for all dbs listed in all schemas in the codebase