Support all DSN variants for clickhouse DATABASE_URL
Description Hello, the clickhouse driver should support all DSN variants. Currently, I need to pass two connection strings to use dbmate with an application that uses clickhouse HTTP protocol.
- Version: 2.6.0
- Database: clickhouse
- Operating System: Linux
Steps To Reproduce
DATABASE_URL=http://default:changeme@localhost:8123/default
Expected Behavior
HTTP should just work. The clickhouse driver has a clickhouse.ParseDSN function that does the job for all variants. I can provide a PR.
DATABASE_URL=http://default:changeme@localhost:8123/default
I would be in favor of supporting this "experimental" HTTP transport, based on the ClickHouse README change on Jul 8, 2022 and still marked experimental today, as long as the database URL driver is prefixed, for example, clickhouse+http://.
Related discussion: #474.
Well, even if we do add the support for http client for clickhouse, this won't change the fact you will need to use a different DATABASE_URL for dbmate, as we still need a db-specific prefix to map to the correct driver.
@dossy In my opinion, we can just the update the scheme to be "http" instead of "clickhouse" in case the user provided a port of "8123" (http) or 8443 (https). Instead of using a new specific driver prefix.
What do you think?
@dossy In my opinion, we can just the update the scheme to be "http" instead of "clickhouse" in case the user provided a port of "8123" (http) or 8443 (https). Instead of using a new specific driver prefix.
What do you think?
I'm not a fan of the idea of using the port number to infer the transport layer: it could lead to unintended surprises when a user, not aware of this "magical" association, uses a port that suddenly changes the transport layer.
We often use the same port numbers for particular services out of tradition, because most services are not officially registered with IANA. But, ultimately, port numbers are arbitrary: there is nothing technically stopping you from running your database bound to port 1/tcp, aside from some antiquated "binding to privileged ports" restriction which is simply a matter of configuration, or using a port redirection like NAT.
Requiring a user to specify the transport in the URL, e.g., clickhouse:// vs. clickhouse+http:// or clickhouse+https://, is explicit and clear. There is no question as to what database driver, and what transport layer protocol will be used, when looking at the URL.
Now, I think the idea of defaulting the port, if not specified in the URL, depending on the URL scheme, may make sense if there is an expectation that a particular port will be used by default. In other words, specifying clickhouse+http:// as the scheme could default to port 8123 if a port is not specified explicitly in the URL, and clickhouse+https:// default to port 8443 if not explicitly specified in the URL. If those are the two most commonly used and expected ports by ClickHouse users who use the HTTP/HTTPS transports, then I think it makes sense to use those as defaults.
Thanks for the elaborated response, I agree with the reasonings + I think that supplying the default based on scheme is a great idea. 8123 and 8443 are indeed the acceptable ports for http and https by clickhouse as can be read from the docs: https://clickhouse.com/docs/en/guides/sre/network-ports
I think I will try to tackle this one
Agree with @dossy, we cannot register something generic like http:// to Clickhouse. clickhouse+http:// is a good solution and is clear.
If we want to support interop with existing clickhouse connection strings, then we could potentially offer a new optional CLI flag to override the driver. For example something like:
dbmate --url "http://default:changeme@localhost:8123/default" --driver clickhouse up
Thus the clickhouse driver could support both clickhouse+http:// and http://, but only clickhouse+http:// would automatically load the correct driver. I haven't thought through all the implications of this, but I think it might work.
If we want to support interop with existing clickhouse connection strings, then we could potentially offer a new optional CLI flag to override the driver.
I'm in favor of this approach, because ClickHouse users can use the same DATABASE_URL string with code that directly uses the clickhouse-go client or dbmate. Does it make sense to use DBMATE_DRIVER as an environment variable for specifying the driver, too?
Yeah DBMATE_DRIVER env var makes sense too - matches the other supported env vars