dbt-clickhouse icon indicating copy to clipboard operation
dbt-clickhouse copied to clipboard

feat: allow to add connection overrides for dictionaries

Open Ian2012 opened this issue 10 months ago • 3 comments

Summary

This PR fixed an issue in which ClickHouse was using the configured host to load the dictionary, however, this would fail if it was running on Docker and would be slower if it was using a URL to connect to ClickHouse. It implements a new config for the dictionary model allowing people to override connection params.

Before

SOURCE(CLICKHOUSE(USER 'ch_admin' PASSWORD '[HIDDEN]' HOST 'clickhouse' PORT '8123' QUERY ...)

After

SOURCE(CLICKHOUSE(USER 'ch_admin' PASSWORD '[HIDDEN]' QUERY ...)

Checklist

Delete items not relevant to your PR:

  • [ ] Unit and integration tests covering the common scenarios were added
  • [ ] A human-readable description of the changes was provided to include in CHANGELOG
  • [ ] For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

Ian2012 avatar Apr 08 '24 20:04 Ian2012

@BentsiLeviav This PR is ready for review

Ian2012 avatar Apr 08 '24 21:04 Ian2012

@BentsiLeviav, I was literally about to create an issue asking for very similar change. Only difference is I think it might be nice to support omitting credentials altogether for the table source? i.e.

SOURCE(CLICKHOUSE(TABLE 'table_name'))

Specifying the user in our case fails in our development environment because developers are not able to directly authenticate to ClickHouse from the server itself. As a result we get an auth error.

rjoelnorgren avatar Apr 08 '24 21:04 rjoelnorgren

I think @rjoelnorgren brings up a good point. Would it be possible to update the credentials dict with connection_overrides and, if a given field is empty, omit that from the configuration?

As an example, here's what I imagine a potential model definition to look like (using the example in the dictionary test):

{{ config(
       materialized='dictionary',
       fields=[
           ('id', 'Int32'),
           ('name', 'String'),
           ('hacker_alias', 'String')
       ],
       primary_key='id',
       layout='COMPLEX_KEY_HASHED()',
       lifetime='1',
       source_type='clickhouse',
       connection_overrides={
           "host": "",
           "port": "",
       },
) }}

select
    id,
    name,
    case
        when name like 'Dade' then 'crash_override'
        when name like 'Kate' then 'acid burn'
        when name like 'Eugene' then 'the plague'
        else 'N/A'
    end as hacker_alias
from {{ source('raw', 'people') }}

and the subsequent dictionary DDL would omit the HOST and PORT sections of the configuration. Does that make sense?

SoryRawyer avatar Apr 09 '24 20:04 SoryRawyer