
Support custom UDF

Open subkanthi opened this issue 2 years ago • 2 comments

I tried to mount two files to support custom UDFs in clickhouse-operator, following https://chowdera.com/2022/03/202203311616432455.html. The YAML file is below.

A couple of issues:

  1. The UDF code (a Python file) apparently has to be mounted in /var/lib/clickhouse/user_scripts, and I can't find an example of how to do that.
  2. For some reason the user_defined_executable_functions_config setting is not merged into config.xml.
  3. This is the stack trace of the error:
Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/config.d_custom_function.xml'.
2022.06.10 20:08:46.613392 [ 1 ] {} <Error> ExternalUserDefinedExecutableFunctionsLoader: Failed to load config file 
'/etc/clickhouse-server/config.d/custom_function.xml': Poco::Exception. Code: 1000, e.code() = 0, Not found: functions.name, Stack trace (when copying this message, always include the lines below):

0. Poco::NotFoundException::NotFoundException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) @ 0x1b2175ac in /usr/bin/clickhouse
1. Poco::Util::AbstractConfiguration::getString(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const @ 0x1b0c66e4 in /usr/bin/clickhouse
2. DB::ExternalLoader::LoadablesConfigReader::readFileInfo(DB::ExternalLoader::LoadablesConfigReader::FileInfo&, DB::IExternalLoaderConfigRepository&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const @ 0x158fc058 in /usr/bin/clickhouse
3. DB::ExternalLoader::LoadablesConfigReader::readRepositories(std::__1::optional<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > const&, std::__1::optional<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > const&) @ 0x158f8329 in /usr/bin/clickhouse
4. DB::ExternalLoader::LoadablesConfigReader::read(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) @ 0x158ec6b8 in /usr/bin/clickhouse
5. DB::ExternalLoader::reloadConfig(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const @ 0x158eb136 in /usr/bin/clickhouse
6. DB::Context::loadOrReloadUserDefinedExecutableFunctions(Poco::Util::AbstractConfiguration const&) @ 0x15798fde in /usr/bin/clickhouse
7. DB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) @ 0xb51f9a8 in /usr/bin/clickhouse
8. Poco::Util::Application::run() @ 0x1b0d3ec6 in /usr/bin/clickhouse
9. DB::Server::run() @ 0xb50bd94 in /usr/bin/clickhouse
10. mainEntryClickHouseServer(int, char**) @ 0xb509407 in /usr/bin/clickhouse
11. main @ 0xb489d5a in /usr/bin/clickhouse
12. __libc_start_main @ 0x7f97f69a4083 in ?
13. _start @ 0xb25aa2e in /usr/bin/clickhouse
 (version 22.5.1.2079 (official build))
kind: "ClickHouseInstallation"
metadata:
  name: "settings-01"
spec:
  configuration:
    settings:
      compression/case/method: zstd
      user_defined_executable_functions_config: config.d/*_function.xml
    files:
      config.d/custom_function.xml: |
        <clickhouse>
          <functions>
            <function>
              <type>executable</type>
              <name>timestamp_from_bson</name>
              <return_type>UInt64</return_type>
              <argument>
                <type>String</type>
                <name>value</name>
              </argument>
              <format>TabSeparated</format>
              <command>python3 /etc/clickhouse-server/config.d/timestamp_from_bson.py</command>
            </function>
          </functions>
        </clickhouse>
      config.d/timestamp_from_bson.py: |
        #!/usr/bin/python3
        import sys
        import json

        if __name__ == '__main__':
            for line in sys.stdin:
                dict = json.loads(line)
                ls = []
                for v in dict.values():
                    ls.insert(1, list(v))

                vector1 = tuple(ls[0])
                vector2 = tuple(ls[1])

                v = sum(p * q for p, q in zip(vector1, vector2))
                data = {'result': str(v)}

                print(json.dumps(data), end='\n')
                sys.stdout.flush()
    clusters:
      - name: "standard"
        layout:
          shardsCount: 1
          replicasCount: 1

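The function declaration above uses the TabSeparated format with one String argument and a UInt64 return type, while the mounted script parses JSON and prints a JSON object. As a minimal sketch of a script whose input and output match that declaration, the version below assumes that the argument is the 24-character hex form of a BSON ObjectId, whose first four bytes encode a Unix timestamp in seconds. The issue does not say what `value` actually contains, so that assumption and the helper name `timestamp_from_object_id` are illustrative only.

```python
#!/usr/bin/python3
"""Executable UDF sketch: String (ObjectId hex) -> UInt64 (Unix seconds)."""
import sys


def timestamp_from_object_id(value):
    """Return the Unix timestamp stored in the first 4 bytes of an ObjectId.

    Assumption: `value` is the 24-character hex form of a BSON ObjectId.
    Malformed input yields 0 so every input row still gets an output row.
    """
    value = value.strip()
    if len(value) != 24:
        return 0
    try:
        return int(value[:8], 16)
    except ValueError:
        return 0


def main():
    # Each stdin line is one TabSeparated row holding the single argument;
    # print one result per input row and flush, as the original script does.
    for line in sys.stdin:
        print(timestamp_from_object_id(line.rstrip("\n")), flush=True)


if __name__ == "__main__":
    main()
```

For instance, the ObjectId 507f1f77bcf86cd799439011 starts with 507f1f77, which is 1350508407 in decimal.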
subkanthi avatar Jun 10 '22 20:06 subkanthi

  1. The XML files for functions are not part of the server config.xml.
  2. By default, user_defined_executable_functions_config is /etc/clickhouse-server/*_function.xml. However, clickhouse-operator tries to merge any XML files at that path into the server config.xml. I think the operator (chop) could use some improvement here.
  3. I created a config volume and mounted it at /configs, then set user_defined_executable_functions_config to the search path /configs/*_function.xml to avoid the problem.
  4. For UDF scripts, I created another volume, mounted it at /user_scripts, and set user_scripts_path to that path.

So here is a working toy example of a deployment YAML. My UDF "cat" calls the /bin/cat command, which echoes its input. For production, use initContainers to check out the scripts from git and mount them, instead of ConfigMaps in the YAML.

apiVersion: v1
kind: ConfigMap
metadata:
  name: udf-configmap
data:
  udf_function.xml: |
    <functions>
      <function>
        <type>executable</type>
        <name>cat</name>
        <return_type>String</return_type>
        <argument>
          <type>String</type>
        </argument>
        <format>CSV</format>
        <command>cat</command>
      </function>
    </functions>
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cat-configmap
data:
  cat: |
    #!/bin/bash
    cat
---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo-01"
spec:
  configuration:
    settings:
      user_scripts_path: /user_scripts
      user_defined_executable_functions_config: /configs/*_function.xml
    clusters:
      - name: "demo-01"
        templates:
          podTemplate: clickhouse:22.3.6.5
        layout:
          shardsCount: 1
          replicasCount: 1
  templates:
    podTemplates:
      - name: clickhouse:22.3.6.5
        spec:
          containers:
            - name: clickhouse-pod
              image: clickhouse/clickhouse-server:22.3.6.5
              volumeMounts:
                - name: clickhouse-user-scripts-volume
                  mountPath: /user_scripts
                - name: clickhouse-configs-volume
                  mountPath: /configs
          volumes:
            - name: clickhouse-user-scripts-volume
              configMap:
                name: cat-configmap
                defaultMode: 0755
            - name: clickhouse-configs-volume
              configMap:
                name: udf-configmap

chi-demo-01-demo-01-0-0-0.chi-demo-01-demo-01-0-0.test.svc.cluster.local :) select cat('hi')

SELECT cat('hi')

Query id: 7d8b9cc3-e42e-4d21-a050-1934b4882499

┌─cat('hi')─┐
│ hi        │
└───────────┘
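
The config that loads in this example has `<functions>` as its root element, while the one in the original post wraps it in `<clickhouse>` and failed with "Not found: functions.name". Whether the wrapper is the actual cause is not confirmed in this thread. As a rough sketch, a pre-deployment check could compare a file against the shape of the working example. The helper name `check_udf_config` and the list of required fields are assumptions made for illustration and are not part of ClickHouse or the operator.

```python
import xml.etree.ElementTree as ET

# Elements that the <function> entry in the working example carries.
REQUIRED_FIELDS = ("type", "name", "return_type", "format", "command")


def check_udf_config(xml_text):
    """Return a list of problems found in an executable-UDF config file."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    # Assumption taken from the working example: <functions> is the root.
    if root.tag != "functions":
        problems.append(f"root element is <{root.tag}>, expected <functions>")
    functions = root.findall("function")
    if not functions:
        problems.append("no <function> elements directly under the root")
    for index, function in enumerate(functions):
        for field in REQUIRED_FIELDS:
            if not (function.findtext(field) or "").strip():
                problems.append(f"function #{index}: missing <{field}>")
    return problems


WORKING = """
<functions>
  <function>
    <type>executable</type>
    <name>cat</name>
    <return_type>String</return_type>
    <argument><type>String</type></argument>
    <format>CSV</format>
    <command>cat</command>
  </function>
</functions>
"""

# Same content wrapped in <clickhouse>, as in the original post.
WRAPPED = "<clickhouse>" + WORKING + "</clickhouse>"

print(check_udf_config(WORKING))
print(check_udf_config(WRAPPED))
```

The first call reports no problems, and the second flags both the unexpected root element and the absence of `<function>` elements directly under it.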

knoguchi avatar Jun 11 '22 03:06 knoguchi

This example works so can we close this one?

https://github.com/Altinity/clickhouse-operator/blob/master/docs/chi-examples/23-udf-example.yaml

lesandie avatar Feb 08 '24 15:02 lesandie