postgres-operator icon indicating copy to clipboard operation
postgres-operator copied to clipboard

Is there support to use custom LC_COLLATE and LC_CTYPE?

Open caleno opened this issue 7 years ago • 19 comments

Hi.

Is there support to use custom LC_COLLATE and LC_CTYPE when specifying databases to be created?

It seems the docker image you provide don't support this? I guess you use the PostgreSQL Docker Community library as base for your docker image builds? E.g this: https://github.com/docker-library/postgres/blob/master/9.6/Dockerfile

I've tried to change the encoding in the manifest like this:

... patroni: initdb: encoding: "UTF8" locale: "nb_NO.UTF8" ....

I managed to build a docker image with support for nb_NO.UTF8 by using the Dockerfile referenced over and adding this:

localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8; \ localedef -i nb_NO -f UTF-8 nb_NO.UTF-8 But for what I know patroni uses it own docker registry.

So is there a solution for this?

Let me know I should post this issue in the patroni or spilo repo instead of this.

Br, Lennart

caleno avatar Aug 29 '18 08:08 caleno

Manage to create my own spilo container with nb_NO.UTF8 locale support by editing the Docker.build file.

    && localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8 \
    && localedef -i nb_NO -f UTF-8 nb_NO.UTF-8 \

    # Add PGDG repositories
    && DISTRIB_CODENAME=$(sed -n 's/DISTRIB_CODENAME=//p' /etc/lsb-release) \
    && echo "deb http://apt.postgresql.org/pub/repos/apt/ ${DISTRIB_CODENAME}-pgdg main" > /etc/apt/sources.list.d/pgdg.list \
    && echo "deb-src http://apt.postgresql.org/pub/repos/apt/ ${DISTRIB_CODENAME}-pgdg main" >> /etc/apt/sources.list.d/pgdg.list \
    && curl -s -o - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - \

    && apt-get update \
    && apt-get install -y postgresql-common \

    # forbid creation of a main cluster when package is installed
    && sed -ri 's/#(create_main_cluster) .*$/\1 = false/' /etc/postgresql-common/createcluster.conf \

    # Clean up
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
            /var/cache/debconf/* \
            /usr/share/doc \
            /usr/share/man \
            /usr/share/locale/?? \
            /usr/share/locale/??_?? \
    && find /var/log -type f -exec truncate --size 0 {} \; \
    && find /usr/share/i18n/charmaps/ -type f ! -name UTF-8.gz -delete \
    && find /usr/share/i18n/locales/ -type f ! -name en_US ! -name nb_NO -delete \
    && echo 'en_US.UTF-8 UTF-8' > /usr/share/i18n/SUPPORTED \
    && echo 'nb_NO.UTF-8 UTF-8' >> /usr/share/i18n/SUPPORTED```

Build
```./build.sh --build-arg COMPRESS=true -t uibit/spilo:latest .```

Pushed it to our registry at Docker Hub and then referencing to it in the manifest. 

However, initdb ignores my locale settings or is overruled somewhere. 

``` patroni:
    initdb:
      encoding: "UTF8"
      locale: "nb_NO.UTF8"
      data-checksums: "true"
    pg_hba:
    - hostssl all all 0.0.0.0/0 md5
    - host    all all 0.0.0.0/0 md5
    ttl: 30
    loop_wait: &loop_wait 10
    retry_timeout: 10
    maximum_lag_on_failover: 33554432
  # restore a Postgres DB with point-in-time-recovery 
  # with a non-empty timestamp, clone from an S3 bucket using the latest backup before the timestamp
  # with an empty/absent timestamp, clone from an existing alive cluster using pg_basebackup
  # clone:
  #  uid: "efd12e58-5786-11e8-b5a7-06148230260c"
  #  cluster: "acid-batman"
  #  timestamp: "2017-12-19T12:40:33+01:00" # timezone required (offset relative to UTC, see RFC 3339 section 5.6)
  maintenanceWindows:
  - 01:00-06:00 #UTC
  - Sat:00:00-04:00

I could create a database manually with no_NB.UTF8 so at least some progress. postgres=# create database test with template = template0 encoding = 'UTF8' LC_COLLATE = 'nb_NO.UTF8' LC_CTYPE = 'nb_NO.UTF8';

I'd like the option to add parameters to the cluster manifest, e.g. LC_COLLATE and LC_CTYPE

kind: postgresql

metadata:
  name: pg-test-cluster
  namespace: test
spec:
  teamId: "TEST"
  volume:
    size: 1Gi
  numberOfInstances: 1
  users: #Application/Robot users
    test:
    - login
  enableMasterLoadBalancer: true
  enableReplicaLoadBalancer: false
  allowedSourceRanges: # load balancers' source ranges for both master and replica services
  - 127.0.0.1/32
  databases:
    test: test
#Expert section
  postgresql:
    version: "9.6"
    parameters:
      shared_buffers: "32MB"
      max_connections: "10"
      log_statement: "all"
  resources:
    requests:
      cpu: 10m
      memory: 100Mi
    limits:
      cpu: 300m
      memory: 3000Mi
  patroni:
    initdb:
      encoding: "UTF8"
      locale: "nb_NO.UTF8"   <--- ignored
      data-checksums: "true"
    pg_hba:
    - hostssl all all 0.0.0.0/0 md5
    - host    all all 0.0.0.0/0 md5
    ttl: 30
    loop_wait: &loop_wait 10
    retry_timeout: 10
    maximum_lag_on_failover: 33554432
  # restore a Postgres DB with point-in-time-recovery 
  # with a non-empty timestamp, clone from an S3 bucket using the latest backup before the timestamp
  # with an empty/absent timestamp, clone from an existing alive cluster using pg_basebackup
  # clone:
  #  uid: "efd12e58-5786-11e8-b5a7-06148230260c"
  #  cluster: "acid-batman"
  #  timestamp: "2017-12-19T12:40:33+01:00" # timezone required (offset relative to UTC, see RFC 3339 section 5.6)
  maintenanceWindows:
  - 01:00-06:00 #UTC
  - Sat:00:00-04:00```

Or do I have to hardcode this into the spilo container. Might be the easiest way if there currently in no other option. 

caleno avatar Sep 04 '18 13:09 caleno

@caleno I think what happens here is that initdb is not ignored in the operator, and the operator generates the SPILO_CONFIGURATION environment variable with the correct value (see your statefulset definition for the cluster created by the operator) for bootstrap:initdb:locale; however, Spilo sets it by default to en_US.UTF-8 (https://github.com/zalando/spilo/blob/9b44a5d03074a343e1d3c5b35c79c2a4d85c7e2d/postgres-appliance/scripts/configure_spilo.py#L175), and when merging SPILO_CONFIGURATION with the default one, it appends the default initdb options to the list of custom ones, obtained from the SPILO_CONFIGURATION (initdb options is a list, since some of those doesn't have any values, i.e. --data-checksums). Patroni happily processes this list and puts to --locale options, and the last one (with en_US.UTF-8) wins.

So there are a few things we could do better here:

  • provide a separate locale and encoding variables to Spilo, allowing to override the default settings.
  • Emit a warning in Patroni on multiple initdb options with the same name.
  • Possibly change the way we merge lists in Spilo the initdb case (instead of appending options, we first need to de-duplicate them).

alexeyklyukin avatar Sep 06 '18 19:09 alexeyklyukin

So cool to ask for Norwegian locale :) 🇳🇴

Jan-M avatar Sep 12 '18 08:09 Jan-M

Thanks for your reply @alexeyklyukin. I'll get back to this case later on when I got some more time on my hands. But one thing that would be great regarding any locale support would be something like this.

    test: test
      locale: 
        lc_collate: "nb_NO.UTF8"
        lc_ctype: "nb_NO.UTF8"
#Expert section
  postgresql:

I don't want to make a big issue out of this, see it as a feature, nice to have, request ;)

caleno avatar Sep 13 '18 06:09 caleno

@caleno I think it would be more clear to continue passing those in the intidb section of Patroni configuration, so that we don't invent an additional translation layer between the operator options and Patroni ones. Everything is actually in place but the Spilo support for merging user-specified initdb options in a sane way, but it should be rather easy to fix.

alexeyklyukin avatar Sep 14 '18 16:09 alexeyklyukin

Is there any update about this issue? I really hope to customize LC_COLLATE in postgres-operator.

kimxogus avatar Jan 14 '19 02:01 kimxogus

@alexeyklyukin , any news regarding this issue? I'm actually also looking for a way to use locale: "nb_NO.UTF-8"

paalkr avatar Mar 27 '20 22:03 paalkr

It turns out Matrix needs C.C to even start up: https://github.com/matrix-org/synapse/blob/master/docs/postgres.md#set-up-database

Synapse will refuse to set up a new database if it has the wrong values of COLLATE and CTYPE set, and will log warnings on existing databases.

LC_COLLATE='C'
LC_CTYPE='C'

This is such a great operator for PostgreSQL, I'm so sad that I can't just do in my cluster config:

   patroni:
     initdb:
       encoding: C
       locale: C

immanuelfodor avatar Sep 29 '20 06:09 immanuelfodor

Same setting as @immanuelfodor . Unfortunately stuck with the synapse deployment for matrix.

erhan- avatar Apr 05 '22 11:04 erhan-

I ended up removing the operator and went for an external PG deployment outside of the cluster 😭

immanuelfodor avatar Apr 05 '22 11:04 immanuelfodor

I dropped the database but was not able to create a new one via following command to have the right values:

createdb --encoding=UTF8 --locale=C --template=template0 --owner=matrix matrix;

\l does not about anything and the database is gone.

edit: createdb is not a psql command. You have to do it like this:

your-cluster is the name of your postgres cluster.

kubectl port-forward your-cluster-0 6432:5432

and then in a new terminal:

export PGPASSWORD=$(kubectl get secret postgres.your-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d)
export PGSSLMODE=require

psql -U matrix -d postgres -h localhost -p 6432

and then

drop database matrixdb;
create database matrixdb with template=template0 owner=matrix encoding=UTF8 locale='C';

Works so far.

erhan- avatar Apr 06 '22 17:04 erhan-

Bumping this, it's been open 2 years... Can anyone spare time to look into this?

hbjydev avatar Jun 04 '22 18:06 hbjydev

Exec into the master pod of your cluster in k8s and run the following:

psql
DROP DATABASE "synapse-db";
CREATE DATABASE "synapse-db" with template=template0 owner="synapse-db" encoding=UTF8 locale='C';
GRANT ALL PRIVILEGES ON DATABASE "synapse-db" to "synapse-db";

Obviously, replace the user/role/db name with the appropriate information. You can get those values by running \l to get the db name and \du to get the user/role name.

TheEagleByte avatar Oct 04 '22 15:10 TheEagleByte

This is a common requirement across many services. Can we please get somebody to look into this?

SNThrailkill avatar Nov 29 '22 21:11 SNThrailkill

Does the following not work? That's how we set up our PostgreSQL for Synapse 23 months ago.

  patroni:
    initdb:
      lc-collate: C
      lc-ctype: C
      encoding: UTF8

haslersn avatar Dec 01 '22 17:12 haslersn

To use custom locale you must build a custom spilo image. There is a environment variable ADDITIONAL_LOCALES in Dockerfile.

For example, when you build image like this:

cd postgres-appliance
docker build  --tag $YOUR_TAG --build-arg ADDITIONAL_LOCALES=pl_PL .

Then you can use this locale in postgresql manifest:

patroni:
  initdb:
    encoding: "UTF8"
    locale: "pl_PL.UTF-8"

You can add to image more than one additional language.

fraanek avatar Dec 01 '22 21:12 fraanek

any news on this?

benedikt-bartscher avatar Jan 09 '23 18:01 benedikt-bartscher

add environment LC_ALL= pl_PL.utf8 all work

apristup avatar May 07 '24 11:05 apristup