pgstac icon indicating copy to clipboard operation
pgstac copied to clipboard

stac-fastapi-pgstac doesn't work with beta version of postgres 17 (postgis/postgis:17beta3-master)

Open MathewNWSH opened this issue 1 year ago • 12 comments

fast api doesn't load items, even if data is loaded successful to pgstac deployed on postgres 17:

Deployment method:

#!/bin/bash
# Update the package list
apt-get update

# Install nano
apt install -y nano

# Install required packages with automatic yes
apt-get install -y ca-certificates curl

# Create the directory for Docker keyrings
install -y -m 0755 -d /etc/apt/keyrings

# Download the Docker GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc

# Ensure the keyring file is readable
chmod a+r /etc/apt/keyrings/docker.asc

# Add Docker repository to APT sources
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null

# Update the package list again
apt-get -y update

# Install Docker components with automatic yes
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Install pypgstac
python3 -m pip install pypgstac[psycopg]

# Export initial env variables:
# for pypgstsac
export POSTGRES_USER=
export POSTGRES_PASSWORD=
export POSTGRES_DB=
export PGUSER=
export PGPASSWORD=
export PGDATABASE=

export PGHOST=0.0.0.0
export PGPORT=5432

# for docker
export postgres_ram="2g"
export postgres_cpus=2
export POSTGRES_IMAGE="postgis/postgis:17beta3-master"

# let's make sure that we only need to export those only once
echo "export POSTGRES_USER=${POSTGRES_USER}" >> /home/eouser/.bashrc
echo "export POSTGRES_PASSWORD=${POSTGRES_PASSWORD}" >> /home/eouser/.bashrc
echo "export POSTGRES_DB=${POSTGRES_DB}" >> /home/eouser/.bashrc
echo "export PGUSER=${PGUSER}" >> /home/eouser/.bashrc
echo "export PGPASSWORD=${PGPASSWORD}" >> /home/eouser/.bashrc
echo "export PGDATABASE=${PGDATABASE}" >> /home/eouser/.bashrc
echo "export PGHOST=${PGHOST}" >> /home/eouser/.bashrc
echo "export PGPORT=${PGPORT}" >> /home/eouser/.bashrc

# Run pgstac via installed docker:
docker run -d \
  --name stac-db \
  --restart unless-stopped \
  --memory="${postgres_ram}" \
  --cpus="${postgres_cpus}" \
  -e POSTGRES_USER=${POSTGRES_USER} \
  -e POSTGRES_PASSWORD=${POSTGRES_PASSWORD} \
  -e POSTGRES_DB=${POSTGRES_DB} \
  -e PGUSER=${PGUSER} \
  -e PGPASSWORD=${PGPASSWORD} \
  -e PGDATABASE=${PGDATABASE} \
  -p 5432:5432 \
  -v stac-db-data:/var/lib/postgresql/data \
  ${POSTGRES_IMAGE}

# Base migrations install PgSTAC into a database with no current PgSTAC installation
pypgstac migrate
pypgstac load collections https://s3.fra1-2.cloudferro.com/swift/v1/stac-demo/collection-sentinel-2-l1c.json
pypgstac load items https://s3.fra1-2.cloudferro.com/swift/v1/stac-demo/S2B_MSIL1C_20240401T003159_N0510_R002_T11XMK_20240401T003828.json

When using postgis/postgis:16-master everything works like a charm, when using postgres 17, I needed to updat /usr/local/lib/python3.10/dist-packages/pypgstac/db.py:

    @property
    def pg_version(self) -> str:
        """Get the current pg version number from a pgstac database."""
        version = self.query_one(
            """
            SHOW server_version;
            """,
        )
        logger.debug(f"PG VERSION: {version}.")
        if isinstance(version, bytes):
            version = version.decode()
        if isinstance(version, str):
            try:
                if int(version.split(".")[0]) < 13:
                    raise Exception("PgSTAC requires PostgreSQL 13+")
                return version
            except ValueError:
                print("Warning: Developer version of PostgreSQL detected")
                pass
        else:
            if self.connection is not None:
                self.connection.rollback()
            raise Exception("Could not find PG version.")

Then everything seemed fine but after trying to access collection sentinel-2-l1c item response returned 0 features. On postgres 16 there is no such problem.

MathewNWSH avatar Sep 17 '24 10:09 MathewNWSH

maybe that's a pgstac issue

cc @bitner

vincentsarago avatar Sep 17 '24 10:09 vincentsarago

maybe that's a pgstac issue

cc @bitner

Yes, that was my first thought, but then I checked that everything is fine from the side of loading and splitting of input data between tables in the database ;/

MathewNWSH avatar Sep 17 '24 10:09 MathewNWSH

@MathewNWSH I see you're using the latest version of pgstac. can you try with python -m pip install "pypgstac[psycopg]==0.8.5"

vincentsarago avatar Sep 17 '24 10:09 vincentsarago

@vincentsarago

unfortunately response remains the same:

{
  "type": "FeatureCollection",
  "context": {
    "limit": 10,
    "returned": 0
  },
  "features": [],
  "links": [
    {
      "rel": "collection",
      "type": "application/json",
      "href": "http://***/collections/sentinel-2-l1c"
    },
    {
      "rel": "parent",
      "type": "application/json",
      "href": "http://***/collections/sentinel-2-l1c"
    },
    {
      "rel": "root",
      "type": "application/json",
      "href": "http://***/"
    },
    {
      "rel": "self",
      "type": "application/geo+json",
      "href": "http://***/collections/sentinel-2-l1c/items"
    }
  ]
}

MathewNWSH avatar Sep 17 '24 10:09 MathewNWSH

@MathewNWSH are you sure that you're getting stuff ingested in the pgstac schema?

I'm seeing some errors in the logs

db17-1  | 2024-09-17 13:00:43.159 UTC [138] ERROR:  relation "pgstac.migrations" does not exist at character 38
db17-1  | 2024-09-17 13:00:43.159 UTC [138] STATEMENT:  
db17-1  |                       SELECT version from pgstac.migrations
db17-1  |                       order by datetime desc, version desc limit 1;
db17-1  |                       

vincentsarago avatar Sep 17 '24 13:09 vincentsarago

Hey @vincentsarago I checked. I see the same error as you, but after running the load command, the data appears in the database. In fact, all the pgstac tables appear.

2024-09-19 08:44:00.980 UTC [91] ERROR:  relation "pgstac.migrations" does not exist at character 38
2024-09-19 08:44:00.980 UTC [91] STATEMENT:  
	                SELECT version from pgstac.migrations
	                order by datetime desc, version desc limit 1;

image After trying to delete collection directly from pgadmin I see: image

MathewNWSH avatar Sep 19 '24 08:09 MathewNWSH

My first guess would be that the search_path is not getting set correctly when connecting to the database. Can you try setting the search_path as a setting on the role that you are logging in as? ALTER ROLE <role you are logging in as> SET search_path to pgstac, public;

bitner avatar Sep 25 '24 15:09 bitner

Hey @bitner :)

as you suggested I tried: ALTER ROLE <role you are logging in as> SET search_path to pgstac, public;

and it helped in terms of inserting the data into / editing data in pgstac. So no such error is received: image

But a response remains the same:

{
  "type": "FeatureCollection",
  "links": [
    {
      "rel": "collection",
      "type": "application/json",
      "href": "https://stac-cdse.eu/collections/sentinel-2-l2a"
    },
    {
      "rel": "parent",
      "type": "application/json",
      "href": "https://stac-cdse.eu/collections/sentinel-2-l2a"
    },
    {
      "rel": "root",
      "type": "application/json",
      "href": "https://stac-cdse.eu/"
    },
    {
      "rel": "self",
      "type": "application/geo+json",
      "href": "https://stac-cdse.eu/collections/sentinel-2-l2a/items"
    }
  ],
  "features": [],
  "numberReturned": 0
}

The data is alright - items and collection work perfectly on postgres 16.

MathewNWSH avatar Sep 27 '24 09:09 MathewNWSH

@bitner @vincentsarago

I also tested other versions of stac-fastAPI (2.4.11), and the problem occurs there as well. I guess it's only a pgstac problem.

MathewNWSH avatar Sep 27 '24 10:09 MathewNWSH

It is confirmed by using pgstac's search function which returns 0 features:

test data:

pypgstac load collections https://s3.fra1-2.cloudferro.com/swift/v1/poland-stac/collection-sentinel-2-l2a.json
pypgstac load items https://s3.fra1-2.cloudferro.com/swift/v1/poland-stac/poland-data.json

search function:

select * from search('{"collections": ["sentinel-2-l2a"], "limit": 1000, "conf": {"nohydrate": false}, "fields": {"include": [], "exclude": []}, "filter-lang": "cql-json"}');

the result:

{
  "type": "FeatureCollection",
  "links": [
    {
      "rel": "root",
      "href": ".",
      "type": "application/json"
    },
    {
      "rel": "self",
      "href": "./search",
      "type": "application/json"
    }
  ],
  "features": [],
  "numberReturned": 0
}

MathewNWSH avatar Sep 27 '24 10:09 MathewNWSH

The issue is the materialized view partition_steps

Select from CREATE VIEW returns proper partition name items_4, but in the view I can see schema name attached _pgstac.items_4, values doesn't match and the join in chunker function between t and partition_steps returns no rows as t.p = name cannot match due to schema on one side of the query. I enforced good name by getting the partition name after the last dot and it started to work.

Here is the fix:

CREATE MATERIALIZED VIEW IF NOT EXISTS pgstac.partition_steps
TABLESPACE pg_default
AS
 SELECT split_part(partition, '.', -1) AS name,
    date_trunc('month'::text, lower(partition_dtrange)) AS sdate,
    date_trunc('month'::text, upper(partition_dtrange)) + '1 mon'::interval AS edate
   FROM partitions_view
  WHERE partition_dtrange IS NOT NULL AND partition_dtrange <> 'empty'::tstzrange
  ORDER BY dtrange
WITH DATA;

ckpklos avatar Sep 27 '24 12:09 ckpklos

as this is a pgstac issue, I've transferred the issue here 🙏

vincentsarago avatar Sep 27 '24 12:09 vincentsarago