stac-fastapi-pgstac doesn't work with beta version of postgres 17 (postgis/postgis:17beta3-master)
fast api doesn't load items, even if data is loaded successful to pgstac deployed on postgres 17:
Deployment method:
#!/bin/bash
# Update the package list
apt-get update
# Install nano
apt install -y nano
# Install required packages with automatic yes
apt-get install -y ca-certificates curl
# Create the directory for Docker keyrings
install -y -m 0755 -d /etc/apt/keyrings
# Download the Docker GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
# Ensure the keyring file is readable
chmod a+r /etc/apt/keyrings/docker.asc
# Add Docker repository to APT sources
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null
# Update the package list again
apt-get -y update
# Install Docker components with automatic yes
apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
# Install pypgstac
python3 -m pip install pypgstac[psycopg]
# Export initial env variables:
# for pypgstsac
export POSTGRES_USER=
export POSTGRES_PASSWORD=
export POSTGRES_DB=
export PGUSER=
export PGPASSWORD=
export PGDATABASE=
export PGHOST=0.0.0.0
export PGPORT=5432
# for docker
export postgres_ram="2g"
export postgres_cpus=2
export POSTGRES_IMAGE="postgis/postgis:17beta3-master"
# let's make sure that we only need to export those only once
echo "export POSTGRES_USER=${POSTGRES_USER}" >> /home/eouser/.bashrc
echo "export POSTGRES_PASSWORD=${POSTGRES_PASSWORD}" >> /home/eouser/.bashrc
echo "export POSTGRES_DB=${POSTGRES_DB}" >> /home/eouser/.bashrc
echo "export PGUSER=${PGUSER}" >> /home/eouser/.bashrc
echo "export PGPASSWORD=${PGPASSWORD}" >> /home/eouser/.bashrc
echo "export PGDATABASE=${PGDATABASE}" >> /home/eouser/.bashrc
echo "export PGHOST=${PGHOST}" >> /home/eouser/.bashrc
echo "export PGPORT=${PGPORT}" >> /home/eouser/.bashrc
# Run pgstac via installed docker:
docker run -d \
--name stac-db \
--restart unless-stopped \
--memory="${postgres_ram}" \
--cpus="${postgres_cpus}" \
-e POSTGRES_USER=${POSTGRES_USER} \
-e POSTGRES_PASSWORD=${POSTGRES_PASSWORD} \
-e POSTGRES_DB=${POSTGRES_DB} \
-e PGUSER=${PGUSER} \
-e PGPASSWORD=${PGPASSWORD} \
-e PGDATABASE=${PGDATABASE} \
-p 5432:5432 \
-v stac-db-data:/var/lib/postgresql/data \
${POSTGRES_IMAGE}
# Base migrations install PgSTAC into a database with no current PgSTAC installation
pypgstac migrate
pypgstac load collections https://s3.fra1-2.cloudferro.com/swift/v1/stac-demo/collection-sentinel-2-l1c.json
pypgstac load items https://s3.fra1-2.cloudferro.com/swift/v1/stac-demo/S2B_MSIL1C_20240401T003159_N0510_R002_T11XMK_20240401T003828.json
When using postgis/postgis:16-master everything works like a charm, when using postgres 17, I needed to updat /usr/local/lib/python3.10/dist-packages/pypgstac/db.py:
@property
def pg_version(self) -> str:
"""Get the current pg version number from a pgstac database."""
version = self.query_one(
"""
SHOW server_version;
""",
)
logger.debug(f"PG VERSION: {version}.")
if isinstance(version, bytes):
version = version.decode()
if isinstance(version, str):
try:
if int(version.split(".")[0]) < 13:
raise Exception("PgSTAC requires PostgreSQL 13+")
return version
except ValueError:
print("Warning: Developer version of PostgreSQL detected")
pass
else:
if self.connection is not None:
self.connection.rollback()
raise Exception("Could not find PG version.")
Then everything seemed fine but after trying to access collection sentinel-2-l1c item response returned 0 features. On postgres 16 there is no such problem.
maybe that's a pgstac issue
cc @bitner
Yes, that was my first thought, but then I checked that everything is fine from the side of loading and splitting of input data between tables in the database ;/
@MathewNWSH I see you're using the latest version of pgstac. can you try with python -m pip install "pypgstac[psycopg]==0.8.5"
@vincentsarago
unfortunately response remains the same:
{
"type": "FeatureCollection",
"context": {
"limit": 10,
"returned": 0
},
"features": [],
"links": [
{
"rel": "collection",
"type": "application/json",
"href": "http://***/collections/sentinel-2-l1c"
},
{
"rel": "parent",
"type": "application/json",
"href": "http://***/collections/sentinel-2-l1c"
},
{
"rel": "root",
"type": "application/json",
"href": "http://***/"
},
{
"rel": "self",
"type": "application/geo+json",
"href": "http://***/collections/sentinel-2-l1c/items"
}
]
}
@MathewNWSH are you sure that you're getting stuff ingested in the pgstac schema?
I'm seeing some errors in the logs
db17-1 | 2024-09-17 13:00:43.159 UTC [138] ERROR: relation "pgstac.migrations" does not exist at character 38
db17-1 | 2024-09-17 13:00:43.159 UTC [138] STATEMENT:
db17-1 | SELECT version from pgstac.migrations
db17-1 | order by datetime desc, version desc limit 1;
db17-1 |
Hey @vincentsarago I checked. I see the same error as you, but after running the load command, the data appears in the database. In fact, all the pgstac tables appear.
2024-09-19 08:44:00.980 UTC [91] ERROR: relation "pgstac.migrations" does not exist at character 38
2024-09-19 08:44:00.980 UTC [91] STATEMENT:
SELECT version from pgstac.migrations
order by datetime desc, version desc limit 1;
After trying to delete collection directly from pgadmin I see:
My first guess would be that the search_path is not getting set correctly when connecting to the database. Can you try setting the search_path as a setting on the role that you are logging in as? ALTER ROLE <role you are logging in as> SET search_path to pgstac, public;
Hey @bitner :)
as you suggested I tried:
ALTER ROLE <role you are logging in as> SET search_path to pgstac, public;
and it helped in terms of inserting the data into / editing data in pgstac. So no such error is received:
But a response remains the same:
{
"type": "FeatureCollection",
"links": [
{
"rel": "collection",
"type": "application/json",
"href": "https://stac-cdse.eu/collections/sentinel-2-l2a"
},
{
"rel": "parent",
"type": "application/json",
"href": "https://stac-cdse.eu/collections/sentinel-2-l2a"
},
{
"rel": "root",
"type": "application/json",
"href": "https://stac-cdse.eu/"
},
{
"rel": "self",
"type": "application/geo+json",
"href": "https://stac-cdse.eu/collections/sentinel-2-l2a/items"
}
],
"features": [],
"numberReturned": 0
}
The data is alright - items and collection work perfectly on postgres 16.
@bitner @vincentsarago
I also tested other versions of stac-fastAPI (2.4.11), and the problem occurs there as well. I guess it's only a pgstac problem.
It is confirmed by using pgstac's search function which returns 0 features:
test data:
pypgstac load collections https://s3.fra1-2.cloudferro.com/swift/v1/poland-stac/collection-sentinel-2-l2a.json
pypgstac load items https://s3.fra1-2.cloudferro.com/swift/v1/poland-stac/poland-data.json
search function:
select * from search('{"collections": ["sentinel-2-l2a"], "limit": 1000, "conf": {"nohydrate": false}, "fields": {"include": [], "exclude": []}, "filter-lang": "cql-json"}');
the result:
{
"type": "FeatureCollection",
"links": [
{
"rel": "root",
"href": ".",
"type": "application/json"
},
{
"rel": "self",
"href": "./search",
"type": "application/json"
}
],
"features": [],
"numberReturned": 0
}
The issue is the materialized view partition_steps
Select from CREATE VIEW returns proper partition name items_4, but in the view I can see schema name attached _pgstac.items_4, values doesn't match and the join in chunker function between t and partition_steps returns no rows as t.p = name cannot match due to schema on one side of the query. I enforced good name by getting the partition name after the last dot and it started to work.
Here is the fix:
CREATE MATERIALIZED VIEW IF NOT EXISTS pgstac.partition_steps
TABLESPACE pg_default
AS
SELECT split_part(partition, '.', -1) AS name,
date_trunc('month'::text, lower(partition_dtrange)) AS sdate,
date_trunc('month'::text, upper(partition_dtrange)) + '1 mon'::interval AS edate
FROM partitions_view
WHERE partition_dtrange IS NOT NULL AND partition_dtrange <> 'empty'::tstzrange
ORDER BY dtrange
WITH DATA;
as this is a pgstac issue, I've transferred the issue here 🙏