pgcopydb
REFRESH MATERIALIZED VIEW fails during the post-data pg_restore phase
REFRESH MATERIALIZED VIEW fails when the view definition contains a reference to an object that is not fully qualified and the query is enclosed in a string literal.
Query to reproduce
CREATE SCHEMA pgcopydb;
CREATE TABLE pgcopydb.test ("Name" text, "LocalTag" text);
CREATE MATERIALIZED VIEW public.test_view AS
SELECT row_number() OVER (PARTITION BY true::boolean) AS row_number,
ts_stat.word
FROM ts_stat('SELECT to_tsvector(''simple'', "Name") ||
to_tsvector(''simple'', coalesce("LocalTag",'' ''))
FROM test'::text) ts_stat(word, ndoc, nentry)
WITH NO DATA;
SET search_path = 'pgcopydb';
REFRESH MATERIALIZED VIEW public.test_view;
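The failure is reproducible because pg_restore runs with an empty search_path (a deliberate hardening against search-path attacks), and pg_dump cannot rewrite a table reference that is hidden inside a string literal such as the ts_stat() query argument. A sketch of a fixed view definition, assuming only that the reference inside the string is schema-qualified:

```sql
-- Schema-qualifying "test" inside the ts_stat() query string makes the
-- view refreshable regardless of the session's search_path.
CREATE MATERIALIZED VIEW public.test_view AS
SELECT row_number() OVER (PARTITION BY true::boolean) AS row_number,
       ts_stat.word
  FROM ts_stat('SELECT to_tsvector(''simple'', "Name") ||
                       to_tsvector(''simple'', coalesce("LocalTag", '' ''))
                  FROM pgcopydb.test'::text) ts_stat(word, ndoc, nentry)
  WITH NO DATA;

-- This now succeeds even under the restricted environment pg_restore uses:
SET search_path = '';
REFRESH MATERIALIZED VIEW public.test_view;
```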
Logs
19:26:42 43355 INFO /usr/bin/pg_restore --dbname 'postgres://[email protected]/postgres?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --single-transaction --clean --if-exists --use-list /tmp/pgcopydb/schema/post-filtered.list /tmp/pgcopydb/schema/post.dump
19:26:48 43355 ERROR pg_restore: while PROCESSING TOC:
19:26:48 43355 ERROR pg_restore: from TOC entry 4279; 0 139077 MATERIALIZED VIEW DATA test_view azureuser
19:26:48 43355 ERROR pg_restore: error: could not execute query: ERROR: relation "test" does not exist
19:26:48 43355 ERROR LINE 3: FROM test
19:26:48 43355 ERROR ^
19:26:48 43355 ERROR QUERY: SELECT to_tsvector('simple', "Name") ||
19:26:48 43355 ERROR to_tsvector('simple', coalesce("LocalTag",' '))
19:26:48 43355 ERROR FROM test
19:26:48 43355 ERROR Command was: REFRESH MATERIALIZED VIEW public.test_view;
19:26:48 43355 ERROR Failed to run pg_restore: exit code 1
19:26:48 43355 ERROR Failed to clone source database, see above for details
19:26:48 43353 ERROR clone process 43355 has terminated [6]
First of all, thanks @dimitri for this great tool!
I ran into a similar problem when migrating from postgres 15.3 to 16.3. I used this simple script:
SOURCE_DB="..."
TARGET_DB="..."
DATA_DIR="/mnt/data/pgcopydb2" # path where I have some free space
mkdir -p "$DATA_DIR"
chown -R 999:999 "$DATA_DIR"
time docker run \
-v "$DATA_DIR:/data" \
--rm -it \
dimitri/pgcopydb:v0.16 \
pgcopydb clone \
--dir /data \
--source "$SOURCE_DB" \
--target "$TARGET_DB" \
--table-jobs=8 \
--index-jobs=2
And the result was as follows (changed URIs to <source-db> and <target-db> for privacy):
20:25:38.351 1 INFO Running pgcopydb version 0.16 from "/usr/local/bin/pgcopydb"
20:25:38.352 1 INFO [SOURCE] Copying database from "postgres://<source-db>?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
20:25:38.352 1 INFO [TARGET] Copying database into "postgres://<target-db>?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60"
20:25:38.374 1 INFO Using work dir "/data"
20:25:38.752 1 INFO Exported snapshot "00000003-0001A517-1" from the source database
20:25:38.945 23 INFO STEP 1: fetch source database tables, indexes, and sequences
20:25:42.520 23 INFO Fetched information for 16 tables (including 0 tables split in 0 partitions total), with an estimated total of 173 496 tuples and 2739 MB on-disk
20:25:42.587 23 INFO Fetched information for 53 indexes (supporting 27 constraints)
20:25:42.603 23 INFO Fetching information for 14 sequences
20:25:43.235 23 INFO Fetched information for 4 extensions
20:25:43.926 23 INFO Found 0 indexes (supporting 0 constraints) in the target database
20:25:43.940 23 INFO STEP 2: dump the source database schema (pre/post data)
20:25:43.947 23 INFO /usr/bin/pg_dump -Fc --snapshot 00000003-0001A517-1 --section pre-data --file /data/schema/pre.dump 'postgres://<source-db>?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
20:25:46.584 23 INFO /usr/bin/pg_dump -Fc --snapshot 00000003-0001A517-1 --section post-data --file /data/schema/post.dump 'postgres://<source-db>?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60'
20:25:48.617 23 INFO STEP 3: restore the pre-data section to the target database
20:25:48.704 23 INFO /usr/bin/pg_restore --dbname '<target-db>?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --jobs 2 --use-list /data/schema/pre-filtered.list /data/schema/pre.dump
20:25:50.119 31 INFO STEP 4: starting 8 table-data COPY processes
20:25:50.287 34 INFO STEP 8: starting 8 VACUUM processes
20:25:50.356 32 INFO STEP 6: starting 2 CREATE INDEX processes
20:25:50.356 32 INFO STEP 7: constraints are built by the CREATE INDEX processes
20:25:50.605 23 INFO Skipping large objects: none found.
20:25:50.638 23 INFO STEP 9: reset sequences values
20:25:50.639 53 INFO Set sequences values on the target database
20:35:29.877 23 INFO STEP 10: restore the post-data section to the target database
20:35:31.031 23 INFO /usr/bin/pg_restore --dbname 'postgres://<target-db>?keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --jobs 2 --use-list /data/schema/post-filtered.list /data/schema/post.dump
20:35:32.711 23 ERROR pg_restore: error: could not execute query: ERROR: relation "documents_document" does not exist
20:35:32.711 23 ERROR LINE 1: SELECT search_data FROM documents_document
20:35:32.711 23 ERROR ^
20:35:32.711 23 ERROR QUERY: SELECT search_data FROM documents_document
20:35:32.712 23 ERROR Command was: REFRESH MATERIALIZED VIEW public.documents_word;
20:35:32.774 23 WARN pg_restore: warning: errors ignored on restore: 2
20:35:32.774 23 ERROR Failed to run pg_restore: exit code 1
20:35:32.774 23 ERROR Failed to finalize schema on the target database, see above for details
20:35:32.774 23 ERROR Failed to clone source database, see above for details
20:35:32.893 1 ERROR clone process 23 has terminated [6]
real 10m3.633s
user 0m0.014s
sys 0m0.014s
I successfully migrated another database (which also had one matview) from 14.5 to 16.3 using the exact same procedure.
I'm not sure if this helps, but I tried grep for documents_word (the matview that caused the error) on the pgcopydb data path and found this:
# grep -rHni documents_word /mnt/data/pgcopydb2
/mnt/data/pgcopydb2/schema/post-out.list:59:3410; 1259 699431 INDEX public documents_word_word_idx postgres
/mnt/data/pgcopydb2/schema/post-out.list:60:3411; 1259 699433 INDEX public documents_word_word_idx1 postgres
/mnt/data/pgcopydb2/schema/post-out.list:90:3605; 0 699343 MATERIALIZED VIEW DATA public documents_word postgres
grep: /mnt/data/pgcopydb2/schema/pre.dump: binary file matches
/mnt/data/pgcopydb2/schema/pre-filtered.list:30:248; 1259 699343 MATERIALIZED VIEW public documents_word postgres
/mnt/data/pgcopydb2/schema/post-filtered.list:44:3410; 1259 699431 INDEX public documents_word_word_idx postgres
/mnt/data/pgcopydb2/schema/post-filtered.list:45:3411; 1259 699433 INDEX public documents_word_word_idx1 postgres
/mnt/data/pgcopydb2/schema/post-filtered.list:75:3605; 0 699343 MATERIALIZED VIEW DATA public documents_word postgres
grep: /mnt/data/pgcopydb2/schema/post.dump: binary file matches
/mnt/data/pgcopydb2/schema/pre-out.list:45:248; 1259 699343 MATERIALIZED VIEW public documents_word postgres
I can share these files if needed.
Since this matview was not used much, I decided to delete it from the source database, created a new target server, and ran the pgcopydb clone command again. This time it worked successfully! I can recreate the matview manually on the target database later.
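As an aside, an untested sketch of how to keep the matview's definition around before dropping it from the source, so it can be replayed on the target after the clone (pg_get_viewdef() works for materialized views as well as regular views):

```sql
-- On the source, before dropping: print the stored query of the matview
-- so it can be saved and replayed later.
SELECT pg_get_viewdef('public.documents_word'::regclass, true);

-- On the target, after the clone finishes (the definition below is a
-- placeholder for whatever the query above returned):
-- CREATE MATERIALIZED VIEW public.documents_word AS <saved definition> WITH NO DATA;
-- REFRESH MATERIALIZED VIEW public.documents_word;
```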
I believe this issue is partially related to PR #501 ("the first step towards support for REFRESHing materialized views within pgcopydb instead of in pg_restore"). A possible solution could be rearranging the order of operations in the list files: a quick fix would be to always restore matviews and their indexes last, and a smarter approach would be to detect the table dependencies of each matview and refresh it only after the tables it depends on have been imported.
I tried again with a fresh target database and the options --table-jobs=1 --index-jobs=1 --restore-jobs=1 to check if reducing parallelism might resolve the issue, but the same error occurred.
Hello @turicas
Can you check whether the definition of your materialized view documents_word contains only schema-qualified references to the documents_document relation? If it does not, I suggest you fix the definition by qualifying those references with schema names.
If there is an unqualified reference to the documents_document relation, pg_restore may fail. This failure is intentional on pg_restore's side: allowing object lookups along the search path is prone to abuse, so it restores under a restricted search_path.