pgloader
SQLite to Postgres load killed
I am using pgloader version "3.6.1", compiled with SBCL 1.4.0-1.el7, to import the Who's On First postcode SQLite database (stored locally) into "PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit" on an Amazon Aurora Serverless instance.
The first two tables (ancestors, concordances) and the last (spr) loaded perfectly, but when I try to load geojson, pgloader works for about 3-5 minutes and is then simply terminated with the message "Killed".
I have tried this both with the command line invocation
sudo pgloader --with "batch size = 10MB" /media/.../whosonfirst-data-postalcode-latest.db postgresql://postgres:PASSWORD@database-1.cluster-SERVERNAME.eu-west-1.rds.amazonaws.com:5432/DBNAME
and also with an options file, where I played with various WITH options, including this one:
load database
from sqlite:///media/.../whosonfirst-data-postalcode-latest.db
into postgresql://postgres:PASSWORD@database-1.cluster-SERVERNAME.eu-west-1.rds.amazonaws.com:5432/DBNAME
with batch size = 10MB, prefetch rows = 5000, create no indexes, concurrency = 1;
Here is the --debug output from that options file run:
[vagrant@vagrant ~]$ sudo pgloader --debug loadpc.txt
pgloader version 3.6.1
compiled with SBCL 1.4.0-1.el7
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2020-11-29T11:26:11.066000Z NOTICE Starting pgloader, log system is ready.
2020-11-29T11:26:11.157000Z INFO Starting monitor
2020-11-29T11:26:11.394000Z LOG pgloader version "3.6.1"
2020-11-29T11:26:11.412000Z INFO Parsed command:
load database
from sqlite:///media/.../whosonfirst-data-postalcode-latest.db
into postgresql://postgres:[email protected]:5432/DBNAME
including only table names like 'geojson'
with batch size = 10MB, prefetch rows = 5000, create no indexes, concurrency = 1;
2020-11-29T11:26:12.181000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://[email protected]:5432/DBNAME {1006537D13}>
2020-11-29T11:26:12.181000Z DEBUG SET client_encoding TO 'utf8'
2020-11-29T11:26:12.271000Z DEBUG SET application_name TO 'pgloader'
2020-11-29T11:26:12.587000Z LOG Migrating from #<SQLITE-CONNECTION sqlite:///media/.../whosonfirst-data-postalcode-latest.db {10065349E3}>
2020-11-29T11:26:12.587000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://[email protected]:5432/DBNAME {1006537D13}>
2020-11-29T11:26:12.643000Z DEBUG CONNECTED TO /media/.../whosonfirst-data-postalcode-latest.db
2020-11-29T11:26:12.644000Z SQL SQLite: SELECT tbl_name
FROM sqlite_master
WHERE tbl_name = 'sqlite_sequence'
2020-11-29T11:26:12.682000Z SQL SELECT tbl_name
FROM sqlite_master
WHERE type='table'
AND tbl_name <> 'sqlite_sequence'
2020-11-29T11:26:12.756000Z INFO Processing source catalogs
2020-11-29T11:26:12.793000Z NOTICE Prepare PostgreSQL database.
2020-11-29T11:26:13.084000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://[email protected]:5432/DBNAME {1006537D13}>
2020-11-29T11:26:13.084000Z DEBUG SET client_encoding TO 'utf8'
2020-11-29T11:26:13.189000Z DEBUG SET application_name TO 'pgloader'
2020-11-29T11:26:13.570000Z DEBUG BEGIN
2020-11-29T11:26:14.137000Z SQL DROP TABLE IF EXISTS geojson CASCADE;
2020-11-29T11:26:14.418000Z SQL CREATE TABLE geojson
(
id bigint,
body text,
source text,
alt_label text,
is_alt boolean,
lastmodified bigint
);
2020-11-29T11:26:15.167000Z SQL -- params: table-names
select n, n::regclass::oid
from (values ('geojson')) as t(n);
2020-11-29T11:26:15.421000Z NOTICE COPY geojson
2020-11-29T11:26:15.445000Z INFO COPY ON ERROR STOP
2020-11-29T11:26:15.445000Z DEBUG Reader started for geojson
2020-11-29T11:26:15.502000Z DEBUG start geojson 4527
2020-11-29T11:26:15.503000Z SQL SQLite: SELECT `id`, `body`, `source`, `alt_label`, `is_alt`, `lastmodified` FROM `geojson`;
2020-11-29T11:26:15.524000Z DEBUG CONNECTED TO /media/.../whosonfirst-data-postalcode-latest.db
2020-11-29T11:26:15.647000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://[email protected]:5432/DBNAME {1006C7F183}>
2020-11-29T11:26:15.648000Z DEBUG SET client_encoding TO 'utf8'
2020-11-29T11:26:15.819000Z DEBUG SET application_name TO 'pgloader'
2020-11-29T11:26:15.946000Z INFO pgsql:copy-rows-from-queue[0]: geojson (id body source alt_label is_alt
lastmodified)
Killed
In /var/log/messages I found:
Nov 29 17:02:58 vagrant kernel: Out of memory: Kill process 9699 (pgloader) score 581 or sacrifice child
Nov 29 17:02:58 vagrant kernel: Killed process 9699 (pgloader), UID 0, total-vm:4842572kB, anon-rss:397596kB, file-rss:192kB, shmem-rss:0kB
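One possibility, though it is only a guess on my part, is that individual body values are whole GeoJSON documents large enough to dominate memory no matter how small the batches are. Their size can be checked directly against the source database with plain SQLite:

-- rough size of the geojson bodies; length() counts characters for TEXT,
-- which is close enough to bytes for mostly-ASCII JSON
SELECT count(*)          AS row_count,
       max(length(body)) AS max_body_bytes,
       avg(length(body)) AS avg_body_bytes
FROM geojson;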
I also tried with the following
with batch size = 10MB, batch rows = 1000, create no indexes, concurrency = 1
set maintenance_work_mem to '64MB', work_mem to '4MB';
and even batch rows = 1 and prefetch rows = 1, with no luck.
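A fallback I have not actually tried would be to copy just this one table outside pgloader, streaming a single row at a time so nothing large is ever batched in memory. A rough Python sketch, assuming psycopg2, the same placeholder credentials as above, and that the target geojson table already exists (pgloader creates it before the COPY step dies):

import sqlite3
import psycopg2

# Same placeholders as in the pgloader command lines above.
src = sqlite3.connect("/media/.../whosonfirst-data-postalcode-latest.db")
dst = psycopg2.connect(
    host="database-1.cluster-SERVERNAME.eu-west-1.rds.amazonaws.com",
    dbname="DBNAME", user="postgres", password="PASSWORD",
)

cur = dst.cursor()
rows = src.execute(
    "SELECT id, body, source, alt_label, is_alt, lastmodified FROM geojson"
)
for i, (id_, body, source, alt_label, is_alt, lastmodified) in enumerate(rows, 1):
    cur.execute(
        "INSERT INTO geojson (id, body, source, alt_label, is_alt, lastmodified) "
        "VALUES (%s, %s, %s, %s, %s, %s)",
        # SQLite keeps booleans as 0/1, so convert before hitting the boolean column.
        (id_, body, source, alt_label,
         None if is_alt is None else bool(is_alt), lastmodified),
    )
    if i % 1000 == 0:  # commit periodically to keep transactions small
        dst.commit()

dst.commit()
src.close()
dst.close()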
@poshest did you ever find out what was killing your process?
@ericadams no, I never ended up uploading the geojson file. The data in the other WOF files, together with other sources we used, proved sufficient. I have since left the project.