pgloader icon indicating copy to clipboard operation
pgloader copied to clipboard

Sqlite to postgres load killed

Open poshest opened this issue 4 years ago • 2 comments

I am using pgloader version "3.6.1" compiled with SBCL 1.4.0-1.el7 to import the Who's On First postcode sqlite database which is stored locally into "PostgreSQL 11.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.9.3, 64-bit" on an Amazon Aurora Serverless instance.

The first two tables (ancestors, concordances) and the last (spr) loaded perfectly, but I when I try to load geojson, pgloader does some things for about 3-5 mins and then is simply terminated with the message "Killed".

I have tried this both with the command line invocation

sudo pgloader --with "batch size = 10MB" /media/.../whosonfirst-data-postalcode-latest.db postgresql://postgres:PASSWORD@database-1.cluster-SERVERNAME.eu-west-1.rds.amazonaws.com:5432/DBNAME

and also with an options file, where I played with various WITH options, including this one here

load database
     from sqlite:///media/.../whosonfirst-data-postalcode-latest.db
     into postgresql://postgres:PASSWORD@database-1.cluster-SERVERNAME.eu-west-1.rds.amazonaws.com:5432/DBNAME

  with batch size = 10MB, prefetch rows = 5000, create no indexes, concurrency = 1;

Here is the --debug output from that options file run:

[vagrant@vagrant ~]$ sudo pgloader --debug loadpc.txt
pgloader version 3.6.1
compiled with SBCL 1.4.0-1.el7
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2020-11-29T11:26:11.066000Z NOTICE Starting pgloader, log system is ready.
2020-11-29T11:26:11.157000Z INFO Starting monitor
2020-11-29T11:26:11.394000Z LOG pgloader version "3.6.1"
2020-11-29T11:26:11.412000Z INFO Parsed command:
load database
     from sqlite:///media/.../whosonfirst-data-postalcode-latest.db
     into postgresql://postgres:[email protected]:5432/DBNAME

  including only table names like 'geojson'
  with batch size = 10MB, prefetch rows = 5000, create no indexes, concurrency = 1;

2020-11-29T11:26:12.181000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://[email protected]:5432/DBNAME {1006537D13}>
2020-11-29T11:26:12.181000Z DEBUG SET client_encoding TO 'utf8'
2020-11-29T11:26:12.271000Z DEBUG SET application_name TO 'pgloader'
2020-11-29T11:26:12.587000Z LOG Migrating from #<SQLITE-CONNECTION sqlite:///media/.../whosonfirst-data-postalcode-latest.db {10065349E3}>
2020-11-29T11:26:12.587000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://[email protected]:5432/DBNAME {1006537D13}>
2020-11-29T11:26:12.643000Z DEBUG CONNECTED TO /media/.../whosonfirst-data-postalcode-latest.db
2020-11-29T11:26:12.644000Z SQL SQLite: SELECT tbl_name
  FROM sqlite_master
 WHERE tbl_name = 'sqlite_sequence'
2020-11-29T11:26:12.682000Z SQL SELECT tbl_name
  FROM sqlite_master
 WHERE type='table'
       AND tbl_name <> 'sqlite_sequence'


2020-11-29T11:26:12.756000Z INFO Processing source catalogs
2020-11-29T11:26:12.793000Z NOTICE Prepare PostgreSQL database.
2020-11-29T11:26:13.084000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://[email protected]:5432/DBNAME {1006537D13}>
2020-11-29T11:26:13.084000Z DEBUG SET client_encoding TO 'utf8'
2020-11-29T11:26:13.189000Z DEBUG SET application_name TO 'pgloader'
2020-11-29T11:26:13.570000Z DEBUG BEGIN
2020-11-29T11:26:14.137000Z SQL DROP TABLE IF EXISTS geojson CASCADE;
2020-11-29T11:26:14.418000Z SQL CREATE TABLE geojson
(
  id           bigint,
  body         text,
  source       text,
  alt_label    text,
  is_alt       boolean,
  lastmodified bigint
);
2020-11-29T11:26:15.167000Z SQL -- params: table-names
select n, n::regclass::oid
  from (values ('geojson')) as t(n);

2020-11-29T11:26:15.421000Z NOTICE COPY geojson
2020-11-29T11:26:15.445000Z INFO COPY ON ERROR STOP
2020-11-29T11:26:15.445000Z DEBUG Reader started for geojson
2020-11-29T11:26:15.502000Z DEBUG start geojson   4527
2020-11-29T11:26:15.503000Z SQL SQLite: SELECT `id`, `body`, `source`, `alt_label`, `is_alt`, `lastmodified` FROM `geojson`;
2020-11-29T11:26:15.524000Z DEBUG CONNECTED TO /media/.../whosonfirst-data-postalcode-latest.db
2020-11-29T11:26:15.647000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://[email protected]:5432/DBNAME {1006C7F183}>
2020-11-29T11:26:15.648000Z DEBUG SET client_encoding TO 'utf8'
2020-11-29T11:26:15.819000Z DEBUG SET application_name TO 'pgloader'
2020-11-29T11:26:15.946000Z INFO pgsql:copy-rows-from-queue[0]: geojson (id body source alt_label is_alt
                                        lastmodified)
Killed

In /var/log/messages I found

Nov 29 17:02:58 vagrant kernel: Out of memory: Kill process 9699 (pgloader) score 581 or sacrifice child
Nov 29 17:02:58 vagrant kernel: Killed process 9699 (pgloader), UID 0, total-vm:4842572kB, anon-rss:397596kB, file-rss:192kB, shmem-rss:0kB

I also tried with the following

  with batch size = 10MB, batch rows = 1000, create no indexes, concurrency = 1
  set maintenance_work_mem to '64MB', work_mem to '4MB';

and even batch rows = 1 and prefetch rows = 1, and no luck.

poshest avatar Nov 29 '20 12:11 poshest

@poshest did you ever find out what was killing your process?

ericadams avatar Apr 12 '22 20:04 ericadams

@ericadams no, I never ended uploading the geojson file. The data in the other WOF files together with other sources we used proved sufficient. I have since left the project.

poshest avatar Apr 13 '22 10:04 poshest