pgloader
Heap exhausted during load data into postgres
I load several files into a PostgreSQL database using pgloader. Most import successfully, but one file fails with a "Heap exhausted" error. Issues #962, #16, #487, and #327 suggest some solutions, and I tried --with "prefetch rows=1000", but the error remains. I cannot compile pgloader on my production server. Could you please tell me how to tackle this? Thanks.
Is it possible for you to share the problematic file with me, maybe over personal email? (See my profile page for the email address.)
I have come across the same error while doing a test run on the Chinook sample database (SQLite version). I have read #327; here's my .load file:
load database
from 'chinook.db'
into postgresql:///chinook
with include drop, create tables, create indexes, reset sequences, batch rows = 10, batch concurrency = 1, workers = 1, prefetch rows = 2
set work_mem to '16 MB', maintenance_work_mem to '512 MB';
Here's the output:
$ sudo -u postgres pgloader --debug migrate-sqlite.load
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2020-03-30T21:24:28.033000Z NOTICE Starting pgloader, log system is ready.
2020-03-30T21:24:28.097000Z INFO Starting monitor
2020-03-30T21:24:28.115000Z INFO Parsed command:
load database
from 'chinook.db'
into postgresql:///chinook
with include drop, create tables, create indexes, reset sequences
set work_mem to '16 MB', maintenance_work_mem to '512 MB';
2020-03-30T21:24:28.244000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6EB861}>
2020-03-30T21:24:28.244000Z DEBUG SET client_encoding TO 'utf8'
2020-03-30T21:24:28.311000Z DEBUG SET work_mem TO '16 MB'
2020-03-30T21:24:28.312000Z DEBUG SET maintenance_work_mem TO '512 MB'
2020-03-30T21:24:28.312000Z DEBUG SET application_name TO 'pgloader'
2020-03-30T21:24:28.365000Z LOG Migrating from #<SQLITE-CONNECTION sqlite:///home/user/sqlite-db-sample/chinook.db {E6EB279}>
2020-03-30T21:24:28.365000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6EB861}>
2020-03-30T21:24:28.427000Z DEBUG CONNECTED TO /home/user/sqlite-db-sample/chinook.db
2020-03-30T21:24:28.427000Z SQL SQLite: SELECT tbl_name
FROM sqlite_master
WHERE tbl_name = 'sqlite_sequence'
2020-03-30T21:24:28.449000Z SQL SELECT tbl_name
FROM sqlite_master
WHERE type='table'
AND tbl_name <> 'sqlite_sequence'
2020-03-30T21:24:28.464000Z NOTICE SQLite column albums.AlbumId uses a sequence
2020-03-30T21:24:28.473000Z NOTICE SQLite column artists.ArtistId uses a sequence
2020-03-30T21:24:28.473000Z NOTICE SQLite column customers.CustomerId uses a sequence
2020-03-30T21:24:28.474000Z NOTICE SQLite column employees.EmployeeId uses a sequence
2020-03-30T21:24:28.475000Z NOTICE SQLite column genres.GenreId uses a sequence
2020-03-30T21:24:28.475000Z NOTICE SQLite column invoices.InvoiceId uses a sequence
2020-03-30T21:24:28.475000Z NOTICE SQLite column invoice_items.InvoiceLineId uses a sequence
2020-03-30T21:24:28.476000Z NOTICE SQLite column media_types.MediaTypeId uses a sequence
2020-03-30T21:24:28.476000Z NOTICE SQLite column playlists.PlaylistId uses a sequence
2020-03-30T21:24:28.477000Z NOTICE SQLite column tracks.TrackId uses a sequence
2020-03-30T21:24:28.551000Z NOTICE Prepare PostgreSQL database.
2020-03-30T21:24:28.557000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6EB861}>
2020-03-30T21:24:28.558000Z DEBUG SET client_encoding TO 'utf8'
2020-03-30T21:24:28.559000Z DEBUG SET work_mem TO '16 MB'
2020-03-30T21:24:28.559000Z DEBUG SET maintenance_work_mem TO '512 MB'
2020-03-30T21:24:28.560000Z DEBUG SET application_name TO 'pgloader'
2020-03-30T21:24:28.570000Z DEBUG BEGIN
KABOOM!
Killed
user (master #) sqlite-db-sample
$ !pkill
pkill Typora
user (master #) sqlite-db-sample
$ sudo -u postgres pgloader --debug migrate-sqlite.load
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2020-03-30T21:28:21.027000Z NOTICE Starting pgloader, log system is ready.
2020-03-30T21:28:21.068000Z INFO Starting monitor
2020-03-30T21:28:21.123000Z INFO Parsed command:
load database
from 'chinook.db'
into postgresql:///chinook
with include drop, create tables, create indexes, reset sequences
set work_mem to '16 MB', maintenance_work_mem to '512 MB';
2020-03-30T21:28:21.172000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6D5861}>
2020-03-30T21:28:21.172000Z DEBUG SET client_encoding TO 'utf8'
2020-03-30T21:28:21.208000Z DEBUG SET work_mem TO '16 MB'
2020-03-30T21:28:21.209000Z DEBUG SET maintenance_work_mem TO '512 MB'
2020-03-30T21:28:21.209000Z DEBUG SET application_name TO 'pgloader'
2020-03-30T21:28:21.267000Z LOG Migrating from #<SQLITE-CONNECTION sqlite:///home/user/sqlite-db-sample/chinook.db {E6D5279}>
2020-03-30T21:28:21.267000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6D5861}>
2020-03-30T21:28:21.315000Z DEBUG CONNECTED TO /home/user/sqlite-db-sample/chinook.db
2020-03-30T21:28:21.315000Z SQL SQLite: SELECT tbl_name
FROM sqlite_master
WHERE tbl_name = 'sqlite_sequence'
2020-03-30T21:28:21.331000Z SQL SELECT tbl_name
FROM sqlite_master
WHERE type='table'
AND tbl_name <> 'sqlite_sequence'
2020-03-30T21:28:21.339000Z NOTICE SQLite column albums.AlbumId uses a sequence
2020-03-30T21:28:21.344000Z NOTICE SQLite column artists.ArtistId uses a sequence
2020-03-30T21:28:21.345000Z NOTICE SQLite column customers.CustomerId uses a sequence
2020-03-30T21:28:21.346000Z NOTICE SQLite column employees.EmployeeId uses a sequence
2020-03-30T21:28:21.346000Z NOTICE SQLite column genres.GenreId uses a sequence
2020-03-30T21:28:21.347000Z NOTICE SQLite column invoices.InvoiceId uses a sequence
2020-03-30T21:28:21.347000Z NOTICE SQLite column invoice_items.InvoiceLineId uses a sequence
2020-03-30T21:28:21.348000Z NOTICE SQLite column media_types.MediaTypeId uses a sequence
2020-03-30T21:28:21.348000Z NOTICE SQLite column playlists.PlaylistId uses a sequence
2020-03-30T21:28:21.349000Z NOTICE SQLite column tracks.TrackId uses a sequence
2020-03-30T21:28:21.417000Z NOTICE Prepare PostgreSQL database.
2020-03-30T21:28:21.423000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6D5861}>
2020-03-30T21:28:21.423000Z DEBUG SET client_encoding TO 'utf8'
2020-03-30T21:28:21.424000Z DEBUG SET work_mem TO '16 MB'
2020-03-30T21:28:21.425000Z DEBUG SET maintenance_work_mem TO '512 MB'
2020-03-30T21:28:21.425000Z DEBUG SET application_name TO 'pgloader'
2020-03-30T21:28:21.431000Z DEBUG BEGIN
KABOOM!
Heap exhausted during allocation: 336834560 bytes available, 631554264 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 21203 0 21138 0 264 0 139 0 0 1624232 26456 10737418 0 0 0.0000
1: 0 0 0 0 909 22 0 154310 154981 635735808 131328 646473226 200 1 0.0000
2: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
3: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
4: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
5: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
6: 0 0 0 0 13021 6979 0 0 0 81920000 0 2000000 12743 0 0.0000
Total bytes allocated = 719280040
Dynamic-space-size bytes = 1073741824
GC control variables:
*GC-INHIBIT* = false
*GC-PENDING* = true
*STOP-FOR-GC-PENDING* = false
2020-03-30T21:28:25.232000Z INFO Stopping monitor
What am I doing wrong here?
A SB-KERNEL::HEAP-EXHAUSTED-ERROR condition without bindings for heap statistics. (If you did not expect to see this message, please report it.
What I find interesting is this line
Total bytes allocated = 719280040
The machine I am currently working on has 2 GB of RAM:
OS: Debian 9 on linux 4.9.0-12-686-pae
pgloader version: "3.5.2" compiled with SBCL 1.3.14.debian
postgres version: PostgreSQL 12.2 (Debian 12.2-2.pgdg90+1) on i686-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 32-bit
Do you think I should proceed with compiling from source with CCL ?
I see you're using pgloader 3.5.2. If you're up for building from sources, I would advise building the current master branch with the SBCL version you have in Debian first, and see what happens. Then yes, building with CCL is also an option.
I have the same problem importing a MySQL database, but it is somewhat random. I tweaked "batch rows", "batch size", and "rows per range", but I can't find a combination that always works, so I retry when it fails and at some point it succeeds.
The versions used are:
$~/pgloader$ ./build/bin/pgloader --version
pgloader version "3.6.2"
compiled with SBCL 1.3.14.debian
Extract from my load script:
load database
from mysql://myhost/mydb
into postgresql://localhost/mydb
WITH batch rows = 10000,
batch size = 500MB,
max parallel create index = 1,
preserve index names,
create no indexes,
concurrency = 15,
multiple reader per thread,
rows per range = 10000
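One way to see why these settings can exhaust the heap: SBCL's default dynamic space is 1 GB (the earlier log shows Dynamic-space-size bytes = 1073741824), while the clauses above allow far more batch data in flight. Here is a back-of-the-envelope sketch, under my own assumption (not pgloader's documented memory model) that each concurrent worker may buffer up to one full batch:

```python
# Back-of-the-envelope estimate, NOT pgloader's documented memory model:
# assume each concurrent worker may hold up to one full batch in memory.
def worst_case_batch_mb(batch_size_mb: int, concurrency: int) -> int:
    """Rough upper bound on batch data held in the Lisp heap at once (MB)."""
    return batch_size_mb * concurrency

# batch size = 500MB with concurrency = 15, vs. a 1 GB default heap
estimate = worst_case_batch_mb(500, 15)
print(estimate)         # 7500 MB
print(estimate > 1024)  # True: far above the 1 GB dynamic space
```

If that bound is anywhere near reality, lowering batch size or concurrency until the product fits comfortably under the heap size should make the failures less random.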
As @dimitri said in #327, the CCL version is more efficient. I managed to migrate a Zabbix DB with millions of rows with this docker-compose file (v2, because of Portainer):
version: "2"
services:
  pgloader:
    container_name: pgloader
    image: dimitri/pgloader:ccl.latest
    command: 'pgloader --verbose --debug --with "batch rows = 100000" mysql://root:[email protected]:3306/dbname postgresql://root:[email protected]:5432/dbname'
    networks:
      user-bridge:
        ipv4_address: "10.10.0.10"
    logging:
      driver: json-file
networks:
  user-bridge:
    external:
      name: user-bridge
I was getting this error while importing several fixed-width txt files (200 of them), resulting in 70+ GB of data.
This got solved by adding:
WITH truncate, drop indexes, batch size = 4GB, prefetch rows = 50000
and
SET work_mem to '5GB', standard_conforming_strings to 'on'
and without needing a source build or a CCL version. Just plain pgloader as a CLI with a load file.
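In case it helps others, here is a minimal sketch of how those clauses fit together in a fixed-width load file. The path, table name, column names, and positions are made-up placeholders (pgloader's FIXED syntax, not my actual command):

```
-- Hypothetical skeleton only: file path, target table, and column
-- positions are placeholders, not taken from the report above.
LOAD FIXED
     FROM '/data/example.txt'
          (
             id   from  0 for 10,
             name from 10 for 30
          )
     INTO postgresql:///target?tablename=example (id, name)
     WITH truncate, drop indexes,
          batch size = 4GB,
          prefetch rows = 50000
      SET work_mem to '5GB',
          standard_conforming_strings to 'on';
```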
BTW @dimitri, impressive piece of problem solving. With this I managed to avoid pulling in other software and overcomplicating my workflow. If you visit Berlin at some point, drop me a PM... I owe you dinner ;D.
Seeing this on a small set: a table with two rows, table size of 74 MB. The important bit (I believe) is that one of the columns is a longblob containing binary data.
Context
This is the docker pgloader:latest image, running in a WSL2 environment; the container shouldn't be memory-constrained.
pgloader version "3.6.3~devel"
compiled with SBCL 1.4.16.debian
Source DB is MySQL 5.6
Heap exhausted during allocation: 24936448 bytes available, 59475232 requested.
Gen Boxed Unboxed LgBox LgUnbox Pin Alloc Waste Trig WP GCs Mem-age
0 2 0 0 0 0 0 65536 10737418 2 0 0.0000
1 194 928 0 6417 5933 246338544 699408 10737418 7539 0 0.6287
2 407 3963 0 4995 533 306329808 542512 10737418 9365 0 0.5444
3 909 4720 4825 0 81 337783360 4773312 2000000 10454 0 0.0000
4 0 0 0 0 0 0 0 2000000 0 0 0.0000
5 0 0 0 0 0 0 0 2000000 0 0 0.0000
6 1409 1115 208 171 0 91891312 3234192 2000000 2903 0 0.0000
7 0 0 0 0 0 0 0 2000000 0 0 0.0000
Total bytes allocated = 982343024
Dynamic-space-size bytes = 1073741824
GC control variables:
*GC-INHIBIT* = false
*GC-PENDING* = true
*STOP-FOR-GC-PENDING* = false
2021-09-29T18:46:05.602000Z ERROR Error while formating a row from "<schema>.<table>":
2021-09-29T18:46:05.602000Z ERROR A SB-KERNEL::HEAP-EXHAUSTED-ERROR condition without bindings for heap statistics. (If
you did not expect to see this message, please report it.
2021-09-29T18:46:26.008000Z ERROR The value
NIL
is not of type
NUMBER
when binding SB-KERNEL::X
2021-09-29T18:46:26.137000Z WARNING PostgreSQL warning: there is no transaction in progress
My WITH is currently this, but I will try some adjustments.
WITH
batch size = 1GB,
disable triggers
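One adjustment worth trying for a blob-heavy table (a guess based on the large longblob values here, not a documented fix) is the opposite direction: keep the batches small enough that a few big rows cannot approach the 1 GB dynamic space shown in the heap dump above. The values below are placeholders:

```
-- Hypothetical values: small batches for tables with large longblob rows
WITH batch rows = 10,
     batch size = 64MB,
     prefetch rows = 10,
     disable triggers
```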
Can confirm that switching to pgloader:ccl.latest and dropping my WITH clauses overcame the heap issue.
pgloader version "3.6.3~devel"
compiled with Clozure Common Lisp Version 1.11.5/v1.11.5 (LinuxX8664)
As @dimitri said in #327, the CCL version is more efficient (see the docker-compose setup with dimitri/pgloader:ccl.latest quoted earlier in this thread).
Can't agree with this more. This helped transfer a huge MySQL database when other compiled versions would run out of memory. I would suggest everyone try the CCL build if the others fail. Definitely worked!
Also confirming that switching to pgloader:ccl.latest fixed my heap issues. Any chance that could become the new mainline release?