
Heap exhausted while loading data into Postgres

Open hdanet opened this issue 5 years ago • 10 comments

I loaded several files into a PostgreSQL database using pgloader. Some imported successfully, while one file failed with a "Heap exhausted" error. Issues #962, #16, #487, and #327 suggest some solutions, and I tried --with "prefetch rows=1000", but the error remains. I cannot compile pgloader from source on my production server. Would you please tell me how to tackle this? Thanks.
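
For reference, the invocation I tried was along these lines (a sketch; the data file and connection string are placeholders):

$ pgloader --with "prefetch rows = 1000" my_data.csv postgresql:///mydb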

hdanet avatar Jun 28 '19 08:06 hdanet

Is it possible for you to share the problematic file with me, over personal email maybe? (See my profile page for the email address.)

dimitri avatar Jul 02 '19 20:07 dimitri

I came across the same error while doing a test run on the chinook sample database (SQLite version). I have read #327; here's my .load file:

load database
     from 'chinook.db'
     into postgresql:///chinook

with include drop, create tables, create indexes, reset sequences, batch rows = 10, batch concurrency = 1, workers = 1, prefetch rows = 2

set work_mem to '16 MB', maintenance_work_mem to '512 MB'; 

Here's the output:

$ sudo -u postgres pgloader --debug migrate-sqlite.load
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2020-03-30T21:24:28.033000Z NOTICE Starting pgloader, log system is ready.
2020-03-30T21:24:28.097000Z INFO Starting monitor
2020-03-30T21:24:28.115000Z INFO Parsed command:
load database
     from 'chinook.db'
     into postgresql:///chinook

with include drop, create tables, create indexes, reset sequences

set work_mem to '16 MB', maintenance_work_mem to '512 MB';

2020-03-30T21:24:28.244000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6EB861}>
2020-03-30T21:24:28.244000Z DEBUG SET client_encoding TO 'utf8'
2020-03-30T21:24:28.311000Z DEBUG SET work_mem TO '16 MB'
2020-03-30T21:24:28.312000Z DEBUG SET maintenance_work_mem TO '512 MB'
2020-03-30T21:24:28.312000Z DEBUG SET application_name TO 'pgloader'
2020-03-30T21:24:28.365000Z LOG Migrating from #<SQLITE-CONNECTION sqlite:///home/user/sqlite-db-sample/chinook.db {E6EB279}>
2020-03-30T21:24:28.365000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6EB861}>
2020-03-30T21:24:28.427000Z DEBUG CONNECTED TO /home/user/sqlite-db-sample/chinook.db
2020-03-30T21:24:28.427000Z SQL SQLite: SELECT tbl_name
  FROM sqlite_master
 WHERE tbl_name = 'sqlite_sequence'
2020-03-30T21:24:28.449000Z SQL SELECT tbl_name
  FROM sqlite_master
 WHERE type='table'
       AND tbl_name <> 'sqlite_sequence'


2020-03-30T21:24:28.464000Z NOTICE SQLite column albums.AlbumId uses a sequence
2020-03-30T21:24:28.473000Z NOTICE SQLite column artists.ArtistId uses a sequence
2020-03-30T21:24:28.473000Z NOTICE SQLite column customers.CustomerId uses a sequence
2020-03-30T21:24:28.474000Z NOTICE SQLite column employees.EmployeeId uses a sequence
2020-03-30T21:24:28.475000Z NOTICE SQLite column genres.GenreId uses a sequence
2020-03-30T21:24:28.475000Z NOTICE SQLite column invoices.InvoiceId uses a sequence
2020-03-30T21:24:28.475000Z NOTICE SQLite column invoice_items.InvoiceLineId uses a sequence
2020-03-30T21:24:28.476000Z NOTICE SQLite column media_types.MediaTypeId uses a sequence
2020-03-30T21:24:28.476000Z NOTICE SQLite column playlists.PlaylistId uses a sequence
2020-03-30T21:24:28.477000Z NOTICE SQLite column tracks.TrackId uses a sequence
2020-03-30T21:24:28.551000Z NOTICE Prepare PostgreSQL database.
2020-03-30T21:24:28.557000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6EB861}>
2020-03-30T21:24:28.558000Z DEBUG SET client_encoding TO 'utf8'
2020-03-30T21:24:28.559000Z DEBUG SET work_mem TO '16 MB'
2020-03-30T21:24:28.559000Z DEBUG SET maintenance_work_mem TO '512 MB'
2020-03-30T21:24:28.560000Z DEBUG SET application_name TO 'pgloader'
2020-03-30T21:24:28.570000Z DEBUG BEGIN
KABOOM!
Killed
user (master #) sqlite-db-sample
$ !pkill
pkill Typora
user (master #) sqlite-db-sample
$ sudo -u postgres pgloader --debug migrate-sqlite.load
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2020-03-30T21:28:21.027000Z NOTICE Starting pgloader, log system is ready.
2020-03-30T21:28:21.068000Z INFO Starting monitor
2020-03-30T21:28:21.123000Z INFO Parsed command:
load database
     from 'chinook.db'
     into postgresql:///chinook

with include drop, create tables, create indexes, reset sequences

set work_mem to '16 MB', maintenance_work_mem to '512 MB';

2020-03-30T21:28:21.172000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6D5861}>
2020-03-30T21:28:21.172000Z DEBUG SET client_encoding TO 'utf8'
2020-03-30T21:28:21.208000Z DEBUG SET work_mem TO '16 MB'
2020-03-30T21:28:21.209000Z DEBUG SET maintenance_work_mem TO '512 MB'
2020-03-30T21:28:21.209000Z DEBUG SET application_name TO 'pgloader'
2020-03-30T21:28:21.267000Z LOG Migrating from #<SQLITE-CONNECTION sqlite:///home/user/sqlite-db-sample/chinook.db {E6D5279}>
2020-03-30T21:28:21.267000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6D5861}>
2020-03-30T21:28:21.315000Z DEBUG CONNECTED TO /home/user/sqlite-db-sample/chinook.db
2020-03-30T21:28:21.315000Z SQL SQLite: SELECT tbl_name
  FROM sqlite_master
 WHERE tbl_name = 'sqlite_sequence'
2020-03-30T21:28:21.331000Z SQL SELECT tbl_name
  FROM sqlite_master
 WHERE type='table'
       AND tbl_name <> 'sqlite_sequence'


2020-03-30T21:28:21.339000Z NOTICE SQLite column albums.AlbumId uses a sequence
2020-03-30T21:28:21.344000Z NOTICE SQLite column artists.ArtistId uses a sequence
2020-03-30T21:28:21.345000Z NOTICE SQLite column customers.CustomerId uses a sequence
2020-03-30T21:28:21.346000Z NOTICE SQLite column employees.EmployeeId uses a sequence
2020-03-30T21:28:21.346000Z NOTICE SQLite column genres.GenreId uses a sequence
2020-03-30T21:28:21.347000Z NOTICE SQLite column invoices.InvoiceId uses a sequence
2020-03-30T21:28:21.347000Z NOTICE SQLite column invoice_items.InvoiceLineId uses a sequence
2020-03-30T21:28:21.348000Z NOTICE SQLite column media_types.MediaTypeId uses a sequence
2020-03-30T21:28:21.348000Z NOTICE SQLite column playlists.PlaylistId uses a sequence
2020-03-30T21:28:21.349000Z NOTICE SQLite column tracks.TrackId uses a sequence
2020-03-30T21:28:21.417000Z NOTICE Prepare PostgreSQL database.
2020-03-30T21:28:21.423000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@UNIX:5432/chinook {E6D5861}>
2020-03-30T21:28:21.423000Z DEBUG SET client_encoding TO 'utf8'
2020-03-30T21:28:21.424000Z DEBUG SET work_mem TO '16 MB'
2020-03-30T21:28:21.425000Z DEBUG SET maintenance_work_mem TO '512 MB'
2020-03-30T21:28:21.425000Z DEBUG SET application_name TO 'pgloader'
2020-03-30T21:28:21.431000Z DEBUG BEGIN
KABOOM!
Heap exhausted during allocation: 336834560 bytes available, 631554264 requested.
 Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB   LUB  !move  Alloc  Waste   Trig    WP  GCs Mem-age
   0: 21203     0 21138     0   264     0   139     0     0  1624232 26456 10737418    0   0  0.0000
   1:     0     0     0     0   909    22     0 154310 154981 635735808 131328 646473226  200   1  0.0000
   2:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   3:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   4:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   5:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   6:     0     0     0     0 13021  6979     0     0     0 81920000     0  2000000 12743   0  0.0000
   Total bytes allocated    = 719280040
   Dynamic-space-size bytes = 1073741824
GC control variables:
   *GC-INHIBIT* = false
   *GC-PENDING* = true
   *STOP-FOR-GC-PENDING* = false
2020-03-30T21:28:25.232000Z INFO Stopping monitor

What I am doing here?

A SB-KERNEL::HEAP-EXHAUSTED-ERROR condition without bindings for heap statistics.  (If you did not expect to see this message, please report it.

What I find interesting is this line:

Total bytes allocated = 719280040

Out of the 1,073,741,824-byte dynamic space, that leaves roughly 354 MB of heap, yet the failed allocation requested about 632 MB. The machine I am currently working on has 2 GB of RAM:

OS: Debian 9 on linux 4.9.0-12-686-pae
pgloader version: "3.5.2" compiled with SBCL 1.3.14.debian
postgres version: PostgreSQL 12.2 (Debian 12.2-2.pgdg90+1) on i686-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 32-bit

Do you think I should proceed with compiling from source with CCL?

rfc2119 avatar Mar 30 '20 22:03 rfc2119

I see you're using pgloader 3.5.2. If you're up for building from source, I would advise building the current master branch with the SBCL version you have in Debian first, and see what happens. Then, yes, building with CCL is also an option.
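
For reference, a source build along those lines looks roughly like this (a sketch assuming Debian build dependencies and the CL/DYNSIZE Makefile variables described in the pgloader README):

$ sudo apt-get install sbcl make curl gawk unzip libsqlite3-dev freetds-dev libzip-dev
$ git clone https://github.com/dimitri/pgloader.git
$ cd pgloader
$ make pgloader                 # SBCL build; binary lands in ./build/bin/pgloader
$ make DYNSIZE=2048 pgloader    # same build with a larger SBCL heap, in MB
$ make CL=ccl pgloader          # build with Clozure CL instead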

dimitri avatar Apr 03 '20 22:04 dimitri

I have the same problem importing a MySQL database, but it is a bit random. I tweaked "batch rows", "batch size", and "rows per range", but I can't find a combination that always works, so I retry when it fails and at some point it succeeds.

The versions used are:

$~/pgloader$ ./build/bin/pgloader --version
pgloader version "3.6.2"
compiled with SBCL 1.3.14.debian

Extract from my load script:

load database
  from mysql://myhost/mydb
  into postgresql://localhost/mydb
WITH batch rows = 10000,
     batch size = 500MB,
     max parallel create index = 1,
     preserve index names,
     create no indexes,
     concurrency = 15,
     multiple reader per thread,
     rows per range = 10000

joulaud avatar Jun 23 '20 06:06 joulaud

As @dimitri said in #327, the CCL version is more memory-efficient. I managed to migrate a Zabbix DB with millions of rows using this docker-compose file (v2, because of Portainer):

version: "2"
services:
  pgloader:
    container_name: pgloader
    image: dimitri/pgloader:ccl.latest
    command: 'pgloader --verbose --debug --with "batch rows = 100000" mysql://root:[email protected]:3306/dbname postgresql://root:[email protected]:5432/dbname'
    networks:
      user-bridge:
        ipv4_address: "10.10.0.10"
    logging:
      driver: json-file
networks:
  user-bridge:
    external:
      name: user-bridge
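
The same run as a one-off docker run, without the compose file, would look roughly like this (a sketch; the network name, hosts, and credentials are placeholders):

$ docker run --rm --network user-bridge dimitri/pgloader:ccl.latest \
    pgloader --verbose --with "batch rows = 100000" \
    mysql://user:password@mysql-host:3306/dbname \
    postgresql://user:password@pg-host:5432/dbname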

undefinedid avatar Nov 14 '20 21:11 undefinedid

I was getting this error while importing several fixed-width .txt files (200 of them), resulting in 70+ GB of data. It was solved by adding WITH truncate, drop indexes, batch size = 4GB, prefetch rows = 50000 and SET work_mem to '5GB', standard_conforming_strings to 'on',

and without needing a source build or a CCL version: just plain pgloader as a CLI tool with a load file.
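
Assembled into a load file, those options would look roughly like this (a sketch: the file path, column layout, and table name are hypothetical placeholders):

load fixed
     from '/data/file_001.txt'
          (
             id    from  0 for 10,
             value from 10 for 32
          )
     into postgresql:///targetdb?tablename=mytable

with truncate, drop indexes, batch size = 4GB, prefetch rows = 50000

set work_mem to '5GB', standard_conforming_strings to 'on';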

BTW @dimitri, impressive piece of problem solving. Thanks to this I managed to avoid pulling in other software and overcomplicating my workflow. If you visit Berlin at some point, drop me a PM... I owe you dinner ;D.

fubespu avatar Apr 28 '21 21:04 fubespu

Seeing this with a small data set: a table with two rows and a table size of 74 MB. The important bit (I believe) is that one of the columns is a longblob containing binary data.

Context: this is the Docker pgloader:latest image running in a WSL2 environment; the container shouldn't be memory-constrained.

pgloader version "3.6.3~devel"
compiled with SBCL 1.4.16.debian

Source DB is MySQL 5.6

Heap exhausted during allocation: 24936448 bytes available, 59475232 requested.
Gen  Boxed Unboxed   LgBox LgUnbox  Pin       Alloc     Waste        Trig      WP GCs Mem-age
 0       2       0       0       0    0           0     65536    10737418       2   0  0.0000
 1     194     928       0    6417 5933   246338544    699408    10737418    7539   0  0.6287
 2     407    3963       0    4995  533   306329808    542512    10737418    9365   0  0.5444
 3     909    4720    4825       0   81   337783360   4773312     2000000   10454   0  0.0000
 4       0       0       0       0    0           0         0     2000000       0   0  0.0000
 5       0       0       0       0    0           0         0     2000000       0   0  0.0000
 6    1409    1115     208     171    0    91891312   3234192     2000000    2903   0  0.0000
 7       0       0       0       0    0           0         0     2000000       0   0  0.0000
           Total bytes allocated    =     982343024
           Dynamic-space-size bytes =    1073741824
GC control variables:
   *GC-INHIBIT* = false
   *GC-PENDING* = true
   *STOP-FOR-GC-PENDING* = false
2021-09-29T18:46:05.602000Z ERROR Error while formating a row from "<schema>.<table>":
2021-09-29T18:46:05.602000Z ERROR A SB-KERNEL::HEAP-EXHAUSTED-ERROR condition without bindings for heap statistics.  (If
you did not expect to see this message, please report it.
2021-09-29T18:46:26.008000Z ERROR The value
  NIL
is not of type
  NUMBER
when binding SB-KERNEL::X
2021-09-29T18:46:26.137000Z WARNING PostgreSQL warning: there is no transaction in progress

My WITH is currently this, but I will try some adjustments.

WITH
    batch size = 1GB,
    disable triggers
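
One adjustment worth trying with wide blob rows is to cap the row count per batch instead of relying on the byte limit alone, along these lines (a sketch, untested):

WITH
    batch rows = 100,
    batch size = 128MB,
    prefetch rows = 100,
    disable triggers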

smarkwell avatar Sep 29 '21 18:09 smarkwell

Can confirm that switching to pgloader:ccl.latest and dropping my WITH clauses overcame the heap issue.

pgloader version "3.6.3~devel"
compiled with Clozure Common Lisp Version 1.11.5/v1.11.5  (LinuxX8664)

smarkwell avatar Sep 29 '21 19:09 smarkwell

(Quoting @undefinedid's docker-compose suggestion above, using the dimitri/pgloader:ccl.latest image.)

Can't agree with this more. It helped transfer a huge MySQL database when other compiled versions would run out of memory. I would suggest everyone try the CCL build if the others fail. Definitely worked!

fahadshery avatar Oct 26 '21 21:10 fahadshery

Also confirming that switching to pgloader:ccl.latest fixed my heap issues. Any chance that could become the new mainline release?

AlexVulaj avatar Nov 15 '21 16:11 AlexVulaj