pg_bulkload
Error in `tuplesort_performsort`
I'm using pg_bulkload to import a bunch of large (3-5 million row x 4 column) CSV files into a PostgreSQL database. The first two import very quickly and without issue, but subsequent ones fail with the following error:
ERROR: copy failed: ERROR: faild in tuplesort_performsort
For reference, my control file looks like this:
OUTPUT = spectra_data
TYPE = CSV
DELIMITER = ","
...and the command I'm running looks like this.
pg_bulkload -d leaf_spectra control.ctl -i stdin < data.csv
(Actually, the command and files are generated automatically by an R system call...but that's effectively what it is).
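For reference, here is a minimal sketch (in Python rather than R, purely for illustration) of scripting that same invocation. The database name, control file, and CSV file come from the command above; the pg_bulkload binary is assumed to be on PATH, and the bulkload() helper is a made-up name.
import subprocess

def bulkload(csv_path, control_path="control.ctl", dbname="leaf_spectra"):
    # Equivalent to: pg_bulkload -d <dbname> <control_path> -i stdin < <csv_path>
    with open(csv_path, "rb") as csv_file:
        subprocess.run(
            ["pg_bulkload", "-d", dbname, control_path, "-i", "stdin"],
            stdin=csv_file,
            check=True,  # raise CalledProcessError if pg_bulkload exits non-zero
        )

bulkload("data.csv")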
The table I'm trying to import into looks like this:
Table "public.spectra_data" [0/1862]
Column | Type | Modifiers
---------------+---------+----------------------------------------------------------------------
spectradataid | bigint | not null default nextval('spectra_data_spectradataid_seq'::regclass)
spectraid | bigint | not null
wavelength | numeric |
spectravalue | numeric |
Indexes:
"spectra_data_pkey" PRIMARY KEY, btree (spectradataid)
Check constraints:
"legal_wavelength" CHECK (wavelength > 0::numeric)
Foreign-key constraints:
"spectra_data_spectraid_fkey" FOREIGN KEY (spectraid) REFERENCES spectra_info(spectraid) ON DELETE CASCADE
Any idea why?
FYI, I'm running Postgres version 9.6.1 (latest) on Arch Linux, with an install of pg_bulkload from the master branch earlier today (Feb 7, 2017).
Hmm, pg_bulkload hit this error during index rebuilding, at the end of the data loading stage.
Could you give us a data file that reproduces this?
Hello, I'm also facing the same issue. Is there another (possibly closed) post about this issue?
Hey, any update on this issue? I am facing it as well. The first time I ran pg_bulkload the import was successful. Then I deleted all the records inserted in the table and retried the same load; the second time I got this error.
I've managed to get rid of the error in the meantime. In my case, the problem was that I had deleted the records simply with DELETE FROM, which does not physically remove them. After running VACUUM FULL on the table, the load worked.
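For anyone scripting that clean-up, here is a minimal sketch of the DELETE-then-VACUUM FULL sequence, assuming psycopg2 and the spectra_data table from the original question; the connection details are placeholders. Note that VACUUM cannot run inside a transaction block, hence the autocommit setting.
import psycopg2

conn = psycopg2.connect(dbname="leaf_spectra")  # placeholder connection settings
conn.autocommit = True                          # VACUUM cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute("DELETE FROM spectra_data;")    # removes rows logically but leaves dead tuples behind
    cur.execute("VACUUM FULL spectra_data;")    # rewrites the table, physically reclaiming the space
conn.close()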
The following procedure reproduces this error:
1. Create the test data: 200,000 lines of 30 random characters each, using the script below.
# Generate random test data: one line of 30 random characters per row.
# Usage (script name is arbitrary): python3 gen_random_lines.py 200000 > test.csv
import sys
import random

source_str = 'abcdefhijklmnopqrstuvwxyz0123456789'
random.seed(1)
num = int(sys.argv[1])  # number of lines to generate, e.g. 200000
for i in range(num):
    print("".join(random.choice(source_str) for _ in range(30)))
2. Set maintenance_work_mem to 5MB.
3. Create the table:
psql testdb -c "CREATE TABLE test(id TEXT PRIMARY KEY);"
4. Execute pg_bulkload:
cat test.csv | /usr/pgsql-9.6/bin/pg_bulkload -d testdb test.ctl
The control file (test.ctl):
INPUT = stdin
OUTPUT = public.test
TYPE = CSV
QUOTE = "\""
ESCAPE = \
DELIMITER = ","
NULL = NULL
DUPLICATE_ERRORS = INFINITE
ON_DUPLICATE_KEEP = NEW
pg_bulkload finished successfully when maintenance_work_mem was set to 20MB.
Environment: CentOS Linux release 7.4, pg_bulkload 3.1.14.
The error occurred with postgresql-9.6.6 and postgresql-10.1; no error occurred with postgresql-9.5.10.
Solved it by increasing the "maintenance_work_mem" parameter to 128MB in the "postgresql.conf" configuration file. A server restart was required. Verify the adjustment was applied by running the query: "show maintenance_work_mem"
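For completeness, the same adjustment can also be made without hand-editing postgresql.conf, using ALTER SYSTEM followed by a configuration reload. A minimal sketch, assuming psycopg2 and a role permitted to run ALTER SYSTEM; the connection details are placeholders.
import psycopg2

conn = psycopg2.connect(dbname="testdb")  # placeholder connection settings
conn.autocommit = True                    # ALTER SYSTEM cannot run inside a transaction block
with conn.cursor() as cur:
    cur.execute("SHOW maintenance_work_mem;")
    print("current value:", cur.fetchone()[0])
    cur.execute("ALTER SYSTEM SET maintenance_work_mem = '128MB';")  # written to postgresql.auto.conf
    cur.execute("SELECT pg_reload_conf();")                          # new sessions pick up the new default
    cur.execute("SHOW maintenance_work_mem;")
    print("value after reload:", cur.fetchone()[0])
conn.close()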