
Error in `tuplesort_performsort`

Open · ashiklom opened this issue on Feb 7, 2017 · 7 comments

I'm using pg_bulkload to import a bunch of large (3-5 million row x 4 column) CSV files into a PostgreSQL database. The first two import very quickly and without issue, but subsequent ones fail with the following error:

ERROR: copy failed: ERROR:  faild in tuplesort_performsort

For reference, my control file looks like this:

OUTPUT = spectra_data
TYPE = CSV
DELIMITER = ","

...and the command I'm running looks like this.

pg_bulkload -d leaf_spectra control.ctl -i stdin < data.csv

(Actually, the command and files are generated automatically by an R system call...but that's effectively what it is).

The table I'm trying to import into looks like this:

                Table "public.spectra_data"
    Column     |  Type   |                              Modifiers
---------------+---------+----------------------------------------------------------------------
 spectradataid | bigint  | not null default nextval('spectra_data_spectradataid_seq'::regclass)
 spectraid     | bigint  | not null
 wavelength    | numeric |
 spectravalue  | numeric |
Indexes:
    "spectra_data_pkey" PRIMARY KEY, btree (spectradataid)
Check constraints:
    "legal_wavelength" CHECK (wavelength > 0::numeric)
Foreign-key constraints:
    "spectra_data_spectraid_fkey" FOREIGN KEY (spectraid) REFERENCES spectra_info(spectraid) ON DELETE CASCADE

Any idea why?

FYI, I'm running Postgres version 9.6.1 (latest) on Arch Linux, with an install of pg_bulkload from the master branch earlier today (Feb 7, 2017).

ashiklom avatar Feb 07 '17 21:02 ashiklom

Hmm, pg_bulkload hit this error during index rebuilding, at the end of the data loading stage.

Could you give us a data file that reproduces this?

bwtakacy avatar Feb 08 '17 02:02 bwtakacy

Hello, I'm also facing the same issue. Is there another (perhaps closed) issue covering this?

arshad88 avatar Mar 21 '17 10:03 arshad88

Hey, any update on this issue? I'm facing it as well. The first time I ran pg_bulkload the import was successful. I then deleted all the records inserted into the table and retried the same load; the second time I got this error.

silvianacmp avatar Apr 19 '17 07:04 silvianacmp

I've managed to get rid of the error in the meantime. In my case, the problem was that I had deleted the records with a plain DELETE FROM, which does not physically remove them. After running VACUUM FULL on the table, the load worked.
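In SQL terms, the cleanup was roughly the following (the table name is taken from the original post; the commented-out TRUNCATE is only a suggested alternative, not part of what was actually run):

-- A plain DELETE leaves dead tuples behind, and those seem to be what
-- trips up pg_bulkload's index rebuild on the next load.
DELETE FROM spectra_data;
-- VACUUM FULL rewrites the table and its indexes, physically removing the dead tuples.
VACUUM FULL spectra_data;

-- Alternative (assumption): TRUNCATE drops the rows and the dead space in
-- one step, provided no other table references spectra_data.
-- TRUNCATE spectra_data;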

silvianacmp avatar Apr 20 '17 09:04 silvianacmp

The following procedure reproduces this error:

1. Create random data: 200,000 lines of 30 random characters each.

# prints <argv[1]> random 30-character lines to stdout
# (redirect the output to test.csv for step 4)
import sys
import random

source_str = 'abcdefhijklmnopqrstuvwxyz0123456789'
random.seed(1)
num = int(sys.argv[1])
for i in range(num):
    print("".join([random.choice(source_str) for x in range(30)]))

2. Set maintenance_work_mem to 5MB.

3. Create the table:

psql testdb -c "CREATE TABLE test(id TEXT PRIMARY KEY);"

4. Execute pg_bulkload:

cat test.csv | /usr/pgsql-9.6/bin/pg_bulkload -d testdb test.ctl

Control file:

INPUT = stdin
OUTPUT = public.test
TYPE = CSV
QUOTE = "\""
ESCAPE = \
DELIMITER = ","
NULL = NULL
DUPLICATE_ERRORS = INFINITE
ON_DUPLICATE_KEEP = NEW

pg_bulkload finished successfully when maintenance_work_mem was set to 20MB.
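For reference, the two settings can be switched without editing postgresql.conf; the ALTER DATABASE approach below is a suggestion rather than part of the steps above, and it only affects new connections to testdb:

-- reproduce the failure:
ALTER DATABASE testdb SET maintenance_work_mem = '5MB';
-- ...run the pg_bulkload command above, then raise the limit and retry:
ALTER DATABASE testdb SET maintenance_work_mem = '20MB';
-- in a new connection to testdb, confirm the effective value:
SHOW maintenance_work_mem;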

KeizoUehara avatar Nov 19 '17 09:11 KeizoUehara

CentOS Linux release 7.4, pg_bulkload 3.1.14.

This error occurred with postgresql-9.6.6 and postgresql-10.1.

No error occurred with postgresql-9.5.10.

KeizoUehara avatar Nov 20 '17 07:11 KeizoUehara

Solved it by increasing the maintenance_work_mem parameter to 128MB in the postgresql.conf configuration file. Server restart required. Verify the adjustment was made by running: SHOW maintenance_work_mem;
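A sketch of the same change done from SQL (the ALTER SYSTEM variant is an assumption, not what was done in the comment above; for this particular parameter a configuration reload is normally enough, and new sessions then see the new value):

-- verify the running value:
SHOW maintenance_work_mem;

-- alternative to editing postgresql.conf by hand:
ALTER SYSTEM SET maintenance_work_mem = '128MB';
SELECT pg_reload_conf();   -- new sessions pick up the new value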

egalot avatar May 17 '18 04:05 egalot