Pgloader is erroring out for PG URI with IPV6 Host
-
[x] pgloader --version
pgloader version "3.6.1" compiled with SBCL 1.4.0-1.el7 -
[x] did you test a fresh compile from the source tree? Yes
-
[x] did you search for other similar issues? Yes. Bug #837 Seems to be related to the parsing of IpV6 address.
-
[x] how can I reproduce the bug? pgloader b.load
LOAD CSV
FROM '/u02/netcracker/rbm/rbmvm/infinys_root/a.csv' WITH ENCODING UTF8 (COLUMN1,COLUMN2)
INTO postgresql://netcrk:crknet@[fd95:ff55:7fb8:f1e5:f816:3eff:fe5d:96b7]:5432/postgres
TARGET TABLE TESTACCOUNT(COLUMN1,COLUMN2)
WITH TRUNCATE,
SKIP HEADER = 0,
FIELDS OPTIONALLY ENCLOSED BY '"',
FIELDS TERMINATED BY ','
SET CLIENT_ENCODING TO 'UTF8,
work_mem to '128MB',
standard_conforming_strings to 'on'
BEFORE LOAD DO $$ CREATE SCHEMA IF NOT EXISTS PUBLIC; $$,
$$ CREATE TABLE IF NOT EXISTS PUBLIC.TESTACCOUNT(COLUMN1 INTEGER,COLUMN2 INTEGER);
$$;
- [x] pgloader output you obtain
2019-07-26T17:28:53.045000+01:00 LOG pgloader version "3.6.1"
KABOOM!
FATAL error: At
LOAD CSV FROM '/u02/netcracker/rbm/rbmvm/infinys_root/a.csv' WITH ENCODING UTF8 (COLUMN1,COLUMN2) INTO postgresql://netcrk:crknet@[fd95:ff55:7fb8:f1e5:f81
^ (Line 1, Column 131, Position 131)
In context PGSQL-URI:
While parsing DSN-DBNAME. Expected:
the character / (SOLIDUS)
While parsing DSN-PORT. Expected:
the character : (COLON)
While parsing NAMESTRING. Problem:
The production
#\[
does not satisfy the predicate ALPHA-CHAR-P.
Expected:
the character _ (LOW_LINE)
or any character satisfying ALPHA-CHAR-P
While parsing SOCKET-DIRECTORY. Expected:
the string "unix:"
While parsing IPV4-PART. Problem:
The production
#\[
does not satisfy the predicate DIGIT-CHAR-P.
Expected:
any character satisfying DIGIT-CHAR-P
An unhandled error condition has been signalled: At
LOAD CSV FROM '/u02/netcracker/rbm/rbmvm/infinys_root/a.csv' WITH ENCODING UTF8 (COLUMN1,COLUMN2) INTO postgresql://netcrk:crknet@[fd95:ff55:7fb8:f1e5:f81
^ (Line 1, Column 131, Position 131)
In context PGSQL-URI:
While parsing DSN-DBNAME. Expected:
the character / (SOLIDUS)
While parsing DSN-PORT. Expected:
the character : (COLON)
While parsing NAMESTRING. Problem:
The production
#\[
does not satisfy the predicate ALPHA-CHAR-P.
Expected:
the character _ (LOW_LINE)
or any character satisfying ALPHA-CHAR-P
While parsing SOCKET-DIRECTORY. Expected:
the string "unix:"
While parsing IPV4-PART. Problem:
The production
#\[
does not satisfy the predicate DIGIT-CHAR-P.
Expected:
any character satisfying DIGIT-CHAR-P
What I am doing here?
At
LOAD CSV FROM '/u02/netcracker/rbm/rbmvm/infinys_root/a.csv' WITH ENCODING UTF8 (COLUMN1,COLUMN2) INTO postgresql://netcrk:crknet@[fd95:ff55:7fb8:f1e5:f81
^ (Line 1, Column 131, Position 131)
In context PGSQL-URI:
While parsing DSN-DBNAME. Expected:
the character / (SOLIDUS)
While parsing DSN-PORT. Expected:
the character : (COLON)
While parsing NAMESTRING. Problem:
The production
#\[
does not satisfy the predicate ALPHA-CHAR-P.
Expected:
the character _ (LOW_LINE)
or any character satisfying ALPHA-CHAR-P
While parsing SOCKET-DIRECTORY. Expected:
the string "unix:"
While parsing IPV4-PART. Problem:
The production
#\[
does not satisfy the predicate DIGIT-CHAR-P.
Expected:
any character satisfying DIGIT-CHAR-P
- [x] data that is being loaded, if relevant
1,1
1,2
1,3
- [x] How the data is different from what you expected, if relevant Pgloader should be parsing the URI.
I can confirm this issue using version:
pgloader version "3.6.9"
compiled with SBCL 2.4.4
The issue might be that we currently don't match hexadecimal digits (which include the characters 0-9 and A-F/a-f).
pgloader/src/parsers/parse-pgpass.lisp I think we should use:
(digit-char-p character 16)
This tells digit-char-p to check for characters valid in base 16 (hexadecimal), instead of just the default base 10 (which only includes 0-9). And this then would allow for all valid characters in an IPv6 address.
I'm not sure how to contribute a pull request or how the workflow regarding permissions should be.
But it would be awesome if you could confirm my assumption, @dimitri, and either introduce the change or point out how I can help further more with this issue.
It's correct that IPv6 host parsing is broken there. Thank you for finding that.
The fix is a bit different though, as what you're seeing there is not function application but a parser rule. Specifically, the symbol character is a terminal that always succeeds and consumes and produces a single character. The surrounding predicate form is then a rule that specifies that the given predicate (of arity 1) applied to the successful result of the inner expression must return true, for the outer to succeed.
In other words: we need to define a predicate of arity 1 to use here, or reformulate the rule.
There are two places where this bug appears (the one you found is not the one affecting your load file).