mediawiki-xml2sql icon indicating copy to clipboard operation
mediawiki-xml2sql copied to clipboard

Missing delimiters and NULL placeholders

Open wwentland opened this issue 15 years ago • 2 comments

xml2sql output is faulty for missing columns

If I run xml2sql i get the following output for revision:

$ head -n 6 revision.sql
-- xml2sql - MediaWiki XML to SQL converter
-- Table revision for PostgreSQL

COPY "revision" FROM STDIN;
287059  2   287059  roboti Nyongeza: [[tpi:Akiolosi]]   988 LaaknorBot  2009-08-19T03:20:58Z    1   0
32885   10  32885   +jamii  160 Baba Tabita 2007-02-02T10:14:01Z    1   0

If i try to import this i get the following error:

psql -h localhost -d test2 -U babilen -W < revision.sql
Password for user babilen: 
ERROR:  missing data for column "rev_len"
CONTEXT:  COPY revision, line 1: "287059    2   287059  roboti Nyongeza: [[tpi:Akiolosi]]   988 LaaknorBot  20090819032058  1   0"

The issue can be fixed if the following awk oneliner is applied to the txt files (or lines within revision.sql that hold values)

awk -v e=10 'BEGIN { FS = OFS = "\t"; e++ } NF > e { exit(1) } { t = ""; for (a = 0; a < e - NF; a++) t = t "\t\\N"; printf "%s%s\n", $0, t }'

The difference in output is: 287059 2 287059 roboti Nyongeza: [[tpi:Akiolosi]] 988 LaaknorBot 2009-08-19T03:20:58Z 1 0 287059 2 287059 roboti Nyongeza: [[tpi:Akiolosi]] 988 LaaknorBot 2009-08-19T03:20:58Z 1 0 \N \N

The \N are placeholders for NULL values which is OK since the respective columns (rev_len and rev_parent_id) are defined as nullable:

CREATE TABLE revision (
....
   rev_len         INTEGER          NULL,
   rev_parent_id   INTEGER          NULL
);

thanks

babilen

wwentland avatar Aug 28 '09 16:08 wwentland

another possibility to solve this is to integrate the modification into xml2sql instead of postprocessing its output. The patch can be found here: http://meta.wikimedia.org/wiki/Xml2sql#patch_for_recent_versions_of_mw_.28.3E.3D1.10.29

antot avatar Feb 23 '10 11:02 antot

Correct. I have forked xml2sql and fixed the issues I reported here. You might want to take a look at:

http://github.com/babilen/mediawiki-xml2sql

wwentland avatar Feb 23 '10 11:02 wwentland