mediawiki-xml2sql
unknown tags when importing, and missing the CREATE TABLE statement
I am trying to run the example from the README.ME, but this file does not exist: http://download.wikimedia.org/enwiki/pages-meta-current.xml.bz2
Instead, I downloaded this one: https://dumps.wikimedia.org/enwiki/20160305/enwiki-20160305-pages-meta-current.xml.bz2
but xml2sql fails to import it. Is this file a valid input for your program?
$ bunzip2 -c enwiki-20160305-pages-meta-current.xml.bz2 | xml2sql -m
unexpected element <dbname>
xml2sql: parsing aborted at line 4 pos 12.
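Because xml2sql aborts at the first element it does not recognize, it can help to list every element name that appears near the top of a newer dump and compare that list with what the tool expects. The following is a minimal shell sketch, not part of xml2sql: the function name list_element_names is hypothetical, and it assumes a grep that supports -o (GNU and BSD grep do; POSIX grep does not require it).

```shell
#!/bin/sh
# List the distinct element names found in an XML stream on stdin.
# Assumption: grep supports -o (true for GNU and BSD grep, not required by POSIX).
list_element_names() {
  # Match opening tags only: '<' followed by a letter. Closing tags ('</...'),
  # comments ('<!--') and the XML declaration ('<?xml') do not match.
  grep -o '<[A-Za-z][A-Za-z0-9_]*' | sed 's/^<//' | sort -u
}

# Example (requires the dump file): inspect only the first lines of the dump.
# bunzip2 -c enwiki-20160305-pages-meta-current.xml.bz2 | head -n 200 | list_element_names
```

Since a literal '<' inside page text is escaped as '&lt;' in the XML, only real markup should show up in the list.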
It works (it creates page.sql, revision.sql and text.sql) if I remove some of the tags as follows:
$ cat enwiki-20160305-pages-meta-current1.xml-p000000010p000030303 | egrep -v "<dbname>|<ns>|<redirect|<parentid>|<model>|<format>|<sha1>" | xml2sql -m
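The same workaround can be applied directly to the compressed dump, so the uncompressed XML never has to be written to disk. This is a sketch under stated assumptions: grep -E is the modern spelling of egrep, the function names strip_unsupported_tags and import_dump are hypothetical, and it relies on each removed element sitting on its own line, as in the dumps above (grep -v drops whole lines).

```shell
#!/bin/sh
# Remove the elements that xml2sql rejects, reading XML on stdin.
# Assumption: each of these elements is on its own line; grep -v drops the
# whole line, so any other markup sharing that line would be lost too.
# A literal '<' inside page text is escaped as '&lt;', so these patterns
# should only match markup, not article text.
strip_unsupported_tags() {
  grep -Ev '<dbname>|<ns>|<redirect|<parentid>|<model>|<format>|<sha1>'
}

# Hypothetical wrapper: $1 is the path to a pages-meta-current .xml.bz2 dump.
import_dump() {
  bunzip2 -c "$1" | strip_unsupported_tags | xml2sql -m
}

# Usage: import_dump enwiki-20160305-pages-meta-current.xml.bz2
```

Note that this only hides the new elements from the parser; the data they carry (namespace, redirect target, parent revision, content model, checksum) is discarded rather than imported.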
Does this mean that the Wikipedia dump format has evolved and mediawiki-xml2sql needs to be updated?
Or is there an alternative tool that does the same thing?
Also, the three generated SQL files contain INSERT INTO statements, but the CREATE TABLE statements are missing. Can you please tell me the required CREATE TABLE statements?
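The generated files only contain rows, so the tables must exist before they are loaded. One option, assumed here rather than taken from the xml2sql documentation, is to load the schema shipped with MediaWiki itself (maintenance/tables.sql in releases of that period) and then the three generated files. xml2sql was written against an older MediaWiki version, so check that the columns in the INSERT statements match the CREATE TABLE definitions before relying on this. The function name load_wiki_sql and its arguments are hypothetical; the sketch assumes the mysql command-line client with credentials already configured (for example in ~/.my.cnf).

```shell
#!/bin/sh
# Hypothetical loader: create the tables first, then load the generated rows.
# $1 = target database name (must already exist)
# $2 = schema file, e.g. maintenance/tables.sql from a MediaWiki release whose
#      page/revision/text columns match the INSERT statements (assumption)
# Run from the directory that contains page.sql, revision.sql and text.sql.
load_wiki_sql() {
  db=$1
  schema=$2
  for f in "$schema" page.sql revision.sql text.sql; do
    echo "loading $f into $db" >&2
    # Stop at the first failure so a schema mismatch is not silently skipped.
    mysql "$db" < "$f" || return 1
  done
}

# Usage: load_wiki_sql wikidb /path/to/mediawiki/maintenance/tables.sql
```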
This page: https://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing explains that this project is dead and links to alternatives. You should mention this in your README.ME file :(
At the top of the GitHub project page it says "Dead project -- feel free to fork and update!", so there seems to be no intention to hide this. Maybe send a PR for the README?
Does anyone have any leads on how to change the code so that it works?
There are other tools here: https://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing