mediawiki-xml2sql

unknown tags when importing, and missing the CREATE TABLE statement

Open dportabella opened this issue 8 years ago • 4 comments

I am trying to run the example from the README, but the file it refers to does not exist: http://download.wikimedia.org/enwiki/pages-meta-current.xml.bz2

Instead, I downloaded this one: https://dumps.wikimedia.org/enwiki/20160305/enwiki-20160305-pages-meta-current.xml.bz2, but xml2sql fails to import it. Is this file a valid input for your program?

$ bunzip2 -c enwiki-20160305-pages-meta-current.xml.bz2 | xml2sql -m
unexpected element <dbname>
xml2sql: parsing aborted at line 4 pos 12.

It works (it creates page.sql, revision.sql and text.sql) if I remove some of the tags as follows:

$ cat enwiki-20160305-pages-meta-current1.xml-p000000010p000030303 | egrep -v "<dbname>|<ns>|<redirect|<parentid>|<model>|<format>|<sha1>" | xml2sql -m

Does this mean that the Wikipedia dump format has evolved and mediawiki-xml2sql needs to be updated? Or is there an alternative tool to achieve the same thing?
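For anyone who prefers a script to the egrep one-liner, here is a minimal Python sketch of the same line-based filter. It assumes, as the egrep workaround does, that each of these elements sits on its own line in the dump. The function name strip_unknown_tags and the sample lines are made up for illustration and are not part of xml2sql.

```python
import re

# Same alternation as the egrep call above: the elements xml2sql rejected.
# "<redirect" has no closing ">" because the pattern is meant to match the
# tag even when attributes follow its name.
UNKNOWN_TAGS = re.compile(
    r"<dbname>|<ns>|<redirect|<parentid>|<model>|<format>|<sha1>"
)


def strip_unknown_tags(lines):
    """Yield only the lines that contain none of the unsupported elements."""
    for line in lines:
        if not UNKNOWN_TAGS.search(line):
            yield line


# Hypothetical sample lines, only loosely shaped like a dump fragment.
sample = [
    "<page>\n",
    "  <title>Example</title>\n",
    "  <ns>0</ns>\n",
    '  <redirect title="Other" />\n',
    "  <sha1>abc</sha1>\n",
    "</page>\n",
]

filtered = list(strip_unknown_tags(sample))
print("".join(filtered), end="")
```

To use it in the pipeline instead of egrep, pass sys.stdin to strip_unknown_tags and write the result with sys.stdout.writelines. Note that this drops whole lines, so it would lose data if one of these elements ever shared a line with an element xml2sql does need.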

Also, the three generated SQL files have INSERT INTO statements, but the CREATE TABLE statements are missing. Can you please tell me the required CREATE TABLE statements?

dportabella avatar Apr 11 '16 17:04 dportabella

This page: https://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing explains that this project is dead and links to alternatives. You should explain this in your README file :(

dportabella avatar Apr 11 '16 18:04 dportabella

At the top of the GitHub project page it says "Dead project -- feel free to fork and update!", so it seems there is no intention to hide this. Maybe send a PR for the README?

jayvdb avatar May 19 '16 00:05 jayvdb

Does anyone have any leads on how to change the code so that it works?

AlexandreCassagne avatar May 25 '16 12:05 AlexandreCassagne

There are other tools here: https://meta.wikimedia.org/wiki/Data_dumps/Tools_for_importing

dportabella avatar May 25 '16 13:05 dportabella