MARC parser

Open markmacgillivray opened this issue 13 years ago • 12 comments

MARC parsing will give access to large amounts of library data

markmacgillivray avatar Jan 18 '12 14:01 markmacgillivray

To be done either as a parser included in the repo or as an external parser that runs remotely and sends an import to bibserver. Either way, it will be a good example of that particular functionality.

markmacgillivray avatar Jan 18 '12 14:01 markmacgillivray

I've put a bare-bones Perl-based parser up as a gist:

https://gist.github.com/1836836

It should accept input on stdin. The JSON seems valid but does not upload to bibsoup; I'm getting the error: 'unicode' object has no attribute 'get'. I'm not familiar with the JSON module, but am wondering if I need to be more explicit about headers...

edchamberlain avatar Feb 15 '12 16:02 edchamberlain

Ed, the first record in your JSON output is not a dictionary, but a string.

The BibServer importer was failing here: https://github.com/okfn/bibserver/blob/ecc08d230027a0a3fc2c788f9730bcf9825b92b5/bibserver/importer.py#L163 It was trying to assign values to a unicode string.

We are improving the parser/importer to give better feedback on these kinds of errors. Ideally it should have just failed on that record, given feedback, and continued. We are looking into how to do this in a structured manner.
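As a rough illustration of the "fail on that record, report it, and continue" idea, here is a minimal Python sketch. It is not the BibServer importer code: the function name `split_records` and the sample data are hypothetical, and it only assumes that the parser output is a JSON list in which every record should be an object (a dict once loaded).

```python
import json


def split_records(records):
    """Separate usable dict records from anything else (e.g. a stray string).

    Hypothetical helper for illustration; not part of bibserver.
    """
    good, rejected = [], []
    for index, record in enumerate(records):
        if isinstance(record, dict):
            good.append(record)
        else:
            # Remember where the bad record was and what type it had,
            # so the uploader gets useful feedback instead of a traceback.
            rejected.append((index, type(record).__name__, record))
    return good, rejected


# Made-up sample: the first "record" is an empty string, as in the report above.
sample = json.loads('["", {"title": "A"}, {"title": "B"}]')
good, rejected = split_records(sample)

for index, kind, value in rejected:
    print("skipping record %d: expected an object, got %s %r" % (index, kind, value))
print("%d record(s) ready for import" % len(good))
```

Running a check like this on the parser output before uploading would point straight at the offending record.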

epoz avatar Feb 16 '12 10:02 epoz

Thanks. I'll take a look at the blank first line.

edchamberlain avatar Feb 16 '12 12:02 edchamberlain

Caused by a bad declaration, now fixed.

edchamberlain avatar Feb 16 '12 12:02 edchamberlain

A few more tweaks. Manual upload of the output seems fine, and all 953 records were imported:

http://bibsoup.net/edchamberlain/marc21_sample

edchamberlain avatar Feb 16 '12 13:02 edchamberlain

Can we add a -bibserver command line switch that outputs: {"display_name": "MARC", "format": "marc", "contact": "Edmund Chamberlain [email protected]", "bibserver_plugin": true}

The latest version can be found at: https://github.com/okfn/bibserver/blob/master/parserscrapers_plugins/marc2BibJson.pl
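For anyone writing a plugin in another language, the requested behaviour might look roughly like the Python sketch below. The actual parser is Perl, so this is only an illustration: the `run` function, the `PLUGIN_INFO` name, and the placeholder contact value are assumptions, and only the `-bibserver` switch and the four JSON keys come from the request above.

```python
import argparse
import json

# Keys follow the request above; the contact value is a placeholder.
PLUGIN_INFO = {
    "display_name": "MARC",
    "format": "marc",
    "contact": "Your Name",
    "bibserver_plugin": True,
}


def run(argv):
    """Return the plugin description as JSON if -bibserver is given, else None.

    Hypothetical entry point; a real plugin would parse stdin when the
    switch is absent.
    """
    parser = argparse.ArgumentParser(description="Example parser plugin")
    parser.add_argument(
        "-bibserver",
        action="store_true",
        help="print a JSON description of this plugin and exit",
    )
    args = parser.parse_args(argv)
    if args.bibserver:
        return json.dumps(PLUGIN_INFO)
    return None


print(run(["-bibserver"]))
```

Keeping the description in one dictionary means the switch output and any documentation stay in sync.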

epoz avatar Mar 21 '12 17:03 epoz

This is done, along with a few other tweaks.

edchamberlain avatar Mar 30 '12 10:03 edchamberlain

What is left to be done to get the MARC parser working? @epoz can you let @edchamberlain know what is required? Then we can make the MARC parser available too.

markmacgillivray avatar Apr 24 '12 11:04 markmacgillivray

We need to install the Perl MARC modules on the bibsoup server. I mailed Nils about that asking permission, but need to ping him again as I did not receive a reply. On my local machine the MARC parser works.

epoz avatar Apr 25 '12 10:04 epoz

I added the Perl requirement to the ticket about moving to a different server and got no complaints, so we can install it there. The new server, by the way, is s063. Let me know if you can't log in to it.

markmacgillivray avatar Apr 25 '12 10:04 markmacgillivray

Additional tweaks made to parser code. Should be fairly complete. Currently testing on Harvard data.

edchamberlain avatar May 22 '12 15:05 edchamberlain