Etienne Posthumus
How about using the FTS support of sqlite? https://www.sqlite.org/fts5.html
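For anyone wanting to try that, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the SQLite library your Python links against was compiled with FTS5 (common, but not guaranteed); the `docs` table, its columns, and the helper names `build_index` and `search` are made up for illustration.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index from (title, body) tuples.

    Assumes the linked SQLite build includes the FTS5 extension.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table layout: two full-text indexed columns.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
    conn.executemany("INSERT INTO docs (title, body) VALUES (?, ?)", rows)
    return conn


def search(conn, query):
    """Return titles of matching rows, best match first."""
    cur = conn.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    conn = build_index([
        ("MARC records", "Parsing MARC with Perl"),
        ("PDF images", "Extracting images from PDF files"),
    ])
    print(search(conn, "images"))
```

The default tokenizer is case-insensitive for ASCII, so a search for `perl` would also find the row mentioning `Perl`.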
http://thedatahub.org/user/epoz +1 on the datahub group, very cool feature.
Ed, the first record in your JSON output is not a dictionary, but a string. The BibServer importer was failing here: https://github.com/okfn/bibserver/blob/ecc08d230027a0a3fc2c788f9730bcf9825b92b5/bibserver/importer.py#L163 Trying to assign stuff to a unicode string....
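To illustrate the kind of guard that would avoid this, here is a rough sketch that separates dictionary records from anything else before they are processed. This is not the actual BibServer importer code; the function name `split_records` and the sample data are hypothetical.

```python
def split_records(records):
    """Split a list of parsed JSON records into dicts and everything else.

    Returns (valid, skipped), where skipped holds (index, value) pairs
    for entries that are not dictionaries (e.g. a stray string).
    """
    valid, skipped = [], []
    for index, record in enumerate(records):
        if isinstance(record, dict):
            valid.append(record)
        else:
            skipped.append((index, record))
    return valid, skipped


if __name__ == "__main__":
    # Hypothetical output where the first entry is a string, not a dict.
    sample = ["MARC export", {"title": "A book"}, {"title": "Another book"}]
    valid, skipped = split_records(sample)
    print(len(valid), skipped)
```

Skipped entries are returned rather than silently dropped, so the caller can decide whether to log them or fail the import.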
Can we add a -bibserver command line switch that outputs: {"display_name": "MARC", "format": "marc", "contact": "Edmund Chamberlain [email protected]", "bibserver_plugin": true}? The latest version can be found at: https://github.com/okfn/bibserver/blob/master/parserscrapers_plugins/marc2BibJson.pl
We need to install the Perl MARC modules on the bibsoup server. I mailed Nils about that asking permission, but need to ping him again as I did not receive...
Thanks for the precise answer @jbarlow83 (and the amazing library). Have been trying to figure out how to get the dimensions by grokking your answer from https://github.com/jbarlow83/OCRmyPDF/blob/master/src/ocrmypdf/pdfinfo/info.py#L180 ...but how to...
OK, looking at https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf now, thanks for the reminder. And for anyone else looking at this in future, the relevant part in the spec explaining what pikepdf.Matrix is about, see...
And even easier, someone else pointed out to me that with PyMuPDF the heavy lifting is done for you, see: https://pymupdf.readthedocs.io/en/latest/faq.html#how-to-extract-images-pdf-documents
@schall1337 I forget the exact details, but yes, pymupdf was the answer.
A simple example:
```python
from rdflib import Graph

g = Graph().parse("http://www.w3.org/People/Berners-Lee/card")
g.query("CONSTRUCT WHERE {?s ?p ?o} LIMIT 10")
# TypeError: 'NoneType' object is not iterable
```
Agreed, on the "Ew"...