grobid_client_python
Lacking capability for in-memory processing.
The fact that output can only be written to files and not kept in memory for further processing is a major drawback. I suggest returning a dictionary with all the TEI objects.
Hi @maxupp!
If you process just one file, client.process_pdf() returns the response in memory, and you can parse it directly with a Python XML parser.
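For example, once the TEI response is available as a string, the standard library's xml.etree.ElementTree is enough to work with it. This is a minimal sketch: it assumes you already hold the TEI text (how you get it out of the client depends on the client version you use), and the titleStmt/title path is an assumption based on typical GROBID TEI headers. The helper name extract_title is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# GROBID emits TEI, whose elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_title(tei_xml: str) -> Optional[str]:
    """Hypothetical helper: return the first title in the TEI header, or None."""
    root = ET.fromstring(tei_xml)  # parse the in-memory string, no file needed
    # Assumed location of the title in the TEI header; adjust for your output.
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    if title is None or title.text is None:
        return None
    return title.text.strip()


# A tiny stand-in for a real GROBID response, for illustration only.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title level="a" type="main">An Example Paper</title>
  </titleStmt></fileDesc></teiHeader>
</TEI>"""

print(extract_title(sample))  # An Example Paper
```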
If you process files in batch, you can change the behavior here so that the server responses are kept in memory instead of being written to files on disk: https://github.com/kermitt2/grobid_client_python/blob/master/grobid_client/grobid_client.py#L228
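One way to make that change is to have the batch loop collect each response into a dictionary (as suggested in the issue) instead of writing it to disk. The sketch below uses only the standard library and is not the client's actual code: process_batch_in_memory and fake_process are hypothetical names, and process_one stands in for whatever per-file call your client version makes, assumed here to return an (HTTP status, TEI text) pair.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical per-file function: PDF path in, (HTTP status, TEI text) out.
ProcessFn = Callable[[Path], Tuple[int, str]]


def process_batch_in_memory(
    pdf_paths: Iterable[Path], process_one: ProcessFn, n_workers: int = 4
) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Run process_one concurrently and keep every result in memory.

    Returns (successes, failures): successes maps path -> TEI string,
    failures maps path -> non-200 status code.
    """
    paths = list(pdf_paths)
    successes: Dict[str, str] = {}
    failures: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        for path, (status, text) in zip(paths, pool.map(process_one, paths)):
            # Full path as key, so same-named files in different folders don't collide.
            if status == 200:
                successes[str(path)] = text
            else:
                failures[str(path)] = status
    return successes, failures


def fake_process(path: Path) -> Tuple[int, str]:
    """Stand-in for a real GROBID call, for illustration only."""
    if path.stem == "broken":
        return 500, ""
    return 200, f"<TEI>{path.stem}</TEI>"


ok, failed = process_batch_in_memory([Path("a.pdf"), Path("broken.pdf")], fake_process)
print(ok)      # {'a.pdf': '<TEI>a</TEI>'}
print(failed)  # {'broken.pdf': 500}
```

Returning failures separately keeps error handling explicit, where the file-writing version would simply skip writing an output file.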
Or do I misunderstand the issue?
The idea of this client is to provide a simple basis (depending only on standard Python libraries) that can be extended as needed.