
Lacking capability for in-memory processing.

Open: maxupp opened this issue 1 year ago • 1 comment

The fact that output can only be written to files and not kept in memory for further processing is a major drawback. I suggest returning a dictionary with all the TEI objects.

maxupp, Nov 24 '23 14:11

Hi @maxupp !

If you process just one file, client.process_pdf() returns the response in memory, and you can parse it directly with a Python XML parser.
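As a minimal sketch of that parsing step, assuming the TEI XML is already held in a string: the example below uses only the standard library's xml.etree.ElementTree and the TEI namespace. SAMPLE_TEI is a hand-written stand-in for a server response, and parse_tei is a hypothetical helper name, not part of the client.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written stand-in for a response body (assumption: a real
# response is far larger but uses the same namespace and layout).
SAMPLE_TEI = """<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">An Example Paper</title>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text><body><div><head>Introduction</head><p>Hello.</p></div></body></text>
</TEI>"""


def parse_tei(tei_xml: str) -> dict:
    """Extract the main title and section headings from a TEI string."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(
        ".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS
    )
    heads = [
        head.text or ""
        for head in root.iterfind(".//tei:body//tei:head", TEI_NS)
    ]
    return {"title": title, "section_heads": heads}


parsed = parse_tei(SAMPLE_TEI)
```

Nothing is written to disk here: the string goes straight into the parser, and the result is an ordinary dictionary.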

If you process files in batch, the client writes each server response to a file on disk. To keep the responses in memory instead, you can change that behavior here: https://github.com/kermitt2/grobid_client_python/blob/master/grobid_client/grobid_client.py#L228
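One way to get the dictionary suggested in the issue, sketched without touching the client internals: wrap whatever single-file call you use in a function and collect its results per input path. Everything named here is hypothetical. collect_tei and fake_process_one are illustrative only, and in real use process_one would be a small wrapper that calls the client for one PDF and returns the TEI text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def collect_tei(
    pdf_paths: Iterable[str],
    process_one: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run process_one on every path and return {path: TEI string}.

    process_one is supplied by the caller (assumption: it takes a PDF
    path and returns the TEI XML as text), so no file is written here.
    """
    paths = list(pdf_paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip pairs them correctly.
        results = pool.map(process_one, paths)
        return dict(zip(paths, results))


def fake_process_one(pdf_path: str) -> str:
    """Stand-in for a real single-file call; returns a dummy TEI string."""
    return f'<TEI xmlns="http://www.tei-c.org/ns/1.0"><!-- {pdf_path} --></TEI>'


tei_by_path = collect_tei(["a.pdf", "b.pdf"], fake_process_one, max_workers=2)
```

Holding every response in memory trades disk I/O for RAM, so for large batches it may be better to parse each response as it arrives and keep only the extracted fields.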

Or do I misunderstand the issue?

The idea of this client is to provide a simple basis (depending only on standard Python libraries) that can be extended as needed.

kermitt2, Nov 24 '23 14:11