Patrice Lopez
Hi @Tanmay98 ! Sorry for the late answer and thanks for the issue. I think I commented that part on purpose because I am not happy with the current accuracy...
Ah I see ! Good to know thanks. Let's keep the issue open until someone can find the time to come back to the list in the public training data...
Hello @lucbouge ! I think the doc is right, at least in this case :) The training data format (with inline annotations of the reference string) is different from the...
Hi @silviaegt and @caifand ! Thanks a lot for the issue and the kind words on Grobid and this feedback. This work is very challenging for Grobid in its current...
Hi @silviaegt ! For info, I've added the examples of your [error page](https://github.com/ColmexBDCV/dissertations_as_data/blob/main/tei_examples/error_identification.md#page-numbers-pp200-get-tagged-as-year) to the training data with commit 2e30c274c93575a481d26c8ee771a3cd3fa743a7 (file `citations.xml`). If you have more error cases ("real" case...
Hello @yasminaaq ! Thanks for the interest in GROBID. Indeed with Arabic (like Chinese currently), we will have a mess with the labeling because there is no annotated example in...
Hello @ehapmgs ! You are right, if the asset path is not defined, the embedded graphics are not extracted and not referenced in the TEI with their coordinates. It's not...
You're not missing anything, this is exactly as expected. > It's not very clear why we would like to crop these embedded graphics from the PDF while they can be...
We probably want to have the coordinates at figure level for the global figure zone and the coordinates for the figure content (the graphics) in an element under ``? So...
I would say in principle the coordinates of `` are the bounding boxes of all contained elements. In the second case, the graphic element is not available so the area...
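To illustrate the principle stated in the last comment (a container's coordinates being the bounding box of everything it contains), here is a minimal Python sketch. The `Box` type, the `enclosing_box` helper, and the `(x, y, width, height)` layout are assumptions made for this illustration, not GROBID's own code or API, and all boxes are assumed to lie on the same page.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box: top-left corner plus size, on a single page."""
    x: float
    y: float
    width: float
    height: float


def enclosing_box(boxes: Iterable[Box]) -> Box:
    """Return the smallest box that covers every box in `boxes`."""
    boxes = list(boxes)
    if not boxes:
        raise ValueError("at least one box is required")
    # Extreme edges over all contained elements.
    left = min(b.x for b in boxes)
    top = min(b.y for b in boxes)
    right = max(b.x + b.width for b in boxes)
    bottom = max(b.y + b.height for b in boxes)
    return Box(left, top, right - left, bottom - top)


# Illustrative values only: a graphic and a caption inside one figure zone.
graphic = Box(10, 20, 100, 50)
caption = Box(50, 60, 100, 40)
figure_zone = enclosing_box([graphic, caption])
```

With this approach, when one contained element (for example the graphic) is missing, the computed zone shrinks to whatever elements remain.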