conjuncts
I've noticed that newer versions of marker do indeed exclude words in figures. Do you have a sample PDF where this issue persists?
I triaged the issue a bit, and I think I see how it happens. Focusing on the `- παραλαμβάνει το πλεόνασμα του μεσοκυττάριου υγρού και το επαναφέρει στο καρδιαγγειακό σύ-` (roughly, "receives the excess interstitial fluid and returns it to the cardiovascular sys-")...
Not sure where the bug is yet, but I've observed that when I modify `marker/scripts/convert.py` to remove the global `model_refs` variable, the bug disappears:
```
converter = converter_cls(
    config=config_dict,
    # artifact_dict=model_refs,
    artifact_dict=create_model_dict(),
    ...
```
Good suggestion. Does the newly added `--use_llm` argument suffice?
I've determined that it's an issue with pdftext. The immediate cause seems to be that the figure's bbox is out of bounds. In `marker.builder.document.DocumentBuilder`, the figure bbox is normal-sized after...
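To make "out of bounds" concrete, here is a minimal sketch of the kind of sanity check that would flag (or clip) such a bbox. It assumes bboxes are `(x0, y0, x1, y1)` tuples in page coordinates with the origin at the top-left; `is_out_of_bounds` and `clamp_bbox` are hypothetical helpers, not part of marker or pdftext.

```python
from typing import Tuple

# Hypothetical bbox type: (x0, y0, x1, y1) in page coordinates, origin top-left.
BBox = Tuple[float, float, float, float]


def is_out_of_bounds(bbox: BBox, page_width: float, page_height: float) -> bool:
    """Return True if any edge of the bbox falls outside the page rectangle."""
    x0, y0, x1, y1 = bbox
    return x0 < 0 or y0 < 0 or x1 > page_width or y1 > page_height


def clamp_bbox(bbox: BBox, page_width: float, page_height: float) -> BBox:
    """Clip each coordinate to the page so downstream lookups stay in range."""
    x0, y0, x1, y1 = bbox
    return (
        min(max(x0, 0.0), page_width),
        min(max(y0, 0.0), page_height),
        min(max(x1, 0.0), page_width),
        min(max(y1, 0.0), page_height),
    )


# Example: a figure bbox extending past the right and bottom edges of a 612x792 page.
figure_bbox = (100.0, 200.0, 900.0, 1200.0)
print(is_out_of_bounds(figure_bbox, 612.0, 792.0))  # True
print(clamp_bbox(figure_bbox, 612.0, 792.0))  # (100.0, 200.0, 612.0, 792.0)
```

Clamping only hides the symptom; whether clipping or rejecting the bbox is the right fix depends on where in pdftext the oversized bbox originates.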
Were you running on Windows by chance? I believe this could be related to #617.
That's strange; I tried it again with v1.2.0 and it's still `None`. Maybe my file somehow got modified. [3.pdf](https://github.com/user-attachments/files/17468433/3.pdf)
```
file_path = './bulk/3.pdf'
# Open the file in binary...
```
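To compare two copies of a PDF at the byte level, a small stdlib-only sketch that reads the first bytes in binary mode and formats them like one row of a hex dump (offset, then bytes). `read_header` and `format_header` are hypothetical helpers, the 16-byte window is an arbitrary choice, and the demo runs on a throwaway temp file rather than the real `./bulk/3.pdf`.

```python
import os
import tempfile


def read_header(path: str, length: int = 16) -> bytes:
    """Read the first `length` bytes in binary mode (no newline or encoding translation)."""
    with open(path, "rb") as f:
        return f.read(length)


def format_header(data: bytes) -> str:
    """Format bytes like one hex-dump row: an 8-digit offset, then space-separated bytes."""
    return f"{0:08x} " + " ".join(f"{b:02x}" for b in data)


# Demo on a throwaway file that begins with a PDF header.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.7\n")
try:
    print(format_header(read_header(tmp.name)))  # 00000000 25 50 44 46 2d 31 2e 37 0a
finally:
    os.remove(tmp.name)
```

Running `format_header(read_header(path))` on both the wget download and the local copy would show whether they differ in their very first bytes.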
Okay, it seems that I have a different version of the PDF than Nature's. The hex dump of the PDF downloaded with wget shows `00000000 25 50 ...`, but the PDF that I...
Huh, interesting. I'd still prefer that filetype be able to recognize this as a PDF, though. For example, pymupdf opens the PDF with no problem....
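One plausible explanation, stated as an assumption rather than confirmed from either library's source: a magic-number sniffer like filetype compares the first few bytes against `%PDF`, while PDF readers are often more forgiving (the PDF 1.7 reference notes that Acrobat accepts the header anywhere in the first 1024 bytes), and pymupdf may be similarly lenient. The sketch below contrasts the two styles of check; `strict_is_pdf` and `lenient_is_pdf` are hypothetical names.

```python
HEADER = b"%PDF-"


def strict_is_pdf(data: bytes) -> bool:
    """Magic-number style check: the header must sit at offset 0."""
    return data.startswith(HEADER)


def lenient_is_pdf(data: bytes, window: int = 1024) -> bool:
    """Reader style check: accept the header anywhere in the first `window` bytes."""
    return HEADER in data[:window]


# A clean file versus one with a few stray bytes before the header.
clean = b"%PDF-1.7\n%%EOF\n"
prefixed = b"\r\n\x00%PDF-1.7\n%%EOF\n"

print(strict_is_pdf(clean), lenient_is_pdf(clean))        # True True
print(strict_is_pdf(prefixed), lenient_is_pdf(prefixed))  # False True
```

If the local copy turns out to have extra bytes before `%PDF`, a strict offset-0 match would account for `filetype` returning `None` while pymupdf still opens the file.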