Add image label to output MD file.
- Uses the PyMuPDF package to extract image bounding boxes, sorts them by y-position, and adds a Markdown-formatted image label as text to the output Markdown file.
- Image data is saved in the metadata.json file under the key "image" as a Dict of the form {img_path: img_byte_content}; each image can then be saved to its path with the file `convert_single.py`.
- Not every picture in a PDF can be identified (for example, the image on page 2 of Multi-column CNN), as noted by @yachty66. Technically that one is not a picture: it is an image formed from text boxes, arrows, etc. I am unsure how to resolve this at the moment. Hope this helps :)
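The first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `ImageBox` type, the function names, and the `p{page}_img{xref}.png` naming scheme are all hypothetical. It relies on PyMuPDF's `Page.get_images` and `Page.get_image_rects`, and orders boxes by their top edge (PyMuPDF page coordinates grow downward, so a smaller `y0` is higher on the page).

```python
from typing import List, NamedTuple


class ImageBox(NamedTuple):
    """Hypothetical record: where an image sits on a page and its output name."""
    page: int
    x0: float
    y0: float
    x1: float
    y1: float
    name: str


def order_top_to_bottom(boxes: List[ImageBox]) -> List[ImageBox]:
    """Sort by page, then by top edge (y0), then by left edge (x0)."""
    return sorted(boxes, key=lambda b: (b.page, b.y0, b.x0))


def markdown_label(box: ImageBox) -> str:
    """Build a Markdown image label, e.g. ![a.png](a.png)."""
    return f"![{box.name}]({box.name})"


def collect_image_boxes(pdf_path: str) -> List[ImageBox]:
    """Collect one ImageBox per image placement in the PDF, in reading order."""
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    boxes = []
    with fitz.open(pdf_path) as doc:
        for page_no, page in enumerate(doc):
            for item in page.get_images(full=True):
                xref = item[0]  # first tuple entry is the image's xref
                # An image can be placed more than once on a page.
                for rect in page.get_image_rects(xref):
                    name = f"p{page_no}_img{xref}.png"  # assumed naming scheme
                    boxes.append(
                        ImageBox(page_no, rect.x0, rect.y0, rect.x1, rect.y1, name)
                    )
    return order_top_to_bottom(boxes)
```

Keeping the ordering and label helpers free of any PyMuPDF dependency makes them easy to check on hand-made boxes before running against a real PDF.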
@tungsten106 Thanks so much for this! It was on my list of functionality to add soon. I'll take a look next week (after the holiday).
@tungsten106 I'd love to review this, but the diffs seem to have issues (entire file is shown as deleted, with all the lines also shown as added). I'm having a hard time seeing what was changed. Do you know why this is happening with the diffs?
This is probably caused by the end-of-line sequence setting in VS Code on Windows. I have changed it from CRLF back to LF, and the diff should work now.
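For anyone hitting the same whole-file diff: when a file is saved with CRLF endings, every line differs from the LF version by a trailing carriage return, so the diff shows each line as removed and re-added. A minimal sketch of checking for and normalizing this before committing is below; the helper names are hypothetical and this is not part of the PR.

```python
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """Return True if the data contains any Windows-style line ending."""
    return b"\r\n" in data


def to_lf(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving everything else untouched."""
    return data.replace(b"\r\n", b"\n")


def normalize_file(path: str) -> bool:
    """Rewrite the file with LF endings; return True if it was changed."""
    p = Path(path)
    original = p.read_bytes()
    converted = to_lf(original)
    if converted != original:
        p.write_bytes(converted)
        return True
    return False
```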
Following to know when this is implemented. With GPT-4V out, the focus is on multimodal retrieval systems. Since marker outperforms most PDF readers, adding images would make it very valuable for general-purpose PDF loading in such systems.
> Not every picture in a PDF can be identified (for example, the image on page 2 of Multi-column CNN), as noted by @yachty66. Technically that one is not a picture: it is an image formed from text boxes, arrows, etc. I am unsure how to resolve this at the moment.

Why can't we do something like get the box, screenshot that part, and add
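The screenshot idea can be sketched with PyMuPDF's `Page.get_pixmap`, which accepts a `clip` rectangle. The sketch assumes the figure's bounding box is already known; finding that box for a figure built from text boxes and arrows is the unresolved part and is not addressed here. The function names and the `margin` and `zoom` defaults are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page points


def pad_and_clamp(box: Box, page: Box, margin: float = 5.0) -> Box:
    """Grow the box by margin on every side without leaving the page."""
    x0, y0, x1, y1 = box
    px0, py0, px1, py1 = page
    return (
        max(px0, x0 - margin),
        max(py0, y0 - margin),
        min(px1, x1 + margin),
        min(py1, y1 + margin),
    )


def screenshot_region(
    pdf_path: str, page_no: int, box: Box, out_path: str, zoom: float = 2.0
) -> None:
    """Render only the given region of a page and save it as an image file."""
    import fitz  # PyMuPDF; imported lazily so pad_and_clamp works without it

    with fitz.open(pdf_path) as doc:
        page = doc[page_no]
        r = page.rect
        clip = fitz.Rect(*pad_and_clamp(box, (r.x0, r.y0, r.x1, r.y1)))
        # zoom > 1 renders at a higher resolution than the 72 dpi default.
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom), clip=clip)
        pix.save(out_path)
```

Because this rasterizes whatever is drawn inside the box, it would capture vector drawings and text labels together, which is why it could suit figures that are not stored as embedded images.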
After adding images, it would be good to add a translation function to the project, plus the ability to right-click an image and have GPT-4-vision answer questions about it. That would make this a great essay tool.
Is the image extraction feature included in the latest code? Today I cloned the master branch (there is no release) and ran it, but I couldn't get any images in the output .md file. I thought the MD file would have the images embedded in it, but I didn't find any. Should I set a variable to extract images and embed them in the output MD file?
Is this feature upcoming?
Also, is there any way I can run this on Hugging Face and deploy it there? Could you create something similar, some remote solution?