Can Gemini handle PDF files?

Open helai78 opened this issue 1 year ago • 2 comments

Description of the feature request:

https://ai.google.dev/gemini-api/docs/prompting_with_media?lang=python Based on the above link, it seems that PDF files are not supported. Is my understanding right?

What problem are you trying to solve with this feature?

No response

Any other information you'd like to share?

No response

helai78 avatar May 21 '24 09:05 helai78

@helai78, as shown in the documentation, the supported text formats are noted here. The Gemini API does not support PDF files, as the application/pdf MIME type is not supported yet. Alternatively, you can use AI Studio to work with PDF files using Gemini. Thank you!

singhniraj08 avatar May 22 '24 05:05 singhniraj08

Hello @singhniraj08, thank you for your clarification.

The AI Studio you mentioned is the Vertex AI Gemini API, which can handle PDF files. Vertex AI is part of Google Cloud, which means 90 days free for me. Is my understanding correct?

Could you tell me any alternatives for handling PDF files with Gemini 1.5 Pro?

Thanks in advance.

helai78 avatar May 22 '24 06:05 helai78

Hello @helai78, currently there's no direct support for uploading PDF files, but we can work around this by converting the PDF to images and extracting the text separately: https://github.com/google-gemini/cookbook/blob/main/quickstarts/PDF_Files.ipynb

anusonawane avatar Jul 08 '24 06:07 anusonawane

Hello @anusonawane, I do almost the same thing you mentioned: I use Tesseract to OCR the text from the images. The problem is that the images fall into several types: text, data charts, and pictures. OCR works well only for images that contain text, not for data charts or pictures, and I only have a limited number of tokens. But it is a very good challenge...

helai78 avatar Jul 10 '24 03:07 helai78

Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.

github-actions[bot] avatar Aug 16 '24 03:08 github-actions[bot]

PDF files are supported now. Check out the PDF recipe for specifics.

markmcd avatar Aug 16 '24 03:08 markmcd
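
As a rough illustration of the native PDF support mentioned above, here is a minimal sketch that builds a request body carrying a PDF inline under the `application/pdf` MIME type. It only constructs the JSON payload and makes no network call. The `inline_data` / `mime_type` / `data` part shape is an assumption based on the Gemini REST API's `generateContent` format, and `build_pdf_request` is a hypothetical helper name, so check the current PDF recipe and API reference before relying on it.

```python
import base64
import json


def build_pdf_request(pdf_bytes: bytes, prompt: str) -> dict:
    """Build a generateContent-style request body with an inline PDF part.

    Hypothetical helper: the part layout below is assumed from the Gemini
    REST API docs and is not taken from this thread.
    """
    # Cheap sanity check: every PDF file starts with the "%PDF-" header.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF (missing %PDF- header)")

    # Binary data is sent base64-encoded inside the JSON body.
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"inline_data": {"mime_type": "application/pdf", "data": encoded}},
                    {"text": prompt},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes for demonstration only, not a real document.
    fake_pdf = b"%PDF-1.4\n% placeholder\n"
    body = build_pdf_request(fake_pdf, "Summarize this document.")
    print(json.dumps(body, indent=2))
```

Inline data suits small files; for larger PDFs an upload-based flow is likely the better fit, which the PDF recipe should cover.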