PDF Parser, GoogleDrive support for PDF, README.md minor fix

Open d4yz opened this issue 2 years ago • 3 comments

d4yz, Mar 22 '23 01:03

@bary12 do we want pdf -> html -> text here, so we can detect titles, bold, etc., like we do for docx?

Roey7, Mar 22 '23 08:03

> @bary12 do we want pdf -> html -> text here, so we can detect titles, bold, etc., like we do for docx?

Yes, just for the titles.

bary12, Mar 22 '23 08:03

@d4yz so we need a pdf_to_html step, and then we can reuse html_to_text, like we do for .docx

  • [ ] Convert PDF to HTML, then to text, to preserve title information
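
For illustration, here is a minimal sketch of the second stage (html -> text) that keeps title information. It is not the project's actual html_to_text; the function name `html_to_text_with_titles`, the `#` prefix for headings, and the assumption that the PDF-to-HTML step emits `<h1>`..`<h6>` tags for titles are all hypothetical. It uses only the Python standard library.

```python
from html.parser import HTMLParser

# Assumption: the upstream PDF-to-HTML step marks titles with <h1>..<h6>.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADING_TAGS | {"p", "div", "li", "br"}


class _TitleAwareExtractor(HTMLParser):
    """Collects text line by line, prefixing heading lines with '#' marks."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buf = []
        self._heading_level = 0  # 0 means "not inside a heading"

    def flush(self):
        # Join buffered fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if text:
            prefix = "#" * self._heading_level + " " if self._heading_level else ""
            self.lines.append(prefix + text)

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        if tag in HEADING_TAGS:
            self._heading_level = int(tag[1])

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()  # flush first so the heading prefix still applies
        if tag in HEADING_TAGS:
            self._heading_level = 0

    def handle_data(self, data):
        self._buf.append(data)


def html_to_text_with_titles(html: str) -> str:
    """Hypothetical helper: plain text with headings kept as '#'-prefixed lines."""
    parser = _TitleAwareExtractor()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a block tag
    return "\n".join(parser.lines)


sample = "<h1>Intro</h1><p>Hello <b>world</b>.</p><h2>Details</h2><p>More text</p>"
print(html_to_text_with_titles(sample))
```

Inline tags such as `<b>` are dropped here since only titles are needed; the pdf -> html stage is left out because it depends on whichever PDF library is chosen.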

Roey7, Mar 22 '23 09:03