Roey Lalazar
Disabled SSL checks for now
Have you tested that locally and inside a Docker container?
Hey! Just like I commented on that other PR, it should be pdf->html, then we parse html->text
@rishi003 let's chat on discord! I could guide you a little bit :)
Good vision @Itaykal, wanna do it?
Awesome dude! Are you on Discord? Let's chat.
https://discord.gg/EJYfBkd4
@bary12 do we want pdf->html->text here, so we know titles, bold, etc., like we do for docx?
@d4yz so we need pdf_to_html, and then use html_to_text, like we do for .docx

- [ ] convert pdf to html, then to text, to preserve title information
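
Rough sketch of the pdf->html->text idea, just to show the shape. This is not the repo's actual `html_to_text`. It is a stand-in written with the standard library, and the `#` / `**` markers for headings and bold are my own assumption about how we might keep title information. `pdf_to_html` is passed in as a callable because which PDF library does that step is still undecided here.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCKS = HEADINGS | {"p", "div", "li", "br"}


class _TextExtractor(HTMLParser):
    """Collects text, marking headings with '#' and bold runs with '**'."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # h1 -> "# ", h2 -> "## ", and so on (assumed marker style)
            self.parts.append("\n" + "#" * int(tag[1]) + " ")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in BLOCKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in BLOCKS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Hypothetical stand-in: flatten HTML to text, keeping heading/bold marks."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Drop blank lines and surrounding whitespace left over from block tags.
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def pdf_to_text(pdf_path, pdf_to_html):
    """Two-step pipeline: pdf -> html (caller-supplied), then html -> text."""
    return html_to_text(pdf_to_html(pdf_path))
```

The point of the split is that the second half is shared with the .docx path, so only the pdf->html converter would be new.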