Roey Lalazar

Results: 9 comments by Roey Lalazar

Disabled SSL checks for now

Have you tested that locally and inside a Docker container?

Hey! Like I commented on that other PR, it should be pdf->html, and then we parse html->text

@rishi003 let's chat on discord! I could guide you a little bit :)

Good vision @Itaykal, wanna do it?

Awesome dude! Are you on Discord? Let's chat.

https://discord.gg/EJYfBkd4

@bary12 do we want pdf->html->text here, so we know titles, bold, etc., like we do for docx?

@d4yz so we need pdf_to_html, and then use html_to_text, like we do for .docx

- [ ] convert pdf to html, then to text, to preserve title information
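A rough sketch of the pdf->html->text idea, in case it helps the discussion. This is not the project's actual code: the `html_to_text` body below is a stand-in written with only the standard library, the Markdown-style markers (`#` for titles, `**` for bold) are an assumed output format, and `pdf_to_html` is passed in as a hypothetical callable because the PDF converter has not been chosen in this thread.

```python
from html.parser import HTMLParser


class _MarkupKeepingParser(HTMLParser):
    """Collects text, turning headings and bold tags into plain-text markers."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    BOLD = {"b", "strong"}
    BLOCKS = {"p", "div", "li", "br"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            # "h2" -> "## ", so the title level survives in the text output.
            self.parts.append("\n" + "#" * int(tag[1]) + " ")
        elif tag in self.BOLD:
            self.parts.append("**")
        elif tag in self.BLOCKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BOLD:
            self.parts.append("**")
        elif tag in self.HEADINGS or tag in self.BLOCKS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Assumed stand-in for the existing html_to_text step."""
    parser = _MarkupKeepingParser()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def pdf_to_text(pdf_bytes, pdf_to_html):
    """pdf -> html -> text. `pdf_to_html` is a hypothetical converter callable."""
    return html_to_text(pdf_to_html(pdf_bytes))
```

The point of going through HTML is that the same text step can serve both .docx and .pdf, so titles and bold are handled in one place. Any real converter would need checking for how it tags headings, since many PDF-to-HTML tools emit styled spans rather than `<h1>` tags.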