Siddhartha Das

Results: 225 comments by Siddhartha Das

@bigfatbird Yes, I am aware of this issue. I'm using poppler utils at the moment to convert PDF to HTML to render the content and the conversion is not great...I...

@bigfatbird Can you post a screenshot here showing how the text currently looks for a PDF, and say whether the PDF is image-rich or just text...

@bigfatbird Thanks. It looks like the rendering will be better if I can center the content...that should not be hard to achieve...will update here when I get to this issue. See...

Not sure. At the moment I'm using the poppler utility pdftohtml, and I don't see any option to render the HTML the way it looks in the Evince viewer...perhaps I should...
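
For reference, a minimal sketch of how pdftohtml might be invoked from a script. The `-s` (single HTML document), `-c` (complex output) and `-i` (ignore images) switches are documented pdftohtml options; the helper names and file paths are hypothetical and this is not Bookworm's actual code. Whether `-c` gets close enough to the Evince rendering is an open question.

```python
import subprocess


def build_pdftohtml_cmd(pdf_path, out_prefix, complex_layout=True, skip_images=False):
    """Build the pdftohtml argument list (hypothetical helper, not Bookworm's code)."""
    cmd = ["pdftohtml", "-s"]  # -s: one HTML document covering all pages
    if complex_layout:
        cmd.append("-c")       # -c: "complex" output that tries to keep the page layout
    if skip_images:
        cmd.append("-i")       # -i: ignore images
    cmd += [str(pdf_path), str(out_prefix)]
    return cmd


def convert(pdf_path, out_prefix):
    """Run the conversion; raises CalledProcessError if pdftohtml exits non-zero."""
    subprocess.run(build_pdftohtml_cmd(pdf_path, out_prefix), check=True)
```

Calling `convert("book.pdf", "book")` requires poppler-utils to be installed; comparing the output with and without `-c` on the same PDF is a quick way to see how much the layout differs.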

That sounds great...thanks for the suggestion...looks workable at a quick glance: https://mozilla.github.io/pdf.js/examples/

@unhammer How many books are there in the library, and how many are PDFs? The actual book content is cached on the file system (if the cache preference is set) but...

Hmm...while 300 PDFs is a largish library (I have not tested with more than 100 PDFs), 1.9 GB does feel high for the content data alone...will look...

@bigfatbird It doesn't look like it will be possible to convert PDF to HTML using pdf.js, based on this: [https://github.com/mozilla/pdf.js/issues/8732](https://github.com/mozilla/pdf.js/issues/8732) Bookworm relies upon HTML files to apply all the text/color...

Looks like poppler can be used to get the chapters from the book using this example: https://stackoverflow.com/questions/7131906/how-to-extract-pdf-index-table-of-contents-with-poppler At least it will reduce the data in the metadata database by...
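
As a rough illustration of the idea, once the table of contents has been read from the PDF, the nested outline could be flattened into compact chapter rows instead of storing page content. The dictionary layout (`title`, `page`, `children`) and the function name below are assumptions made for this sketch; they are not poppler's API or Bookworm's schema.

```python
def flatten_outline(entries, depth=0):
    """Yield (depth, title, page) rows from a nested outline (hypothetical layout)."""
    for entry in entries:
        yield (depth, entry["title"], entry["page"])
        # Recurse into sub-sections, if any, one level deeper.
        yield from flatten_outline(entry.get("children", []), depth + 1)


# Made-up outline, standing in for what a PDF's table of contents might contain.
outline = [
    {"title": "Chapter 1", "page": 1, "children": [{"title": "1.1 Setup", "page": 3}]},
    {"title": "Chapter 2", "page": 10},
]
rows = list(flatten_outline(outline))
```

Each row is small and fixed in shape, so the stored data grows with the number of chapters and not with the size of the book's content.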

@preconf Unfortunately I have not spent further time on this. I tried the following library; the rendering was better, but the extraction was too slow: https://github.com/coolwanglu/pdf2htmlEX Will check evince...