Siddhartha Das comments


PDFs don't render correctly.

@bigfatbird Yes, I am aware of this issue. I'm using poppler utils at the moment to convert PDF to HTML to render the content and the conversion is not great...I...

@bigfatbird Can you post a screenshot here to show how the text currently looks for a PDF, and whether the PDF is image-rich or just text...

@bigfatbird Thanks. Looks like if I can center the content, the rendering will look better...that should not be hard to achieve...will update here when I get to this issue See...

Not sure; at the moment I'm using the poppler util pdftohtml and I don't see any option to render the HTML the way it looks in the Evince viewer...perhaps I should...
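
For readers following along, a pdftohtml conversion of this kind might be driven from code roughly as below. This is a minimal sketch, not Bookworm's actual code: the helper names `build_pdftohtml_cmd` and `convert_pdf_to_html` are hypothetical, and it assumes poppler-utils' `pdftohtml` is on `PATH` and accepts `-q` (quiet), `-s` (single document for all pages), `-noframes` (no frameset) and `-i` (ignore images). Check `pdftohtml -h` for your installed version.

```python
import subprocess


def build_pdftohtml_cmd(pdf_path, out_base, ignore_images=False):
    # Hypothetical helper: assemble a pdftohtml command line.
    # -q: suppress messages; -s: one HTML document for all pages;
    # -noframes: no frameset wrapper; -i (optional): skip images.
    cmd = ["pdftohtml", "-q", "-s", "-noframes"]
    if ignore_images:
        cmd.append("-i")
    cmd += [pdf_path, out_base]
    return cmd


def convert_pdf_to_html(pdf_path, out_base, ignore_images=False):
    # Runs the external tool; raises CalledProcessError on a non-zero exit.
    subprocess.run(build_pdftohtml_cmd(pdf_path, out_base, ignore_images), check=True)
```

Keeping command construction separate from execution makes it easy to inspect or tweak the flags without having a PDF at hand.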

That sounds great...thanks for the suggestion...looks workable at a quick glance: https://mozilla.github.io/pdf.js/examples/

@unhammer How many books are there in the library, and how many are PDFs? The actual book content is cached on the file system (if the cache preference is set) but...

Hmm...while 300 PDFs seems a largish library (I have not tested more than 100 PDFs), 1.9 GB does feel high for the content data alone...will look...

@bigfatbird It doesn't look like it will be possible to extract PDF to HTML using pdf.js, based on this: [https://github.com/mozilla/pdf.js/issues/8732](https://github.com/mozilla/pdf.js/issues/8732) Bookworm relies upon HTML files to apply all the text/color...

Looks like poppler can be used to get the chapters from the book, using this example: https://stackoverflow.com/questions/7131906/how-to-extract-pdf-index-table-of-contents-with-poppler At least it will reduce the data in the metadata database by...
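
As a lighter-weight alternative to calling the poppler library directly, the chapter list could also be read from the XML that `pdftohtml -xml` writes. This is only a sketch under assumptions: it assumes the outline appears as an `<outline>` element under the root, holding `<item page="N">Title</item>` entries, with sub-chapters as nested `<outline>` elements. Verify this against the output of your poppler version. `parse_outline` is a hypothetical helper, not part of poppler.

```python
import xml.etree.ElementTree as ET


def parse_outline(xml_text):
    """Return (depth, title, page) tuples from a pdftohtml-style XML outline.

    Assumed shape: <outline><item page="1">Title</item><outline>...</outline></outline>
    """
    root = ET.fromstring(xml_text)
    # The outline may be the root itself or a direct child of it (e.g. <pdf2xml>).
    outline = root if root.tag == "outline" else root.find("outline")
    entries = []
    if outline is None:
        return entries  # the PDF has no bookmarks/outline

    def walk(node, depth):
        for child in node:
            if child.tag == "item":
                page = child.get("page")
                title = (child.text or "").strip()
                entries.append((depth, title, int(page) if page else None))
            elif child.tag == "outline":
                # A nested outline holds the sub-entries of the preceding item.
                walk(child, depth + 1)

    walk(outline, 0)
    return entries
```

Storing only these (depth, title, page) tuples per book, rather than the extracted content, is the kind of reduction in metadata the comment above is after.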

@preconf Unfortunately, I have not spent further time on this. I tried the following library, but the extraction was too slow, although the rendering was better: https://github.com/coolwanglu/pdf2htmlEX Will check evince...