BookStack PDF import/indexing

Describe the feature you'd like

As said here https://github.com/BookStackApp/BookStack/issues/1270#issuecomment-463523175

You don't want to encourage users to build lists of links to documents; instead, content should be created in or copied into the editor. OK. Nevertheless, indexing PDF files would be a great feature on top.

Additionally, importing e.g. Word or PDF files directly into the editor would also be great (optionally with removal of unwanted HTML code).

Describe the benefits this would bring to existing BookStack users

Better experience / less work.

Can the goal of this request already be achieved via other means?

No and not satisfying enough :-)

Have you searched for an existing open/closed issue?

[X] I have searched for existing issues and none cover my fundamental request

How long have you been using BookStack?

1 to 5 years

Additional context

No response

Oct 05 '22 16:10 helson22

Thanks for the request, although this is not something I'd be keen to include support for since:

It widens the scope, and lessens focus, to what we'd be considering documentation content within the platform.
Support could vary depending on the format and structure of a specific PDF document, adding variability to such a feature working.
Support would be added for certain formats, introducing variability to how different attachments/formats are treated.
There are likely cases where this would not be desired, requiring additional levels of control to be exposed, which themselves can be a burden.

Oct 08 '22 12:10 ssddanbrown

This sounds like it can be done with the API. Use some kind of PDF-to-HTML library, pass the result to an HTML-to-Markdown converter, and then use the BookStack API to import it.
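
The final step of that pipeline, the import through the API, might look roughly like the sketch below. It assumes the PDF has already been converted to Markdown by some other tool, and it assumes the page-creation endpoint (`POST /api/pages`), the `Token <id>:<secret>` authorization header, and the `book_id`/`name`/`markdown` fields, which should be checked against the API documentation served by your own instance under `/api/docs`. The function name `build_page_request`, the example URL, and the credentials are hypothetical placeholders.

```python
import json
import urllib.request


def build_page_request(base_url, token_id, token_secret, book_id, name, markdown):
    """Build (but do not send) a request that creates a page from Markdown.

    Assumed, not confirmed by this thread: the endpoint path, the token
    header format, and the payload field names. Verify them in /api/docs.
    """
    payload = {"book_id": book_id, "name": name, "markdown": markdown}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/pages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token_id}:{token_secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical values; replace with your instance URL and API token.
    req = build_page_request(
        "https://bookstack.example.com",
        "my_token_id",
        "my_token_secret",
        book_id=1,
        name="Imported from PDF",
        markdown="# Imported from PDF\n\nConverted text goes here.",
    )
    print(req.get_method(), req.full_url)
    # To actually send it, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Building the request separately from sending it makes it possible to inspect the URL, headers, and body before anything is written to a live instance.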

Oct 14 '22 06:10 IceWreck

Can anyone please help document the process of using the API to import content? Or point me to the documentation, which I am struggling to find?

Please and thank you!

Oct 12 '23 02:10 manicmarvin

Some PDFs can be parsed; some need to be run through OCR. This is a big ask. OCR isn't fully automatic; it requires human review. For this reason I would agree that you need to normalize your dataset before you import. The best open source document conversion library is Pandoc, and it doesn't support PDF as an input format.

How can I convert PDFs to other formats using pandoc? You can’t. You can try opening the PDF in Word or Google Docs and saving in a format from which pandoc can convert directly.

Apr 15 '24 23:04 A9G-Data-Droid
