Remove page IDs when saving image to text or scanning to text using OCR

Open DraganRatkovich opened this issue 2 years ago • 5 comments

Is your feature request related to a problem? Please describe.

When saving an image to a text file, or choosing the Scan to Text File option and selecting a scanned book for OCR text extraction, Bookworm adds "Page 1", "Page 2" identifiers to the text file. These are useless in this case: they do not help arrange the pages as in the original document when the text is pasted into Word. Word will easily do the rest of the work itself, and the user can apply whatever font, paragraph style, and line spacing they need. So writing "Page 1", "Page 2" and an extra page break character into a text file serves no purpose. No text exporters, at least the popular ones such as MS Word and Adobe PDF, do this.

Describe the solution you'd like

Simply extract the plain text from a PDF file or image without adding the word "Page", the page numbers, or a page break character. @mush42 It would be very useful if this were fixed soon, because saving a PDF or Word document as a text file would become much more useful, and the text would be clean and smooth.

DraganRatkovich avatar Mar 12 '22 18:03 DraganRatkovich

Hello @DraganRatkovich

I may agree with removing the page numbering, but the page break character is semantically important, especially for OCR results.

Anyhow, I'll make text exporting customizable. A dialog box will be shown when exporting to plain text or scanning to a text file.

Best, Musharraf

mush42 avatar Mar 12 '22 21:03 mush42

@mush42 Yes, it would be nice if checkboxes appeared during the save process to choose whether to remove or keep page break characters, etc.

DraganRatkovich avatar Mar 13 '22 09:03 DraganRatkovich

Hello @mush42 do you have any news on this issue?

DraganRatkovich avatar Apr 05 '22 18:04 DraganRatkovich

@DraganRatkovich Yes, the fix is coming.

mush42 avatar Apr 06 '22 10:04 mush42

@mush42 Also, I didn't change the title, but please consider offering the same options when saving any document in .txt format, for example from .pdf, .docx, etc., and not only when saving an image or scanning to text using OCR.

DraganRatkovich avatar Apr 06 '22 12:04 DraganRatkovich
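
Until such export options exist, the exported text could be cleaned up afterwards. The following is only a minimal sketch and not Bookworm's implementation. It assumes the page identifiers are standalone lines of the form `Page N` and that the page break is a form feed character (U+000C); the thread confirms neither. The function name `clean_exported_text` and its two flags are hypothetical, chosen to mirror the checkboxes discussed above.

```python
import re

# Assumption: a page identifier is a line containing only "Page <number>",
# optionally preceded by a form feed that marks the page break.
PAGE_MARKER = re.compile(r"^(\f?)[ \t]*Page[ \t]+\d+[ \t]*(?:\n|$)", re.MULTILINE)

# Assumption: the page break character is the ASCII form feed (U+000C).
FORM_FEED = "\f"


def clean_exported_text(
    text: str,
    strip_page_markers: bool = True,
    strip_page_breaks: bool = True,
) -> str:
    """Return text with page identifier lines and/or page breaks removed."""
    if strip_page_markers:
        # Keep a leading form feed (group 1) so page breaks survive
        # when only the identifiers should be dropped.
        text = PAGE_MARKER.sub(r"\1", text)
    if strip_page_breaks:
        text = text.replace(FORM_FEED, "")
    return text
```

Calling `clean_exported_text(text, strip_page_breaks=False)` drops only the identifiers and keeps the page breaks, which matches the point made above that the page break is semantically important for OCR results.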