Is it possible to convert to SVG but keep text as text?

Open Dingo64 opened this issue 8 years ago • 12 comments

Is it possible to convert to SVG but keep text as text?

Dingo64 avatar Sep 03 '17 15:09 Dingo64

I think pdf2svg can't do anything about that; it depends on the Poppler or Cairo library.

RonanKER avatar Jan 15 '19 11:01 RonanKER

@RonanKER, do you have any code or configuration to show it? I have been searching Google for a week for a way to make pdf2svg keep text as text, but found nothing useful. Can you help me?

yuweiming2016 avatar Sep 17 '19 07:09 yuweiming2016

If you want to keep text in the SVG then your best bet is to use Inkscape. I'm fairly sure it can be used from the command line to automate the conversion with text (though I've never used it for automated PDF -> SVG, only manually). Be aware that text often moves around a bit (the kerning is often a little off) when converting from a PDF.

dawbarton avatar Sep 17 '19 10:09 dawbarton

See https://inkscape.org/doc/inkscape-man.html for details on the Inkscape command line.

dawbarton avatar Sep 17 '19 10:09 dawbarton

I have spent a week learning Inkscape. As far as I know, Inkscape can only convert the first page of a PDF to SVG. Is this true? That would be bad news for me. @dawbarton

yuweiming2016 avatar Sep 20 '19 02:09 yuweiming2016

It can open any page when opening with the GUI. If you want everything via the command line, you can simply use qpdf or pdftk to extract the page you want from the PDF as a single page and then use Inkscape. (Inkscape might be able to do page selection from the command line, I just don't know how.)
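A minimal sketch of that two-step approach, assuming `qpdf` and Inkscape 1.x are on the PATH. The function names and the intermediate file name are hypothetical, and the Inkscape option names differ in the 0.9x series, so check them against your installed version. Whether text survives as text still depends on the Inkscape version and its PDF import settings.

```python
import subprocess


def build_commands(pdf: str, page: int, out_svg: str) -> list:
    """Return two command lines: extract one page, then convert it to SVG."""
    # Hypothetical name for the intermediate single-page PDF.
    single = f"{out_svg}.page{page}.pdf"
    # qpdf: start from an empty document and copy in only the requested page.
    extract = ["qpdf", "--empty", "--pages", pdf, str(page), "--", single]
    # Assumption: Inkscape 1.x export options.
    convert = [
        "inkscape",
        single,
        "--export-type=svg",
        f"--export-filename={out_svg}",
    ]
    return [extract, convert]


def convert_page(pdf: str, page: int, out_svg: str) -> None:
    """Run both steps, raising CalledProcessError if either tool fails."""
    for cmd in build_commands(pdf, page, out_svg):
        subprocess.run(cmd, check=True)
```

Calling `convert_page("in.pdf", 3, "page3.svg")` would run both tools in sequence. `pdftk in.pdf cat 3 output single.pdf` could stand in for the qpdf step.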

dawbarton avatar Sep 20 '19 08:09 dawbarton

I googled for a long time, but found nothing useful. So sad.

yuweiming2016 avatar Sep 24 '19 07:09 yuweiming2016

I have an old batch script from 2015, from when I tried this (with pdftk and Inkscape): test_inkscape.txt

In the folder 'in' I put several example/test PDF files, then launched several similar batch files to try different solutions (Inkscape, pdf2svg, PDFTron, Poppler, ...) and compared the results.

If you can afford it, I think PDFTron was the best, but I'm not sure it would preserve text as you wish.

RonanKER avatar Sep 24 '19 08:09 RonanKER

Could anyone point me in the right direction to understand why neither Cairo nor Poppler preserves text during PDF to SVG conversion (so I can find some workaround to force them to keep it)? Does this procedure have a name? Is it "text vectorization" by any chance?

By the way, I've tried Inkscape as well, but no luck. LibreOffice seemed to work, but it was extremely slow and created a large .svg file, which is very hard to open.

danielk892374 avatar Apr 24 '24 22:04 danielk892374

I'm not sure what the name is ("preserve text" would have been my guess). Inkscape is usually the best in recent years - I've not had any problems with the PDFs that I've given it recently. It might be worth running pdftotext on your PDF to see if it does actually contain any text.
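A minimal sketch of that check, assuming Poppler's `pdftotext` is on the PATH. The helper names are hypothetical. An empty result suggests the page content holds only outlines or images, so no converter could recover text from it.

```python
import subprocess


def contains_text(extracted: str) -> bool:
    """True if the extracted output has any non-whitespace character."""
    # pdftotext separates pages with form feeds, which strip() also removes.
    return bool(extracted.strip())


def pdf_has_text(pdf_path: str) -> bool:
    """Run pdftotext and report whether it found any text."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return contains_text(result.stdout)
```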

dawbarton avatar Apr 25 '24 10:04 dawbarton

After some research on PDFs in general, I realized that the problem was that the text was not "regular text" but part of "annotation/comment" objects. These often get ignored on import, and I believe Inkscape excluded them as well.
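One way to check for this case is to list the annotations that carry text. This is a sketch only, assuming the third-party `pypdf` package is installed; the helper names are hypothetical and it was not part of the original investigation.

```python
def annotation_text(annotation):
    """Return (subtype, contents) for an annotation that carries text, else None."""
    if "/Contents" not in annotation:
        return None
    subtype = annotation["/Subtype"] if "/Subtype" in annotation else "?"
    return (str(subtype), str(annotation["/Contents"]))


def list_annotation_text(pdf_path: str) -> list:
    """Collect (page_number, subtype, contents) for every text-bearing annotation."""
    from pypdf import PdfReader  # third-party; imported here so the helper above works without it

    found = []
    reader = PdfReader(pdf_path)
    for number, page in enumerate(reader.pages, start=1):
        if "/Annots" not in page:
            continue
        for ref in page["/Annots"]:
            item = annotation_text(ref.get_object())
            if item is not None:
                found.append((number, *item))
    return found
```

If `pdftotext` shows little or nothing but this list is populated (for example with `/FreeText` entries), the visible text probably lives in annotations rather than in the page content.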

danielk892374 avatar May 11 '24 22:05 danielk892374