Text copied from PDF is sometimes incorrect

Open jasiekkrk opened this issue 4 years ago • 7 comments

Describe the bug When text in a PDF contains the string fi, the resulting document is a bit broken: when the text is copied, it does not match the text visible in the PDF. I suspect this is because fi is a ligature. It happens with different custom fonts, but not with the built-in ones.

To Reproduce This REPL shows the issue https://react-pdf.org/repl?code=3187b0760ce02e00408a057025803c450298c0bc300500943807cf805030c00f0022230080b6198531165d400a02180e659a004f003619b006f61622003a0046200099080beed3a72a005430a58d3c54a8a231cccfbd474d9478c00a2606022700cc918984ab08bb016478c180002c780068bcb018844060c04119e4009cb1e260011d903093627940929478939822d343ac6d18104362a0903118001d6332b000dc90da016eec5c618290210a44f963bc6044402122270200bcec586079a03cc4722b28a801e975f434b5b778040e76e8199959d80801b8c800c5c0a16452f9073092f024395c79183c84002e18001c800f210003b8f0444a10584381024b018120e0940a08d08203b6db5713ce47c680f0eac05928118db0836ca6d0d876cda0046003336c009a007d6d001a4006a0036100003800ea2036bc800aa4a6663039000d00270f000822ad90635c20b22a96e64502400c26190e060006563460cdc10c0619ec014892305f0e058a0c0efa69f1ac33520e61860632004c08cd1ba5522241f0c0a8e00b03ef08ab7aa0f73f8035110ba5c238aa046eaee00250c1e4a05c1a3dc5e2c6f27ca888543a0b0db622dc8000

Expected behavior Copied text matches PDF document

jasiekkrk avatar May 22 '20 13:05 jasiekkrk

Any progress on that? Same thing in my project. If I delete any occurrence of fi it works perfectly, but normally copied text is broken. I use Lato font.

LukaszMiskowiak avatar Aug 03 '20 10:08 LukaszMiskowiak

@LukaszMiskowiak and @jasiekkrk, There are other issues here like #629 that mention downloading the BirdFont software. I had the same issue using the Roboto font where fi was not working correctly. After purchasing the commercial Birdfont software for 4.99 USD, I was able to import the font, and export it with default settings and now works just fine.

ddcech avatar Jan 22 '21 10:01 ddcech

Exporting SourceHanSans with BirdFont did not fix all my issues. Some text copies correctly, but many do not.

I also have the same issue with fonts like Inter.

whaleprophet avatar Oct 19 '21 22:10 whaleprophet

Any progress on that ? 🙏

dglsn avatar Nov 04 '21 09:11 dglsn

If someone else is using Google Fonts: I also managed to fix the fonts using fontforge with the following command:

fontforge -lang=ff -c 'Open($1); Generate($2); Close();' xxx.ttf yyy.ttf
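
A minimal sketch of applying the command above to a whole folder of fonts. Only the `fontforge -lang=ff -c ...` invocation comes from this thread; the function names `reexport_font` and `reexport_dir` and the `FONTFORGE_BIN` variable are hypothetical helpers introduced here for illustration, not FontForge features. Several comments in this thread report that re-exporting does not fix every font, so treat it as a workaround to try rather than a guaranteed fix.

```shell
#!/bin/sh
# Re-export a single font with the FontForge one-liner quoted in this thread.
# FONTFORGE_BIN is a hypothetical override (assumption); it defaults to
# whatever "fontforge" resolves to on PATH.
reexport_font() {
    "${FONTFORGE_BIN:-fontforge}" -lang=ff -c 'Open($1); Generate($2); Close();' "$1" "$2"
}

# Re-export every .ttf file in a source directory into an output directory,
# keeping the original file names.
reexport_dir() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for font in "$src_dir"/*.ttf; do
        [ -e "$font" ] || continue   # the glob matched nothing
        reexport_font "$font" "$out_dir/$(basename "$font")" || return 1
    done
}

# Example (not run here):
#   reexport_dir ./fonts ./fonts-reexported
```

On a Mac where `fontforge` is not on the PATH, setting `FONTFORGE_BIN` to the full path of the binary inside the FontForge app bundle should have the same effect as typing that path out by hand.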

santialbo avatar Apr 06 '22 11:04 santialbo

In my case, the font file was the suspect.

I downloaded the font from another source, imported it into BirdFont (free, open-source software), exported it, and used the exported file on my website.

It's working as expected. Hope it helps someone :)

SujalShah3234 avatar Jul 13 '22 12:07 SujalShah3234

I think we can close this bug as there are some easy workarounds

jasiekkrk avatar Sep 01 '22 18:09 jasiekkrk

I had this issue with a font named SourceSansPro.

I tried using BirdFont to open and export with default settings. I noticed that it exports 2 files, one is suffixed with Mac. I tried both of them and for both I get an error

error - TypeError: Cannot read properties of undefined (reading 'tag')
 at $79ea6270f0a90256$export$2e2bcd8739ae039.selectScript (_______________ /node_modules/.pnpm/[email protected]/node_modules/fontkit/dist/main.cjs:7964:38)
....

It took me a while to figure out how to use the fontforge CLI on a Mac. I tried:

/Applications/FontForge.app/Contents/Resources/opt/local/bin/fontforge -lang=ff -c 'Open($1); Generate($2); Close();' xxx.ttf  yyy.ttf

The new font file worked, but it didn't seem to fix the issue. I also tried to export the file from the UI.

In the end, I just picked a different font (Lato) and it worked fine.

I don't think the workarounds are adequate to justify closing this ticket.

justin-hackin avatar Feb 17 '23 21:02 justin-hackin

@justin-hackin, what version are you using? I had the same issue and haven't fixed it. I even changed to using the Lato font but still can't fix it. Please give me an answer.

hungnguyenvan-itr avatar Feb 21 '23 11:02 hungnguyenvan-itr

I used the latest versions of this software at the time of posting. I'm sorry to hear that changing the font didn't help you. I got my Lato fonts from Google Fonts.

justin-hackin avatar Feb 21 '23 16:02 justin-hackin

I upgraded to the latest version and tried to use the Lato font, but it still doesn't work. Did you re-export the font with fontforge?

hungnguyenvan-itr avatar Feb 22 '23 02:02 hungnguyenvan-itr

Any update on this issue? It is happening with all external fonts. We have tried both fontforge and BirdFont, and the problem persists even after importing and re-exporting. We have also tried the same font from different sources, and each source showed the same issue to a different degree.

harshkurra avatar Mar 22 '23 13:03 harshkurra

Experiencing this issue with Noto Sans SC. It's so bad that the phrase "Default Priority" is copied as TVvn0a/ F6s.6s/l. Using the regular Noto Sans font imported directly from Google yields DeSault :riority for the same phrase. Clearly it is affected by the support for CJK characters, but it is obvious that neither is good.
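
A check like the one described above can be scripted instead of copying text by hand. This is a minimal sketch, assuming the `pdftotext` tool from Poppler is installed and assuming its output reflects what PDF viewers copy (it reads the PDF's text layer, but viewers may differ). The function name `pdf_has_text` and the `PDFTOTEXT_BIN` variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the text extracted from the PDF contains the
# expected phrase, and fails otherwise.
# PDFTOTEXT_BIN is a hypothetical override (assumption); it defaults to
# "pdftotext" from Poppler. The "-" argument sends the text to stdout.
pdf_has_text() {
    "${PDFTOTEXT_BIN:-pdftotext}" "$1" - | grep -qF -- "$2"
}

# Example (not run here):
#   pdf_has_text output.pdf "Default Priority" \
#       || echo "text layer does not match the visible text"
```

The match is line-based, so a phrase that wraps across two lines in the PDF would be reported as missing even if the text layer is correct; short phrases are the safer test input.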

edwindwalker avatar Apr 06 '23 12:04 edwindwalker

June 7th Edit: This is still not fully functioning, some letters are wrong.

I also tried the following other solutions:

  • Tried to export using FontSquirrel with different options to fix the font
  • Tried to fix the font using the Glyphs software
  • Tried downloading the latest original font from the author

@diegomura it would be nice to understand a bit more about what could be causing this issue, as it is preventing our users from searching in the PDFs we are providing them.

I just found out about this issue today while copy-pasting some text from my PDF (also found this related issue https://github.com/diegomura/react-pdf/issues/1950 - not sure if it's worth keeping these 2 issues open since they look the same)

Here is my fix if it can help anyone:

  • Install FontLab (yeah I know it's not free, unfortunately...)
  • Open the .ttf font (in my case I was having an issue with Gilroy)
  • Once the font is open, select all characters using CTRL-A
  • From the menu:
    • Tools > Font Audit > Check Glyphs
    • Tools > Font Audit > Fix problems
    • File > Export Font As > my-file-name.ttf (the file size was about 2x the original file)
  • Use this new file in React-pdf and it works! 🎉

I'm not sure how React-pdf uses the fonts to generate PDFs but given that I use the same font for my website and it works fine, it does look like this issue is specific to React-pdf and hopefully will get fixed

nbouvrette avatar Jun 06 '23 14:06 nbouvrette

I had the same problem and I solved it using the BirdFont method mentioned in the comments above. However, it seems like there might be a way to solve it without modifying the font. Check out this issue: https://github.com/Hopding/pdf-lib/issues/245

yomybaby avatar Aug 10 '23 02:08 yomybaby

I ran into this same issue when developing an open source resume maker. I was able to resolve it by re-exporting with either fontforge or BirdFont, as others have suggested. Thanks all for sharing the solutions.

For anyone running into this issue but looking for some quick workable fonts, you can find some of them in this project's repo: https://github.com/xitanggg/open-resume/tree/main/public/fonts. I re-exported 11 google fonts and got them to work:

  • Sans Serif: Roboto, Lato, Montserrat, OpenSans, Raleway
  • Serif: Caladea, Lora, RobotoSlab, PlayfairDisplay, Merriweather
  • Non-English: NotoSansSC

What @yomybaby shared is interesting; it may or may not be related. The linked issue only exists in Chrome and is said to be fixed, but this issue can be reproduced in Chrome, Firefox, Edge, etc. It is likely another upstream issue, and it would be nice to have it fixed, since it poses a limitation on any app built on top of it and react-pdf: all fonts need to be re-exported. As such, we won't be able to, say, allow users to upload or provide a link to any arbitrary font, since it would run into this issue.

xitanggg avatar Aug 22 '23 06:08 xitanggg

I've had some luck with some of the fixes mentioned in this thread, using FontLab 8 and fontforge. Unfortunately, there are still a few fonts (for example "Barlow") that still produce a broken text layer even after the glyphs have been corrected.

rob2d avatar Sep 06 '23 06:09 rob2d

Issue #1950 and pull request #2408 seem to be related.

Haschtl avatar Oct 24 '23 12:10 Haschtl

This was fixed in #2408 / #2488. The test case in the REPL now works correctly. This issue can be closed.

carlobeltrame avatar Jan 15 '24 11:01 carlobeltrame

Thanks @carlobeltrame !

diegomura avatar Jan 15 '24 11:01 diegomura