obsidian-text-format

Req: replace ligatures in PDF text

Open glocalglocal opened this issue 3 years ago • 6 comments

Possibly related to #23, it would be good to replace ligatures with their separate characters when cleaning up text coming from a PDF file. The only time I see the ft, fl and fi ligatures is when I copy from a PDF, and I have to replace them by hand. A complete list is here.

glocalglocal avatar Apr 27 '22 17:04 glocalglocal

To confirm your request: you want to replace ligatures such as ﬀ with ff.

And on the Wikipedia page you linked, you want to replace the text in the Ligature column with the text in the Non-Ligature column.

Do I understand correctly?

Benature avatar Aug 23 '22 12:08 Benature

Correct. Ligatures create all sorts of problems in plain text.

glocalglocal avatar Aug 23 '22 19:08 glocalglocal

Please try v1.8.1. If you still have problems, you can reopen this issue.

Benature avatar Aug 24 '22 02:08 Benature

Unfortunately, the problem is still there. For example, take the sentence below from the Wikipedia page I referenced:

Other ligatures with the letter f include fj,[a] fl (ﬂ), ff (ﬀ), ffi (ﬃ), and ffl (ﬄ).

In every set of brackets there is a single character. In plain text these characters should be split. Ligatures are often found in PDFs (well, the ones I use anyway), where they are meant to make certain combinations of letters look good in print. The problem is that when they are pasted into plain text, they are either replaced with funny-looking symbols if the editor can't cope with Unicode, or displayed properly but not recognised by search, spellchecking, content indexing, etc. The latter is the problem I am having with Obsidian.

This plugin is the obvious place to fix this. If you must be selective, almost all ligatures I see in practice start with f or s. I can't remember the last time I saw any other ligature in a PDF.
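
To illustrate the replacement being requested, here is a minimal sketch. It is not the plugin's code: the `LIGATURES` table and the `replaceLigatures` function are hypothetical names, and the table covers only the f and s ligatures in Unicode's Alphabetic Presentation Forms block (U+FB00 to U+FB06).

```typescript
// Hypothetical table (an assumption, not taken from the plugin's source):
// the Latin f- and s-ligatures at U+FB00..U+FB06 and their plain spellings.
const LIGATURES: Record<string, string> = {
  "\uFB00": "ff",  // ﬀ
  "\uFB01": "fi",  // ﬁ
  "\uFB02": "fl",  // ﬂ
  "\uFB03": "ffi", // ﬃ
  "\uFB04": "ffl", // ﬄ
  "\uFB05": "st",  // ﬅ (long s + t)
  "\uFB06": "st",  // ﬆ
};

// Replace each mapped ligature with its separate letters. Characters outside
// the U+FB00..U+FB06 range (ordinary letters included) pass through unchanged.
function replaceLigatures(text: string): string {
  return text.replace(/[\uFB00-\uFB06]/g, (ch) => LIGATURES[ch] ?? ch);
}

// Text as it might arrive from a PDF: "ﬁle", "aﬃxed" and "ﬂoor" use ligatures.
const pasted = "The \uFB01le was a\uFB03xed to the \uFB02oor.";

// A plain substring search misses the ligature form...
console.log(pasted.indexOf("file") !== -1); // false
// ...and finds the word once the ligature has been split.
console.log(replaceLigatures(pasted).indexOf("file") !== -1); // true
```

Unicode NFKC normalization would also split these characters, but it rewrites many unrelated characters as well (superscripts and full-width forms, for example), so an explicit table is the more conservative choice for a text-cleanup command.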

v2.2.1

glocalglocal avatar May 03 '23 10:05 glocalglocal

For a sentence like

Other ligatures with the letter f include fj,[a] fl (ﬂ), ff (ﬀ), ffi (ﬃ), and ffl (ﬄ).

The result of Replace ligatures is

Other ligatures vvith the letter f include fj,[a] f‌l (fl), f‌f (ff), f‌f‌i (ffi), and f‌f‌l (ffl).

Does the result not behave as you expect? (Maybe it is the w to vv?)

Benature avatar Jul 24 '23 02:07 Benature

I hadn't realised you needed to ask every time for ligatures to be replaced. I was assuming there would be a persistent setting similar to 'Remove redundant ...' or that the ligatures would be automatically replaced by default. I am not sure how often, if ever, these ligatures need to be retained in Obsidian. They break searching, spell checking etc.

But yes, Replace ligatures works. Thanks!
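
For reference, the persistent setting described above could look roughly like the following sketch. Everything here is an assumption for illustration: the `CleanupSettings` interface, the `replaceLigaturesOnPaste` flag and the `cleanPastedText` function are hypothetical names, not part of the plugin, and the wiring into Obsidian's paste handling is left out.

```typescript
// Hypothetical settings shape; the plugin's real settings may look different.
interface CleanupSettings {
  replaceLigaturesOnPaste: boolean;
}

// Hypothetical table of the Latin f- and s-ligatures (U+FB00..U+FB06).
const PLAIN_FORMS: Record<string, string> = {
  "\uFB00": "ff", "\uFB01": "fi", "\uFB02": "fl", "\uFB03": "ffi",
  "\uFB04": "ffl", "\uFB05": "st", "\uFB06": "st",
};

// Clean pasted text according to the saved settings, so the user does not
// have to run a command by hand after every paste.
function cleanPastedText(text: string, settings: CleanupSettings): string {
  if (!settings.replaceLigaturesOnPaste) {
    return text; // setting is off: leave the text exactly as pasted
  }
  return text.replace(/[\uFB00-\uFB06]/g, (ch) => PLAIN_FORMS[ch] ?? ch);
}

// Usage: with the flag on, ligatures are split; with it off, text is untouched.
const enabled: CleanupSettings = { replaceLigaturesOnPaste: true };
const disabled: CleanupSettings = { replaceLigaturesOnPaste: false };
console.log(cleanPastedText("o\uFB03ce", enabled));
console.log(cleanPastedText("o\uFB03ce", disabled) === "o\uFB03ce");
```

Keeping the behaviour behind a flag leaves the existing manual command usable for anyone who wants to retain ligatures in some notes.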

glocalglocal avatar Jul 24 '23 17:07 glocalglocal