obsidian-text-format
Req: replace ligatures in PDF text
Possibly related to #23: it would be good to replace ligatures with their separate characters when cleaning up text copied from a PDF file. The only time I see the ft, fl and fi ligatures is when I copy from a PDF, and I have to replace them by hand. A complete list is here.
To confirm your request: you want to replace ligatures such as ﬀ with ff.
And in the Wikipedia list you linked, you want to replace the text in the Ligature column with the text in the Non-Ligature column.
Do I understand correctly?
Correct. Ligatures create all sorts of problems in plain text.
Please try v1.8.1. If you still have problems, you can re-open this issue.
Unfortunately, the problem is still there. For example, take the sentence below from the Wikipedia page I referenced:
Other ligatures with the letter f include fj,[a] fl (ﬂ), ff (ﬀ), ffi (ﬃ), and ffl (ﬄ).
In every set of brackets there is a single character. In plain text these characters should be split. Ligatures are often found in PDFs (well, the ones I use anyway), and they are meant to make certain combinations of letters look good in typography. The problem appears when they are pasted as plain text: if the editor can't cope with Unicode, the ligatures are replaced with funny-looking symbols; if it can, they are displayed properly but are not recognised by search, spellchecking, content indexing, etc. The latter is the problem I am having with Obsidian.
This plugin is the obvious place for fixing this. If you must be selective, almost all ligatures I see in practice start with f or s. I can't remember the last time I saw any other ligature in a PDF.
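For illustration, the selective replacement described above could look roughly like the sketch below. It is not the plugin's actual code: the `LIGATURES` table and the `replace_ligatures` name are made up for this example, and the table assumes that only the f and s ligatures in the Unicode range U+FB00 to U+FB06 matter.

```python
# Hypothetical mapping (not taken from the plugin): the Latin f and s
# ligatures that Unicode encodes as single characters, U+FB00..U+FB06.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb05": "st",  # long s + t ligature
    "\ufb06": "st",
}

# Build the translation table once; str.translate then replaces each
# single ligature character with its multi-letter expansion.
_TABLE = str.maketrans(LIGATURES)


def replace_ligatures(text: str) -> str:
    """Return text with the ligatures listed above split into plain letters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(replace_ligatures("e\ufb03cient work\ufb02ow"))
```

Because the table lists only ligature characters, ordinary letters pass through unchanged, so the result stays searchable and spell-checkable.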
v2.2.1
For a sentence like
Other ligatures with the letter f include fj,[a] fl (ﬂ), ff (ﬀ), ffi (ﬃ), and ffl (ﬄ).
The result of Replace ligatures is
Other ligatures vvith the letter f include fj,[a] fl (fl), ff (ff), ffi (ffi), and ffl (ffl).
Does the result not behave as you expect? (Maybe the w to vv replacement?)
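One way to keep a replacement from touching ordinary letters such as w is to expand only characters inside the Latin ligature range and leave everything else alone. The sketch below is an assumption about how that could be done, not a description of what the plugin does: `expand_latin_ligatures` is a hypothetical name, and it relies on Python's standard `unicodedata.normalize` with the NFKC form, applied only to U+FB00 through U+FB06.

```python
import unicodedata

# Assumed range: the Latin ligatures U+FB00..U+FB06 (ff, fi, fl, ffi, ffl, st).
FIRST_LIGATURE = "\ufb00"
LAST_LIGATURE = "\ufb06"


def expand_latin_ligatures(text: str) -> str:
    """Expand only Latin ligature characters; copy every other character as-is."""
    return "".join(
        # NFKC turns a compatibility ligature into its plain-letter spelling.
        unicodedata.normalize("NFKC", ch)
        if FIRST_LIGATURE <= ch <= LAST_LIGATURE
        else ch
        for ch in text
    )


if __name__ == "__main__":
    print(expand_latin_ligatures("Other ligatures with the letter f: \ufb02, \ufb00"))
```

Restricting the normalization to that range means characters outside it, including a plain w, are never rewritten.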
I hadn't realised you needed to ask every time for ligatures to be replaced. I was assuming there would be a persistent setting similar to 'Remove redundant ...' or that the ligatures would be automatically replaced by default. I am not sure how often, if ever, these ligatures need to be retained in Obsidian. They break searching, spell checking etc.
But yes, Replace ligatures works. Thanks!