pdfparser
Outputs "\r\n t" instead of letter i
In some parts of the file linked below, the parser outputs a line break followed by the letter t instead of the letter i. The error may occur because the text is in Turkish; however, other places in the same text are extracted correctly. Sample:
Talep ed
tlen belgeler
t (KPSS sonuç belges
t ve yabancı d
tl b
tlg
ts
t sev
tyes
tn
t gösteren
belge har
tç) eks
tk
tbraz eden veya h
tç
tbraz etmeyenler bu belgeler
t son başvuru tar
th
tne kadar
Başkanlığımıza
tbraz ett
tkler
t takd
trde talepler
t kabul ed
tlecek, son başvuru tar
th
tnden sonra
tbraz
Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211126-4-9.pdf Note: Since the file is taken from the Official Gazette, it is not subject to any copyright.
Similar problem.
ÖĞR.ÜYESİ 2 * Pedont
t Anab
tl
tm Dalında Doktora veya Uzmanlık
yapmış olmak. DİŞ HEKİMLİĞİ FAKÜLTESİ Protet
tk D
tş Tedav
ts
t DR. ÖĞR.
ÜYESİ 2 ** Protet
tk D
tş Tedav
ts
t Anab
tl
tm Dalında Doktora
veya Uzmanlık yapmış olmak. DİŞ HEKİMLİĞİ FAKÜLTESİ Restoratif Diş Tedavisi PROF. DR. 1* Restorat
tf D
tş Tedav
ts
t Alanında Doçent veya
Profesör olmak DİŞ HEKİMLİĞİ FAKÜLTESİ Restoratif Diş Tedavisi DR. ÖĞR.
ÜYESİ 2* Restorat
tf D
tş Tedav
ts
t Anab
tl
tm Dalında Doktora
veya Uzmanlık yapmış olmak.
*Türkçe Diş Hekimliği
Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211129-4-12.pdf Note: Since the file is taken from the Official Gazette, it is not subject to any copyright.
It turns out my update does not fix this issue. The newlines get repaired, but the 'i' character is still being translated as "\rt" or `0d74`. The translate table that `Font::translateChar()` is using has 'i' at array entry 76, but it is encoded in the `TJ` command as literal bytes `0d74`. More research is needed to see why this string actually gets translated to 'i'.
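To make the two possible readings of those bytes concrete, here is a small Python sketch. It is not pdfparser code, and the helper names are hypothetical. It only shows that reading `0d74` one byte at a time yields "\rt" (the output reported above), while reading it as a single two-byte code yields one value. Neither reading gives 76, and the sketch does not say which interpretation the font actually requires.

```python
from typing import List

# Raw bytes reported in the TJ operand (taken from the comment above).
raw = bytes.fromhex("0d74")


def as_single_bytes(data: bytes) -> str:
    """Treat every byte as its own character code.

    Latin-1 maps each byte value 1:1 to the code point with the same number,
    so this mirrors a naive one-byte-per-character reading.
    """
    return data.decode("latin-1")


def as_two_byte_codes(data: bytes) -> List[int]:
    """Group the bytes into big-endian 16-bit codes."""
    if len(data) % 2:
        raise ValueError("odd number of bytes, cannot form 16-bit codes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    print(repr(as_single_bytes(raw)))  # carriage return followed by 't'
    print(as_two_byte_codes(raw))      # one code: 0x0d74
```

Whichever reading is right, the lookup into the translate table would have to arrive at the entry holding 'i', which is the part that still needs research.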
Thank you for your detailed research and analysis. I hope your further research will find a way to solve this.