pdfparser

Outputs "\r\n t" instead of letter i

Open Reqrefusion opened this issue 3 years ago • 3 comments

In some parts of the linked file, the parser outputs a line break followed by the letter t where the letter i should appear. This may be related to the text being in Turkish, although other passages in the same document are extracted without the error. Sample output:

Talep ed
tlen belgeler
t (KPSS sonuç belges
t ve yabancı d
tl b
tlg
ts
t sev
tyes
tn
t gösteren
belge har
tç) eks
tk
tbraz eden veya h
tç
tbraz etmeyenler bu belgeler
t son başvuru tar
th
tne kadar
Başkanlığımıza
tbraz ett
tkler
t takd
trde talepler
t kabul ed
tlecek, son başvuru tar
th
tnden sonra
tbraz

Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211126-4-9.pdf Note: Since the file is taken from the Official Gazette, it is not subject to any copyright.

Reqrefusion Nov 26 '21 17:11

A similar problem occurs with another file. Sample output:

ÖĞR.ÜYESİ 2 * Pedont
t Anab
tl
tm Dalında Doktora veya Uzmanlık
yapmış olmak. DİŞ HEKİMLİĞİ FAKÜLTESİ Protet
tk D
tş Tedav
ts
t DR. ÖĞR.
ÜYESİ 2 ** Protet
tk D
tş Tedav
ts
t Anab
tl
tm Dalında Doktora
veya Uzmanlık yapmış olmak. DİŞ HEKİMLİĞİ FAKÜLTESİ Restoratif Diş Tedavisi PROF. DR. 1* Restorat
tf D
tş Tedav
ts
t Alanında Doçent veya
Profesör olmak DİŞ HEKİMLİĞİ FAKÜLTESİ Restoratif Diş Tedavisi DR. ÖĞR.
ÜYESİ 2* Restorat
tf D
tş Tedav
ts
t Anab
tl
tm Dalında Doktora
veya Uzmanlık yapmış olmak.
*Türkçe Diş Hekimliği

Related file: https://www.resmigazete.gov.tr/ilanlar/eskiilanlar/2021/11/20211129-4-12.pdf Note: Since the file is taken from the Official Gazette, it is not subject to any copyright.

Reqrefusion Nov 29 '21 12:11

It turns out my update does not fix this issue. The newlines get repaired, but the 'i' character is still being translated as "\rt" or 0d74. The translate table that Font::translateChar() is using has 'i' at array entry 76, but it is encoded in the TJ command as literal bytes 0d74. More research is needed to see why this string actually gets translated to 'i'.
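
To make the observation above concrete, here is a minimal Python sketch (not pdfparser code) of two ways the same bytes 0d74 can be read. Treated as two separate one-byte characters, they come out as a carriage return followed by "t", which matches the reported output. Treated as one two-byte code, they would have to be looked up in a font mapping. The names decode_two_byte and hypothetical_map are invented for illustration, and the mapping of 0x0d74 to "i" is an assumption, not something taken from the PDF. The thread does not establish which reading is correct, nor how entry 76 of the translate table relates to these bytes.

```python
# The two literal bytes reported in the TJ operand.
raw = bytes.fromhex("0d74")

# Reading 1: two independent one-byte codes, decoded as Latin-1.
# This reproduces the symptom: carriage return followed by "t".
as_single_bytes = raw.decode("latin-1")

# Reading 2: a single two-byte, big-endian character code.
as_two_byte_code = int.from_bytes(raw, "big")

# HYPOTHETICAL mapping for illustration only; not read from the actual PDF.
hypothetical_map = {0x0D74: "i"}


def decode_two_byte(data: bytes, table: dict) -> str:
    """Decode data as consecutive two-byte codes using a lookup table.

    Codes missing from the table become U+FFFD; a trailing odd byte is ignored.
    """
    chars = []
    for k in range(0, len(data) - 1, 2):
        code = int.from_bytes(data[k:k + 2], "big")
        chars.append(table.get(code, "\ufffd"))
    return "".join(chars)


print(repr(as_single_bytes))
print(hex(as_two_byte_code))
print(repr(decode_two_byte(raw, hypothetical_map)))
```

If the second reading turns out to apply, the fix would lie in how the string is split into character codes before translation. This is only a guess until the font's actual encoding is checked.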

GreyWyvern Aug 14 '23 16:08

Thank you for your detailed research and analysis. I hope further research will find a way to solve this.

Reqrefusion Sep 18 '23 22:09