
Non-latin character support for PDF exports

Open TeresaM12 opened this issue 8 years ago • 2 comments

I have problems with composite characters like ä ö ü Ä Ö Ü (other special characters like ß work flawlessly). Instead of 'ä', the PDF document shows an 'a' followed by a black box. The HTML version is fine. Because I like the add-on a lot and really wanted to use it, but have no idea about Python, I found somebody who solved the problem. The conversion now takes even longer, but this is not an issue for me.

Solution which works on my computer, in file exporter.py:

Add line:

    25> import unicodedata ### new line
    26> # local libraries

and change:

    176> question=self.escapeText(unicodedata.normalize('NFC',c.q())), ### changed
    177> answer=self.escapeText(unicodedata.normalize('NFC',c.a())) ### changed
    178> # question=self.escapeText(c.q()), # org line
    179> # answer=self.escapeText(c.a()) # org line

Oh yes, as you might have realised, I'm also not able to commit those changes, as I have no idea about GitHub (sorry). I hope this helps. Regards, Teresa

TeresaM12, Mar 16 '18 21:03

After I filed my issue I realised that glutanimate uses German as well, so it seemed improbable that the problem would have gone unspotted. So I typed a new card with these characters, and it worked in the original version. I guess in those cards ä, ö, ü are not composites of a base letter and ". The sources of my decks are PDF files which I converted to CSV and then imported into Anki. Either already in the original files or during those conversion steps I ended up with composite characters instead of single ones. Still, for such characters there is an issue.
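The difference between the two kinds of 'ä' described above can be illustrated with a small sketch. It only uses Python's standard `unicodedata` module; the helper name `to_nfc` is made up for this example and is not part of the add-on.

```python
import unicodedata

# 'a' followed by U+0308 COMBINING DIAERESIS: two code points, renders as one glyph
decomposed = "a\u0308"
# U+00E4 LATIN SMALL LETTER A WITH DIAERESIS: a single code point
precomposed = "\u00e4"


def to_nfc(text: str) -> str:
    """Compose base letters and combining marks into single code points where possible."""
    return unicodedata.normalize("NFC", text)


# The two strings look the same on screen but differ in length and content
print(len(decomposed), len(precomposed))
print(decomposed == precomposed)
# After NFC normalization they compare equal
print(to_nfc(decomposed) == precomposed)
```

This is the same normalization that the patch applies to `c.q()` and `c.a()` before the text is escaped.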

TeresaM12, Mar 16 '18 21:03

Thanks for the report! Yes, the PDF exporting library used by the add-on only supports a limited character set by default. Regular umlauts seem to be part of that, while composites, for whatever reason, don't seem to be.

I appreciate your going through the trouble of finding a workable patch that solves this. This will definitely be helpful in resolving this issue, although I don't think it will work for characters that aren't shipped in the font. E.g.: German Umlauts are available in the font, so normalizing the composites to these symbols worked. But the same would not apply to symbols in East Asian languages, as they are not part of the limited character set that the pdf library's font supports.
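A rough sketch of why normalization alone has limits: the font's actual character set is not known here, so Latin-1 is used purely as a stand-in assumption, and `representable` is a hypothetical helper, not something the add-on or the PDF library provides.

```python
import unicodedata


def representable(text: str, encoding: str = "latin-1") -> bool:
    """Return True if the NFC form of text fits the stand-in character set.

    The encoding is only an assumed approximation of what a limited font covers.
    """
    normalized = unicodedata.normalize("NFC", text)
    try:
        normalized.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


samples = (
    "u\u0308ber",  # composite u + diaeresis, composes to a character in the set
    "日本語",       # East Asian text, outside the set with or without NFC
    "q\u0308",     # no precomposed form exists, so the combining mark remains
)
for sample in samples:
    print(repr(sample), representable(sample))
```

Under that assumption, only the first sample is fixed by normalizing; the other two would still need a font that actually contains the glyphs.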

A definitive solution will probably require me to ship a custom font with broader Unicode coverage with the add-on.

In the meantime, what you could try is using the add-on's user CSS file to embed a custom font. Instructions for this may be found here: https://xhtml2pdf.readthedocs.io/en/latest/reference.html#fonts

N.B.: This is not something I've tried myself yet, but from what I've read in various discussion forums, font support with this library can be quite buggy. So your mileage may vary.

glutanimate, Mar 21 '18 13:03