Support for Sanskrit and Pali, especially ṁ

Open sujato opened this issue 3 years ago • 5 comments

The ISO standard for Sanskrit/Pali (ISO 15919) requires ṁ, which is still missing from Source Serif. There are very few quality fonts for ancient Indian languages, and among free fonts, none with Source's qualities.

The blog post for Source Serif 4 indicates that AL-5 support is upcoming, which includes ṁ. Yay! So +1 for this! :+1:

Meanwhile, thanks to everyone who has made Source Serif happen. I'm using variable fonts with optical sizing on the web. For free! And it just works! Amazing how far we've come. :pray:

sujato avatar Aug 06 '21 11:08 sujato

Even though it’s not included as a precomposed glyph, Source Serif already supports ṁ. You can use m followed by U+0307 COMBINING DOT ABOVE, which Unicode defines as canonically equivalent to the precomposed character U+1E41.
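
The equivalence can be checked with Unicode normalization. The following is a minimal sketch using Python's standard `unicodedata` module; Python and the helper name `is_canonically_equivalent` are assumptions made for illustration and are not part of the original discussion. It only shows that the two spellings are the same text at the Unicode level, not how any particular font or renderer handles them.

```python
import unicodedata

PRECOMPOSED = "\u1E41"  # ṁ, LATIN SMALL LETTER M WITH DOT ABOVE
DECOMPOSED = "m\u0307"  # m followed by COMBINING DOT ABOVE


def is_canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD (canonical decomposition) forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The raw code point sequences differ...
print(PRECOMPOSED == DECOMPOSED)                         # False
# ...but they are canonically equivalent.
print(is_canonically_equivalent(PRECOMPOSED, DECOMPOSED))  # True
# NFC composes the pair back into the single code point U+1E41.
print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)  # True
```

A renderer that normalizes or composes on the fly can therefore display either spelling, while one that looks up glyphs by code point alone needs the decomposed form when the font has no glyph for U+1E41.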

projectshifter avatar Aug 08 '21 04:08 projectshifter

Thanks for the help, I didn't realize this.

What I have found is that the precomposed glyph ṁ "just works" in HTML, so presumably the browser is clever enough to compose the glyph itself. In LuaLaTeX, however, that doesn't work; you have to enter m + U+0307. That is okay as a workaround, but it'd still be nicer without this gotcha.
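
One way to apply the m + U+0307 workaround without retyping text is to rewrite the source before compiling. This is only a sketch under stated assumptions: the function name `decompose_m_overdot` is hypothetical, and it assumes the input is plain text in which every ṁ (and its capital Ṁ) should be decomposed while all other characters stay untouched.

```python
# Map the precomposed letters to base letter + U+0307 COMBINING DOT ABOVE.
# Only these two code points are touched; other accented letters (ā, ṭ, ...)
# are left exactly as they are.
M_OVERDOT_MAP = str.maketrans({
    "\u1E41": "m\u0307",  # ṁ -> m + combining dot above
    "\u1E40": "M\u0307",  # Ṁ -> M + combining dot above
})


def decompose_m_overdot(text: str) -> str:
    """Hypothetical helper: replace precomposed ṁ/Ṁ with their
    canonically equivalent decomposed sequences."""
    return text.translate(M_OVERDOT_MAP)


sample = "sa\u1E41s\u0101ra"  # "saṁsāra" with precomposed ṁ and ā
converted = decompose_m_overdot(sample)
print([f"U+{ord(c):04X}" for c in converted])
```

A targeted mapping is used here instead of full NFD normalization so that characters the font already covers as precomposed glyphs are not decomposed as a side effect.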

sujato avatar Sep 22 '21 06:09 sujato

> What I have found is that the precomposed glyph ṁ "just works" in HTML, so presumably the browser is clever enough to compose the glyph itself. In LuaLaTeX, however, that doesn't work; you have to enter m + U+0307. That is okay as a workaround, but it'd still be nicer without this gotcha.

You should be able to use the newunicodechar package to work around this:

\usepackage{newunicodechar}
% Typeset precomposed ṁ (U+1E41) as m + U+0307 COMBINING DOT ABOVE
\newunicodechar{ṁ}{m\char"0307}

(Incidentally, when do you use ṁ? I’m familiar with IAST, which uses ṃ for the anusvara.)

dpk avatar Feb 01 '23 10:02 dpk

> You should be able to use the newunicodechar package to work around this:

Thanks for the tip.

> (Incidentally, when do you use ṁ? I’m familiar with IAST, which uses ṃ for the anusvara.)

I run SuttaCentral, which uses ISO 15919. It's technically superior on several grounds, not least in maintaining consistency between multiple Indic languages.

Incidentally, I was just doing some background research today, and I discovered to my surprise that ṁ overdot was, in fact, the recommended character for anusvāra at the Geneva Orientalist Congress of 1894, and therefore is the official IAST form. No idea how everyone started using ṃ underdot!

https://discourse.suttacentral.net/t/it-seems-anusvara-was-represented-by-not-at-the-geneva-congress-of-1894/28164

sujato avatar Feb 17 '23 04:02 sujato