VALL-E-X
Phonemizer as dependency
Is the phonemizer dependency actually used? How is text turned into phonemes? Phonemizer produces a lot of phonemes that are grouped together into multi-character units; these are not a standard inventory, but they are still IPA. For example, the bpe_json file contains:
"vocab": {
"[UNK]": 0,
"[CLS]": 1,
"[SEP]": 2,
"[PAD]": 3,
"[MASK]": 4,
"a": 5,
"b": 6,
"d": 7,
"e": 8,
"f": 9,
"g": 10,
"h": 11,
"i": 12,
"j": 13,
"l": 14,
"m": 15,
"n": 16,
"o": 17,
"p": 18,
"s": 19,
"t": 20,
"u": 21,
"v": 22,
"w": 23,
"x": 24,
"y": 25,
"z": 26,
"~": 27,
"_": 28,
"\u0153\u0303": 29,
"\u0254\u026a": 30,
"\u03b2": 31,
"\u028a\u0279": 32,
"a\u028a": 33,
"p\u02d0": 34,
"\u026a": 35,
"\u026a\u0279": 36,
"\u025b\u0303": 37,
"\u0259l": 38,
"\u0292": 39,
"\u0263": 40,
"\u00f8": 41,
"a\u026a\u025a": 42,
"\u025b\u0279": 43,
"\u0254": 44,
"\u0281": 45,
"\u028c": 46,
"u\u02d0": 47,
"\u0259": 48,
"y\u02d0": 49,
"\u0254\u02d0": 50,
"\u0251\u02d0": 51,
"o\u028a": 52,
"o\u02d0\u0279": 53,
"i\u0259": 54,
"\u1d7b": 55,
"t\u0283": 56,
"\u028a": 57,
"a\u026a": 58,
"\u03b8": 59,
"\u025a": 60,
"\u00e6": 61,
"e\u026a": 62,
"\u00f0": 63,
"\u0272": 64,
"\u0261": 65,
"\u025b": 66,
"\u0254\u0303": 67,
"\u014b": 68,
"a\u026a\u0259": 69,
"\u0294": 70,
"n\u0329": 71,
"\u0279": 72,
"\u0251\u02d0\u0279": 73,
"\u0153": 74,
"\u0254\u02d0\u0279": 75,
"\u027e": 76,
"\u0283": 77,
"\u025c\u02d0": 78,
"i\u02d0": 79,
"\u0251\u0303": 80,
"\u029d": 81,
"\u0250": 82,
"\u028e": 83,
"d\u0292": 84,
"k": 85
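To make the grouped entries easier to read, here is a minimal sketch that decodes a few of the escaped keys from the excerpt above and splits an IPA string against them. It uses greedy longest-match lookup, which is an assumption made for illustration and not the BPE merge procedure a tokenizer library would run; the `tokenize` helper is hypothetical and is not taken from the repository.

```python
import json

# A handful of entries copied from the vocab excerpt, escaped as in the file.
excerpt = r'''{"a": 5, "s": 19, "t": 20, "\u0254\u026a": 30, "a\u028a": 33,
"\u026a": 35, "\u0254": 44, "t\u0283": 56, "\u028a": 57, "a\u026a": 58,
"\u0283": 77, "k": 85}'''
vocab = json.loads(excerpt)  # json.loads turns \uXXXX escapes into IPA characters


def tokenize(ipa, vocab):
    """Hypothetical helper: greedy longest-match split of an IPA string."""
    max_len = max(len(tok) for tok in vocab)
    out, i = [], 0
    while i < len(ipa):
        # Try the longest candidate first so grouped units win over single symbols.
        for n in range(min(max_len, len(ipa) - i), 0, -1):
            piece = ipa[i:i + n]
            if piece in vocab:
                out.append(piece)
                i += n
                break
        else:
            # No entry matches at this position: emit the unknown token.
            out.append("[UNK]")
            i += 1
    return out


tokens = tokenize("t\u0283\u0254\u026as", vocab)
ids = [vocab[t] for t in tokens]
```

With these entries, the affricate and the diphthong each come out as a single token rather than two separate symbols, which is the grouping the question refers to.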
read code pls
All I see is that you've removed the phonemizer dependency and now use an English-to-IPA library. But if I'd like to add languages, can I still use phonemizer?
I'm doing this because phonemizer is under the GPL-3.0 license, but I'm releasing this as MIT. You can phonemize your new languages in whatever way you like.
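One way to read "whatever way you like" is as a per-language lookup of text-to-IPA functions. The sketch below is only an illustration of that idea: `register`, `phonemize`, the `"demo"` language code, and the one-word lexicon are all hypothetical names invented here, not part of VALL-E-X.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a language code to a text -> IPA function.
Phonemizer = Callable[[str], str]
_BACKENDS: Dict[str, Phonemizer] = {}


def register(lang: str) -> Callable[[Phonemizer], Phonemizer]:
    """Decorator that stores a phonemizer function under a language code."""
    def wrap(fn: Phonemizer) -> Phonemizer:
        _BACKENDS[lang] = fn
        return fn
    return wrap


# Toy backend: a one-word lexicon standing in for a real G2P tool.
_DEMO_LEXICON = {"choice": "t\u0283\u0254\u026as"}


@register("demo")
def demo_g2p(text: str) -> str:
    # Unknown words are passed through unchanged in this toy version.
    return " ".join(_DEMO_LEXICON.get(w, w) for w in text.lower().split())


def phonemize(text: str, lang: str) -> str:
    """Dispatch to whichever backend was registered for `lang`."""
    try:
        backend = _BACKENDS[lang]
    except KeyError:
        raise ValueError(f"no phonemizer registered for language {lang!r}")
    return backend(text)
```

A backend for a new language could wrap any tool its author chooses, as long as the symbols it emits are covered by the tokenizer vocabulary.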