VALL-E-X

Phonemizer as dependency

Open constan1 opened this issue 1 year ago • 3 comments

Is the phonemizer dependency used? How is text turned into phonemes? Phonemizer has a lot of phonemes that are grouped together and not standard, but still IPA. For example, the bpe_json file is:

   "vocab": {
        "[UNK]": 0,
        "[CLS]": 1,
        "[SEP]": 2,
        "[PAD]": 3,
        "[MASK]": 4,
        "a": 5,
        "b": 6,
        "d": 7,
        "e": 8,
        "f": 9,
        "g": 10,
        "h": 11,
        "i": 12,
        "j": 13,
        "l": 14,
        "m": 15,
        "n": 16,
        "o": 17,
        "p": 18,
        "s": 19,
        "t": 20,
        "u": 21,
        "v": 22,
        "w": 23,
        "x": 24,
        "y": 25,
        "z": 26,
        "~": 27,
        "_": 28,
        "\u0153\u0303": 29,
        "\u0254\u026a": 30,
        "\u03b2": 31,
        "\u028a\u0279": 32,
        "a\u028a": 33,
        "p\u02d0": 34,
        "\u026a": 35,
        "\u026a\u0279": 36,
        "\u025b\u0303": 37,
        "\u0259l": 38,
        "\u0292": 39,
        "\u0263": 40,
        "\u00f8": 41,
        "a\u026a\u025a": 42,
        "\u025b\u0279": 43,
        "\u0254": 44,
        "\u0281": 45,
        "\u028c": 46,
        "u\u02d0": 47,
        "\u0259": 48,
        "y\u02d0": 49,
        "\u0254\u02d0": 50,
        "\u0251\u02d0": 51,
        "o\u028a": 52,
        "o\u02d0\u0279": 53,
        "i\u0259": 54,
        "\u1d7b": 55,
        "t\u0283": 56,
        "\u028a": 57,
        "a\u026a": 58,
        "\u03b8": 59,
        "\u025a": 60,
        "\u00e6": 61,
        "e\u026a": 62,
        "\u00f0": 63,
        "\u0272": 64,
        "\u0261": 65,
        "\u025b": 66,
        "\u0254\u0303": 67,
        "\u014b": 68,
        "a\u026a\u0259": 69,
        "\u0294": 70,
        "n\u0329": 71,
        "\u0279": 72,
        "\u0251\u02d0\u0279": 73,
        "\u0153": 74,
        "\u0254\u02d0\u0279": 75,
        "\u027e": 76,
        "\u0283": 77,
        "\u025c\u02d0": 78,
        "i\u02d0": 79,
        "\u0251\u0303": 80,
        "\u029d": 81,
        "\u0250": 82,
        "\u028e": 83,
        "d\u0292": 84,
        "k": 85
        
        

constan1 avatar Oct 16 '23 13:10 constan1
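
For readers of the vocab excerpt in the question: the `\uXXXX` sequences are ordinary JSON escapes for IPA characters, and several entries are multi-character units (for example `"\u0254\u026a"` is ɔɪ and `"t\u0283"` is tʃ). The sketch below shows one way such a vocab could segment an IPA string, assuming a greedy longest-match strategy. That strategy is an assumption for illustration, not necessarily what VALL-E-X does, and `greedy_tokenize` is a hypothetical helper. The IDs are copied from the excerpt.

```python
import json

# A small subset of the vocab quoted in the question (IDs copied from it).
# The raw string keeps the \u escapes intact so json.loads decodes them.
VOCAB_JSON = r'''
{
  "[UNK]": 0,
  "a": 5,
  "t": 20,
  "\u0254\u026a": 30,
  "\u026a": 35,
  "\u0254": 44,
  "t\u0283": 56,
  "a\u026a": 58,
  "\u0283": 77
}
'''
vocab = json.loads(VOCAB_JSON)


def greedy_tokenize(ipa, vocab):
    """Segment an IPA string by always taking the longest vocab entry.

    Hypothetical helper: characters with no matching entry map to [UNK].
    """
    max_len = max(len(token) for token in vocab)
    ids = []
    i = 0
    while i < len(ipa):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(ipa) - i), 0, -1):
            piece = ipa[i:i + size]
            if piece in vocab:
                ids.append(vocab[piece])
                i += size
                break
        else:
            # No entry starts at this position.
            ids.append(vocab["[UNK]"])
            i += 1
    return ids


# "t\u0283\u0254\u026a" is the IPA string tʃɔɪ.
print(greedy_tokenize("t\u0283\u0254\u026a", vocab))  # [56, 30]
```

With this strategy the grouped entries win over their single-character parts, so tʃ becomes one token rather than t followed by ʃ.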

read code pls

Plachtaa avatar Oct 16 '23 13:10 Plachtaa

All I see is that you've removed the phonemizer dependency and now use an English-to-IPA library. But if I'd like to add languages, can I still use phonemizer?

constan1 avatar Oct 16 '23 13:10 constan1

I'm doing this because phonemizer is under the GPL-3.0 license, but I'm releasing this as MIT. You can phonemize your new languages in whatever way you like.

Plachtaa avatar Oct 16 '23 13:10 Plachtaa
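
If you phonemize a new language with a tool of your own choosing, one thing worth checking is whether its output uses symbols that the existing vocab has no entry for. The sketch below is a coarse, character-level coverage check. `missing_symbols` is a hypothetical helper and not part of VALL-E-X, and it assumes you already have the phonemized text as a list of strings and the vocab keys as a list.

```python
from collections import Counter


def missing_symbols(phonemized, known_tokens):
    """Count characters in phonemized text that appear in no vocab token.

    Coarse check (an assumption of this sketch): a character counts as
    covered if it occurs anywhere inside a non-special vocab token.
    """
    known_chars = set()
    for token in known_tokens:
        # Skip special tokens such as [UNK] or [PAD].
        if token.startswith("[") and token.endswith("]"):
            continue
        known_chars.update(token)

    counts = Counter()
    for line in phonemized:
        for ch in line:
            if not ch.isspace() and ch not in known_chars:
                counts[ch] += 1
    return counts


# Toy data: \u0278 (ɸ) is not in the known tokens, everything else is.
known = ["[UNK]", "a", "t", "\u0283", "u\u02d0"]
print(missing_symbols(["ta \u0283u\u02d0", "\u0278a"], known))
```

A non-empty result means the vocab would need new entries, or the phonemizer output would need remapping, before those symbols can be represented.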