api-inference-community
update fairseq version
We created a hub model at https://huggingface.co/facebook/xm_transformer_s2ut_800m-en-hk-h1_2022 to support our English-to-Hokkien translation model, and we also pushed some changes to fairseq to get the model working. This PR updates the HF version of fairseq so we can test the model. Thanks so much in advance!
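Edit: for reviewers, the bump itself is just a pin change in the image's requirements file, along these lines (the exact release shown is illustrative, not the actual target version):

# docker_images/fairseq/requirements.txt (illustrative pin only)
fairseq==0.12.2  # use the release that includes the new model changes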
I changed this line to the new model and the test fails: https://github.com/huggingface/api-inference-community/blob/main/docker_images/fairseq/tests/test_api.py#L12. The inference code does not work for this model. My understanding from https://huggingface.co/facebook/xm_transformer_s2ut_800m-es-en-st-asr-bt_h1_2022 is that inference is more involved for this model, requiring the use of FastSpeech as well.
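For reference, the change I tested is just swapping the model id in the test's model mapping, roughly like this (a sketch only; the TESTABLE_MODELS name and task key are assumptions about the test layout):

# docker_images/fairseq/tests/test_api.py — sketch; names are assumptions
TESTABLE_MODELS = {
    "audio-to-audio": "facebook/xm_transformer_s2ut_800m-en-hk-h1_2022",
}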
Actually, I misread the code. Now I realize this is an ASR model, which we don't yet support for fairseq in the API. I'll add ASR support, but the model is almost 10 GB, which means it will be very slow to load.
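For reference, here's a minimal sketch of what the ASR pipeline could look like in the fairseq docker image. The class name, the app.pipelines import, and the arg_overrides are assumptions patterned on the existing pipelines and the model card, not the final implementation:

# Sketch only: names and layout are assumptions modeled on the existing
# api-inference-community pipelines, not the actual implementation.
from typing import Dict

import numpy as np
import torch
from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.speech_to_text.hub_interface import S2THubInterface

from app.pipelines import Pipeline  # assumed base class, as in the other images


class AutomaticSpeechRecognitionPipeline(Pipeline):
    def __init__(self, model_id: str):
        models, cfg, self.task = load_model_ensemble_and_task_from_hf_hub(
            model_id,
            arg_overrides={"config_yaml": "config.yaml", "task": "speech_to_text"},
        )
        self.model = models[0].cpu()
        cfg["task"].cpu = True
        self.generator = self.task.build_generator([self.model], cfg)
        # fairseq speech-to-text models expect 16 kHz mono input
        self.sampling_rate = 16_000

    def __call__(self, inputs: np.ndarray) -> Dict[str, str]:
        # inputs: 1-D float32 waveform sampled at self.sampling_rate
        audio = torch.from_numpy(inputs).unsqueeze(0)
        sample = S2THubInterface.get_model_input(self.task, audio)
        text = S2THubInterface.get_prediction(
            self.task, self.model, self.generator, sample
        )
        return {"text": text}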
Hello @sravyapopuri388, sorry if this is a misplaced question, but I'm asking here because you worked on the same model. I am trying to convert English to English using this model. When experimenting with https://huggingface.co/facebook/xm_transformer_s2ut_800m-es-en-st-asr-bt_h1_2022 it works, but when I use the example attached below, it converts the English speech into Spanish:
import json
import os
from pathlib import Path

import IPython.display as ipd
import torchaudio
from fairseq import hub_utils
from fairseq.checkpoint_utils import load_model_ensemble_and_task_from_hf_hub
from fairseq.models.speech_to_text.hub_interface import S2THubInterface
from fairseq.models.text_to_speech import CodeHiFiGANVocoder
from fairseq.models.text_to_speech.hub_interface import VocoderHubInterface
from huggingface_hub import snapshot_download

cache_dir = os.getenv("HUGGINGFACE_HUB_CACHE")

# speech-to-unit translation: load the S2UT model from the Hub
models, cfg, task = load_model_ensemble_and_task_from_hf_hub(
    "facebook/xm_transformer_s2ut_800m-es-en-st-asr-bt_h1_2022",
    arg_overrides={"config_yaml": "config.yaml", "task": "speech_to_text"},
    cache_dir=cache_dir,
)
model = models[0].cpu()
cfg["task"].cpu = True
generator = task.build_generator([model], cfg)

# requires 16000Hz mono channel audio
audio, _ = torchaudio.load("../gnz_10005_m3.wav")

sample = S2THubInterface.get_model_input(task, audio)
unit = S2THubInterface.get_prediction(task, model, generator, sample)

# speech synthesis: a unit HiFi-GAN vocoder turns the predicted units into audio
library_name = "fairseq"
cache_dir = cache_dir or (Path.home() / ".cache" / library_name).as_posix()
cache_dir = snapshot_download(
    "facebook/unit_hifigan_mhubert_vp_en_es_fr_it3_400k_layer11_km1000_lj_dur",
    cache_dir=cache_dir,
    library_name=library_name,
)
x = hub_utils.from_pretrained(
    cache_dir,
    "model.pt",
    ".",
    archive_map=CodeHiFiGANVocoder.hub_models(),
    config_yaml="config.json",
    fp16=False,
    is_vocoder=True,
)
with open(f"{x['args']['data']}/config.json") as f:
    vocoder_cfg = json.load(f)
assert len(x["args"]["model_path"]) == 1, "Too many vocoder models in the input"

vocoder = CodeHiFiGANVocoder(x["args"]["model_path"][0], vocoder_cfg)
tts_model = VocoderHubInterface(vocoder_cfg, vocoder)
tts_sample = tts_model.get_model_input(unit)
wav, sr = tts_model.get_prediction(tts_sample)
ipd.Audio(wav, rate=sr)
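When I run this outside a notebook, I save the output instead of playing it inline, along these lines (assuming wav is a 1-D torch tensor, as returned above):

# torchaudio.save expects a (channels, frames) tensor
torchaudio.save("output.wav", wav.unsqueeze(0).cpu(), sr)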
Is there a parameter that needs to be modified, and if so, how? The goal, in short, is to convert broken English into proper English using this model.
Thanks!