gmryu

Results: 61 comments of gmryu

@sanchit-gandhi You have to change `w2v_path` into a model with `load_pretrained_decoder_from=None`, i.e. the NEW_MODEL_PATH? (even if it does not exist when assigned) I believe it is because `w2v_path`=(the bugged w2v2_mbart_LND_w_ASR.pt) causes...

see https://github.com/facebookresearch/fairseq/issues/4563

Sorry, I have no knowledge in this field. All I found out, in short: using any LM model, change its `supported_targets` method ([i.e.](https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/roberta/model.py#L355)) to return `{"future", "past", "self"}`, a LanguageModelingTask,...
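
A minimal sketch of the kind of change meant here, assuming fairseq's `RobertaModel` (whose `supported_targets` property normally returns `{"self"}`) and its `LanguageModelingTask`, which checks its configured targets against that set; the subclass name is illustrative:

```python
from fairseq.models.roberta import RobertaModel


class RobertaForLM(RobertaModel):  # hypothetical subclass, for illustration
    @property
    def supported_targets(self):
        # LanguageModelingTask.build_model() rejects the model if any of its
        # configured targets is missing from this set.
        return {"future", "past", "self"}
```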

It is because the original command is by default meant to be used with many GPUs. See the config you used: [fairseq/examples/hubert/config/pretrain/hubert_base_librispeech.yaml's distributed_training](https://github.com/facebookresearch/fairseq/blob/main/examples/hubert/config/pretrain/hubert_base_librispeech.yaml#L16). It says `distributed_world_size: 32`...

I suggest you try a vanilla execution, say: 1. no `--restore-file`, 2. `--batch-size 1` instead of `--max-tokens`, 3. `--arch bart_base`, 4. you may need to remove those `--reset-...` arguments and...

`RuntimeError: CUDA out of memory` (OOM) happens on a single GPU, so it is not a multi-GPU problem. Allocating memory is necessary because you have to transfer values from your files...

I thought NLLB uses a byte-level sentencepiece. Am I wrong? Is the dict you talked about this one: https://dl.fbaipublicfiles.com/large_objects/nllb/models/spm_200/dictionary.txt ? Since it is a byte-level dictionary, there is no actual...

Confirmed. The downloaded dictionary.txt does not have all byte chars, so there are actually a lot of words/characters treated as unknown. I inspected the original dictionary with more logger.info inside...
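
As a quick way to see this, one can load the dictionary with fairseq's `Dictionary` and look for the SentencePiece byte-fallback pieces; this assumes (not confirmed by the file itself) that those pieces use the usual `<0x00>`..`<0xFF>` spelling:

```python
from fairseq.data import Dictionary

# Load the downloaded NLLB dictionary (fairseq "symbol count" format).
d = Dictionary.load("dictionary.txt")

# Count which of the 256 byte-fallback pieces are absent from it.
missing = [f"<0x{b:02X}>" for b in range(256) if f"<0x{b:02X}>" not in d.indices]
print(f"{len(missing)} byte pieces missing, e.g. {missing[:5]}")
```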

I do wonder how the authors deal with those unknown words. It feels like a huge hole, and they would not have overlooked this. ---- In my case, I expanded...
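
A hedged sketch of what "expanded" could look like, under the same assumption about the byte-piece spelling; note that the model's embedding table would also have to cover the newly appended indices:

```python
from fairseq.data import Dictionary

d = Dictionary.load("dictionary.txt")
for b in range(256):
    piece = f"<0x{b:02X}>"
    if piece not in d.indices:
        d.add_symbol(piece)  # appended at the end, after the original vocab
d.save("dictionary.expanded.txt")  # illustrative output file name
```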

Not now, I would believe. This new "fast path" feature is applied when `why_not_fast_path` evaluates to False (an empty string `''` is falsy) and uses [`torch._native_multi_head_attention`](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/activation.py#L1115), which is implemented in C++. Fairseq uses [`F.multi_head_attention_forward`](https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/multihead_attention.py#L538)...
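
A rough sketch (not fairseq's code) of the contrast: PyTorch's own `nn.MultiheadAttention` may dispatch to the native kernel only when its internal `why_not_fast_path` check comes back empty (roughly: eval mode, batched `batch_first` inputs, plain self-attention), while fairseq's module is built on `F.multi_head_attention_forward` and never takes that path:

```python
import torch
import torch.nn.functional as F
from torch import nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True).eval()
x = torch.randn(2, 5, 16)  # (batch, seq, embed)
with torch.no_grad():
    # A call shaped like this is eligible for the native fast path.
    out, _ = mha(x, x, x, need_weights=False)

# Roughly what fairseq ends up calling instead (functional, no fast path):
q = k = v = x.transpose(0, 1)  # (seq, batch, embed) layout
out2, _ = F.multi_head_attention_forward(
    q, k, v, 16, 4,
    mha.in_proj_weight, mha.in_proj_bias,
    None, None, False, 0.0,
    mha.out_proj.weight, mha.out_proj.bias,
    training=False,
)
```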