gmryu

Results: 61 comments of gmryu

@sanchit-gandhi You have to change `w2v_path` into a model with `load_pretrained_decoder_from=None`, i.e. the NEW_MODEL_PATH? (even if it does not exist when assigned) I believe it is because `w2v_path`=(the bugged w2v2_mbart_LND_w_ASR.pt) causes...

see https://github.com/facebookresearch/fairseq/issues/4563

Sorry, I have no knowledge in this field. All I found out, in short: using any LM model, change its `supported_targets` method ([i.e.](https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/roberta/model.py#L355)) to return `{"future", "past", "self"}`, a LanguageModelingTask,...
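
A minimal sketch of the kind of change meant here, assuming fairseq's `RobertaModel` (whose `supported_targets` property normally returns `{"self"}`) and its `LanguageModelingTask`, which checks its configured targets against that set; the subclass name is illustrative:

```python
from fairseq.models.roberta import RobertaModel


class RobertaForLM(RobertaModel):  # hypothetical subclass, for illustration
    @property
    def supported_targets(self):
        # LanguageModelingTask.build_model() rejects the model if any of its
        # configured targets is missing from this set.
        return {"future", "past", "self"}
```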

It is because the original command is by default meant to be used with many GPUs. See the config you used: [fairseq/examples/hubert/config/pretrain/hubert_base_librispeech.yaml's distributed_training](https://github.com/facebookresearch/fairseq/blob/main/examples/hubert/config/pretrain/hubert_base_librispeech.yaml#L16). It says `distributed_world_size: 32`...

I suggest you try a vanilla execution, say: 1. no `--restore-file`, 2. `--batch-size 1` instead of `--max-tokens`, 3. `--arch bart_base`, 4. you may need to remove those `--reset-...` arguments and...

`RuntimeError: CUDA out of memory` (OOM) happens on a single GPU, so it is not a multi-GPU problem. Allocating memory is necessary because you have to transfer values from your files...

I thought NLLB uses a byte-level sentencepiece. Am I wrong? Is the dict you talked about this one: https://dl.fbaipublicfiles.com/large_objects/nllb/models/spm_200/dictionary.txt ? Since it is a byte-level dictionary, there is no actual...

Confirmed. The downloaded dictionary.txt does not have all byte chars, so there are actually a lot of words/characters treated as unknown. I inspected the original dictionary with more logger.info inside...
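
As a quick way to see this, one can load the dictionary with fairseq's `Dictionary` and look for the SentencePiece byte-fallback pieces; this assumes (not confirmed by the file itself) that those pieces use the usual `<0x00>`..`<0xFF>` spelling:

```python
from fairseq.data import Dictionary

# Load the downloaded NLLB dictionary (fairseq "symbol count" format).
d = Dictionary.load("dictionary.txt")

# Count which of the 256 byte-fallback pieces are absent from it.
missing = [f"<0x{b:02X}>" for b in range(256) if f"<0x{b:02X}>" not in d.indices]
print(f"{len(missing)} byte pieces missing, e.g. {missing[:5]}")
```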

I do wonder how the authors deal with those unknown words. It feels like a huge hole, and they would not have overlooked this. ---- In my case, I expanded...
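
A hedged sketch of what "expanded" could look like, under the same assumption about the byte-piece spelling; note that the model's embedding table would also have to cover the newly appended indices:

```python
from fairseq.data import Dictionary

d = Dictionary.load("dictionary.txt")
for b in range(256):
    piece = f"<0x{b:02X}>"
    if piece not in d.indices:
        d.add_symbol(piece)  # appended at the end, after the original vocab
d.save("dictionary.expanded.txt")  # illustrative output file name
```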

Not now, I would believe. This new "fast path" feature is applied when `why_not_fast_path` evaluates to False (an empty string `''` is falsy) and uses [`torch._native_multi_head_attention`](https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/activation.py#L1115), which is implemented in C++. Fairseq uses [`F.multi_head_attention_forward`](https://github.com/facebookresearch/fairseq/blob/main/fairseq/modules/multihead_attention.py#L538)...
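
A rough sketch (not fairseq's code) of the contrast: PyTorch's own `nn.MultiheadAttention` may dispatch to the native kernel only when its internal `why_not_fast_path` check comes back empty (roughly: eval mode, batched `batch_first` inputs, plain self-attention), while fairseq's module is built on `F.multi_head_attention_forward` and never takes that path:

```python
import torch
import torch.nn.functional as F
from torch import nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True).eval()
x = torch.randn(2, 5, 16)  # (batch, seq, embed)
with torch.no_grad():
    # A call shaped like this is eligible for the native fast path.
    out, _ = mha(x, x, x, need_weights=False)

# Roughly what fairseq ends up calling instead (functional, no fast path):
q = k = v = x.transpose(0, 1)  # (seq, batch, embed) layout
out2, _ = F.multi_head_attention_forward(
    q, k, v, 16, 4,
    mha.in_proj_weight, mha.in_proj_bias,
    None, None, False, 0.0,
    mha.out_proj.weight, mha.out_proj.bias,
    training=False,
)
```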