ruanslv
@ngoyal2707 would you be ok with a merge here too? Your metaseq PR (https://github.com/facebookresearch/metaseq/pull/300/commits) as a whole makes it possible to use `fairseq_v3`, but the Namespace change alone is not...
> It seems Llama3 is using "right" padding and using "eos_token" as the "padding_token".

Please do not mix up this line of inference code with any kind of training setup:...
See https://github.com/meta-llama/llama3/blob/main/llama/tokenizer.py#L222
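The padding scheme from the linked tokenizer code can be sketched in plain Python: right-pad each sequence in an inference batch with the eos token id (the ids below are hypothetical, not Llama3's real vocabulary):

```python
# Minimal sketch: right-pad a batch of token-id sequences for inference,
# reusing the eos token id as the padding id. EOS_ID is a placeholder.
EOS_ID = 2  # hypothetical eos token id

def right_pad(batch, pad_id=EOS_ID):
    """Pad every sequence on the right up to the longest sequence."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

batch = [[5, 6, 7], [8, 9]]
print(right_pad(batch))  # [[5, 6, 7], [8, 9, 2]]
```

Because this is inference-only padding, the padded positions are masked out rather than trained on, which is why reusing eos is safe here but not in a training setup.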
> could one use some of these tokens in finetunes (instead of adding additional tokens and resizing the vocabulary)?

Yes you can, this is why they were added -- to...
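One way to repurpose a reserved token is to map it to a new semantic role in your finetuning data, so the embedding table never needs resizing. A hedged sketch — the reserved token string matches the Llama3 tokenizer's naming, but the role mapping and helper are illustrative:

```python
# Sketch: reuse an existing reserved special token as a new marker in
# finetuning data, instead of adding a token and resizing the vocabulary.
RESERVED = "<|reserved_special_token_0|>"  # already present in the vocab

ROLE_TOKENS = {"tool_call": RESERVED}  # hypothetical role for this finetune

def mark_tool_call(text, role_tokens=ROLE_TOKENS):
    """Wrap text with the repurposed special token."""
    tok = role_tokens["tool_call"]
    return f"{tok}{text}{tok}"

print(mark_tool_call("lookup(weather)"))
```

Since the token id already exists, only its (previously unused) embedding gets trained; no checkpoint surgery is required.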
I did a bisect; this is the commit that started causing the error: https://github.com/facebookresearch/metaseq/commit/493e6017c18f7c2d3cd697693e6f9e33592f3612 cc @lilisierrayu
After commenting out the suggested line, the second error is caused by this commit in particular: https://github.com/facebookresearch/metaseq/commit/c4b33ba6e2cd9b33539bbb5a35d831096bde3282
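The bisect above is just binary search over history: given an ordered commit list and a monotonic "is bad" predicate, it finds the first bad commit in O(log n) checks. A minimal sketch with hypothetical commit names:

```python
def first_bad(commits, is_bad):
    """Binary-search for the first commit where is_bad(commit) is True,
    assuming commits are in history order and badness is monotonic."""
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # bug is at mid or earlier
        else:
            lo = mid + 1  # bug was introduced after mid
    return commits[lo]

# Hypothetical history: bug introduced at "c3".
commits = ["c0", "c1", "c2", "c3", "c4", "c5"]
print(first_bad(commits, lambda c: c >= "c3"))  # c3
```

`git bisect` automates exactly this loop, with each `is_bad` check being a build-and-run of the repro.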
OK, did a bit of digging with @suchenzang; here is the summary:

- Indeed `setattr(cfg["model"], "inference", True)` from https://github.com/facebookresearch/metaseq/commit/493e6017c18f7c2d3cd697693e6f9e33592f3612 is a bug, figuring out the best way to fix it and...
@andchir we haven't retrained the 350M model yet, but if you locally set `self.layer_norm = None` in `metaseq/models/transformer_decoder.py` it should work.
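The workaround relies on a common decoder pattern: the final layer norm is applied only when the attribute is set, so assigning `None` skips that step. A hedged sketch of the pattern — class and helper names here are illustrative, not metaseq's exact code:

```python
# Sketch: a decoder that applies its final layer norm only when the
# attribute is not None, so setting it to None disables that step.
class TinyDecoder:
    def __init__(self, use_final_norm=True):
        # stand-in for a real LayerNorm: shift values to zero mean
        self.layer_norm = (
            (lambda xs: [x - sum(xs) / len(xs) for x in xs])
            if use_final_norm else None
        )

    def forward(self, xs):
        if self.layer_norm is not None:  # set to None to skip, as suggested
            xs = self.layer_norm(xs)
        return xs

print(TinyDecoder(use_final_norm=False).forward([1.0, 3.0]))  # [1.0, 3.0]
```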
Test breakage is unrelated; it was already present before: https://app.circleci.com/pipelines/github/facebookresearch/metaseq/1223/workflows/06606d26-917b-422c-8717-0c316ec449a4/jobs/1714

As for pinning a version, isn't it better not to pin when we can, so that we automatically get gains/advantages...
The actual crash was fixed in https://github.com/facebookresearch/metaseq/pull/571. Pinned the current release and updated the PR title.