Susan Zhang
We have a lot of places where we do string matching, like: https://github.com/facebookresearch/metaseq/blob/88ae968e679efbe84a8c246d1177852facfc43a2/metaseq/tasks/streaming_language_modeling.py#L339 or https://github.com/facebookresearch/metaseq/blob/88ae968e679efbe84a8c246d1177852facfc43a2/metaseq_cli/train.py#L96 which can all be found via: `ag "== \"" --py`. Convert these all to comparing...
Before we can move to cleaner configs, we need to reduce the amount of args bloat in the codebase. This is an issue to track cleanup efforts on this front....
https://github.com/facebookresearch/metaseq/pull/349
* Request to swap out the binary files for something text-based / diff-able from @stephenroller
* Questions around needing criterion & specify_arch flags from @lilisierrayu
* Add test for...
API codepaths sometimes break with changes since they are not as frequently tested as our training codepaths. This has come up as an important need given ongoing efforts to clean...
The Megatron codebase has timers scattered throughout its code (e.g. https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/optimizer/optimizer.py#L412). We should add similar timers to see if we can find areas of improvement.
* Per...
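A rough sketch of what such timers could look like, in plain Python. This is not Megatron's timer API and not existing metaseq code; the `Timers` class, its `region` method, and the region names are all hypothetical, made up for illustration only.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Timers:
    """Hypothetical helper: accumulates wall-clock time per named region."""

    def __init__(self):
        self.totals = defaultdict(float)  # region name -> total seconds
        self.counts = defaultdict(int)    # region name -> number of entries

    @contextmanager
    def region(self, name):
        # Time everything executed inside the `with` block.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def summary(self):
        # Mean seconds per entry for each region seen so far.
        return {name: self.totals[name] / self.counts[name] for name in self.totals}


if __name__ == "__main__":
    timers = Timers()
    for _ in range(3):
        with timers.region("fake-optimizer-step"):  # hypothetical region name
            sum(i * i for i in range(100_000))      # stand-in for real work
    for name, mean_s in timers.summary().items():
        print(f"{name}: {mean_s * 1e3:.3f} ms/call over {timers.counts[name]} calls")
```

One caveat: CUDA kernels launch asynchronously, so wall-clock timers around GPU work only give meaningful numbers if the device is synchronized before the clock is read (e.g. `torch.cuda.synchronize()`), which itself adds overhead and should probably be opt-in.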
By the time we get to our first `convert_namespace_to_omegaconf` call, we already have:
```
# from hydra.core.config_store import ConfigStore
(Pdb) cs = ConfigStore.instance()
(Pdb) cs.repo.keys()
dict_keys(['hydra', '_dummy_empty_config_.yaml', 'base_config.yaml', '_name.yaml', 'common.yaml',...
```
From looking at the `merge_with_parent` method, we see:
```
(base) √ fairseq-big-internal % ag merge_with_parent
fairseq/tasks/__init__.py
12:from fairseq.dataclass.utils import merge_with_parent, populate_dataclass
39: cfg = merge_with_parent(dc(), cfg)

fairseq/registry.py
10:from fairseq.dataclass.utils import...
```
There is too much junk lumped together in the top-level utils.py file. Clean this up.
Given https://github.com/fairinternal/metaseq-internal/pull/181, it seems like arch is not necessarily present in args in the training workflow when loading from disk. This seems like yet another case where there are...
We are currently not following any release strategy, which is not great for creating any kind of open source ecosystem. There seems to be a nonzero number of external users now,...