metaseq icon indicating copy to clipboard operation
metaseq copied to clipboard

Convert string matches to enum comparison if possible

Open suchenzang opened this issue 2 years ago • 4 comments

We have a lot of places where we do string matching, like:

https://github.com/facebookresearch/metaseq/blob/88ae968e679efbe84a8c246d1177852facfc43a2/metaseq/tasks/streaming_language_modeling.py#L339

or

https://github.com/facebookresearch/metaseq/blob/88ae968e679efbe84a8c246d1177852facfc43a2/metaseq_cli/train.py#L96

which can all be found via: ag "== \"" --py.

Convert these all to comparing against enums if possible.

suchenzang avatar Aug 20 '22 13:08 suchenzang

I'm interested in working on this isssue. Could you kindly provide some more information?

annavordou avatar Aug 23 '22 12:08 annavordou

Sure thing! An example of this would look like how we defined the enum ComputeEnvs: https://github.com/facebookresearch/metaseq/blob/544629e95abb5ef9b76242b402d498df79acde14/metaseq/launcher/opt_job_constants.py#L53-L56 which then gets used in our checkpoint_utils: https://github.com/facebookresearch/metaseq/blob/544629e95abb5ef9b76242b402d498df79acde14/metaseq/checkpoint_utils.py#L261 (instead of seeing cfg.cluster_env == "azure" logic scattered everywhere).

The work here would be to identify cases where we can "group" these string values together into enums, and compare against enum values instead of raw strings.

To start, there's two potential enums to define: splits being either "train" or "valid", along with ddp_backend (which, notably, we have separately defined on input configuration choices https://github.com/facebookresearch/metaseq/blob/544629e95abb5ef9b76242b402d498df79acde14/metaseq/dataclass/constants.py#L38 but all of our config logic will need to slowly be reworked since right now it's quite messy: https://github.com/facebookresearch/metaseq/issues?q=is%3Aissue+is%3Aopen+label%3Aconfig+)

suchenzang avatar Aug 23 '22 13:08 suchenzang

Hi and thank you for your fast response. If i understood it correctly something like this would be what you need right? class ComputeSplits(Enum): VALID = "valid" TRAIN = "train"

annavordou avatar Aug 23 '22 13:08 annavordou

Are changes necessary to many files or just one? If so, if you could point out where I can find them I would be grateful.

annavordou avatar Aug 23 '22 16:08 annavordou