metaseq
Remove Megatron dependency - move entirely to Fairscale
This issue tracks investigating whether we can remove our Megatron dependency and rely entirely on Fairscale, since the model parallelism implementations in the two appear to be identical.
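
A possible first step for the investigation is to inventory where Megatron is actually imported. The snippet below is only a rough sketch using the standard library; the helper name `find_imports` and the assumption that the package is imported under the top-level name `megatron` are illustrative and not taken from the codebase.

```python
import ast
from pathlib import Path


def find_imports(root, package="megatron"):
    """Return sorted (file, line, module) tuples for absolute imports of `package` under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            # Skip files that cannot be read or parsed instead of aborting the scan.
            continue
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # level == 0 means an absolute import, e.g. `from megatron import mpu`.
                names = [node.module]
            else:
                continue
            for name in names:
                # Match the package itself or any of its submodules, but not
                # unrelated packages that merely share the prefix.
                if name == package or name.startswith(package + "."):
                    hits.append((str(path), node.lineno, name))
    return sorted(hits)
```

Calling `find_imports("path/to/checkout")` on a local checkout would list each file and line that imports the package, which gives a concrete list of call sites to compare against their Fairscale counterparts. Imports done dynamically (for example through `importlib`) would not be caught by this scan.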