Omar Elayan
When running inference, one of the input parameters is a sentence passed after the `--prompt` flag. Because all arguments are parsed inside inference to build the "shell" command,...
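As a rough illustration of why a multi-word `--prompt` value needs care when arguments are joined into a shell command, here is a minimal sketch using Python's standard `shlex` module. The `build_command` helper and the `inference.py` script name are hypothetical and are not taken from the PR; the truncated description above does not say how the PR itself handles this.

```python
import shlex


def build_command(script, prompt, extra_args=()):
    """Hypothetical helper: join arguments into one shell-safe command string."""
    args = ["python", script, "--prompt", prompt, *extra_args]
    # shlex.join quotes every argument that contains whitespace or shell
    # metacharacters, so the prompt stays a single argument.
    return shlex.join(args)


cmd = build_command("inference.py", "DeepSpeed is a library")
print(cmd)

# Splitting the command the way a POSIX shell would recovers the prompt intact.
assert shlex.split(cmd)[3] == "DeepSpeed is a library"
```

Without quoting, the shell would split the prompt on spaces and pass only its first word to `--prompt`.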
This PR mainly handles all places where InferenceBuilder is used to access any op or a specific implementation for an op. Instead, an op is defined, and its proper implementation...
https://github.com/microsoft/DeepSpeed/blob/3dd7ccff8103be60c31d963dd2278d43abb68fd1/deepspeed/module_inject/tp_shard.py#L35 [This change](https://github.com/microsoft/DeepSpeed/pull/4697) introduces a new way for AutoTP to handle work when split_shape isn't divisible by num_kv_heads. This way of sharding is done in mlp, lm_head, and embed_out as...
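To make the idea of uneven sharding concrete, the sketch below shows one generic way to split a dimension across ranks when it does not divide evenly: the first few ranks each take one extra element. This is an assumption for illustration only. The function name `near_even_shard_size` is hypothetical, and the actual logic in `tp_shard.py` may differ.

```python
def near_even_shard_size(total_size, world_size, rank):
    """Hypothetical helper: size of this rank's shard under a near-even split."""
    base, remainder = divmod(total_size, world_size)
    # The first `remainder` ranks absorb one extra element each.
    return base + (1 if rank < remainder else 0)


# Example: a dimension of 10 split across 4 ranks.
sizes = [near_even_shard_size(10, 4, rank) for rank in range(4)]
print(sizes)

# Every element is assigned to exactly one rank.
assert sum(sizes) == 10
```

With this scheme the shard sizes differ by at most one, so the load stays close to balanced even when the dimension is not a multiple of the number of ranks.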