Omar Elayan

Results 4 issues of Omar Elayan

When running inference, one of the input parameters is a sentence passed after the --prompt flag. As all arguments are parsed inside inference to build the "shell" command,...

bug
inference
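The issue text above is cut off, so the exact failure is not shown. As a minimal sketch, assuming the problem is that a multi-word prompt gets split on whitespace once the arguments are joined into a shell command, quoting each argument with Python's standard `shlex.quote` keeps the sentence together. The function name `build_command` and the script name `inference.py` are hypothetical and are not taken from the project.

```python
import shlex


def build_command(script, prompt, extra_args=()):
    """Build a shell command string in which every argument is quoted.

    Hypothetical helper for illustration only; not the project's code.
    """
    args = ["python", script, "--prompt", prompt, *extra_args]
    # shlex.quote leaves safe tokens untouched and wraps anything that
    # contains whitespace or shell metacharacters in single quotes.
    return " ".join(shlex.quote(arg) for arg in args)


cmd = build_command("inference.py", "DeepSpeed is a library")
print(cmd)

# Splitting the command the way a POSIX shell would recovers the prompt
# as a single argument instead of four separate words.
assert shlex.split(cmd)[3] == "DeepSpeed is a library"
```

Where possible, passing the argument list directly to `subprocess.run` without `shell=True` avoids the quoting problem altogether, since no shell re-parses the string.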

This PR mainly handles all places where InferenceBuilder is used to access any op or a specific implementation for an op. Instead, an op is defined, and its proper implementation...

https://github.com/microsoft/DeepSpeed/blob/3dd7ccff8103be60c31d963dd2278d43abb68fd1/deepspeed/module_inject/tp_shard.py#L35 [This change](https://github.com/microsoft/DeepSpeed/pull/4697) introduces a new way for AutoTP to handle work when split_shape isn't divisible by num_kv_heads. This way of sharding is done in mlp, lm_head, and embed_out as...
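For readers unfamiliar with uneven sharding, the general idea can be sketched as follows. This is a generic near-even split, not DeepSpeed's implementation: the function name `shard_sizes` and its parameters are hypothetical, and the sketch ignores `num_kv_heads` and any alignment constraints the real code may apply.

```python
def shard_sizes(total_size, world_size):
    """Split total_size into world_size near-even parts that sum to total_size.

    Hypothetical helper for illustration only; not taken from tp_shard.py.
    """
    base, remainder = divmod(total_size, world_size)
    # The first `remainder` ranks each take one extra element, so shard
    # sizes differ by at most one and nothing is dropped.
    return [base + (1 if rank < remainder else 0) for rank in range(world_size)]


sizes = shard_sizes(10, 4)
print(sizes)  # [3, 3, 2, 2]
assert sum(sizes) == 10
```

Because every rank derives its size from the same two integers, each rank can compute its own shard boundaries without communicating with the others.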