Mayank Mishra

Results: 18 issues by Mayank Mishra

This check is already performed by the condition above. Can we drop it, @deepakn94?

### Housekeeping

- [X] I'm sure this issue is _not_ a duplicate

### Icon Type

- [X] File Icon
- [ ] Folder Icon

### Icon Type

- [X] Extension...

The base Megatron-LM repo provides unsharding scripts that can be run on models after training. I couldn't find any such scripts in this repo. Would it be possible to...
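For context, "unsharding" a tensor-parallel checkpoint mostly means concatenating each rank's slice of a weight back into one tensor. The sketch below is a minimal illustration, not the Megatron-LM script. It assumes the Megatron convention that column-parallel layers split the output dimension and row-parallel layers split the input dimension, with weights stored as `[out_features][in_features]` like `torch.nn.Linear.weight`. The helper names `merge_column_parallel` and `merge_row_parallel` are hypothetical. A real script also has to handle details this sketch ignores, such as fused QKV layouts, padded vocabularies, biases, and optimizer state.

```python
from typing import List

# Row-major weight matrix: [out_features][in_features], as in torch.nn.Linear.weight
Matrix = List[List[float]]


def merge_column_parallel(shards: List[Matrix]) -> Matrix:
    """Column-parallel shards each hold a slice of the output rows,
    so merging stacks them along dim 0 (rows), in rank order."""
    merged: Matrix = []
    for shard in shards:
        merged.extend(list(row) for row in shard)
    return merged


def merge_row_parallel(shards: List[Matrix]) -> Matrix:
    """Row-parallel shards each hold a slice of the input columns,
    so merging joins them along dim 1 (columns), in rank order."""
    n_rows = len(shards[0])
    if any(len(s) != n_rows for s in shards):
        raise ValueError("row-parallel shards must have the same number of rows")
    return [[v for shard in shards for v in shard[r]] for r in range(n_rows)]


# Two hypothetical tensor-parallel ranks, each holding a 2x2 slice
rank0 = [[1.0, 2.0], [3.0, 4.0]]
rank1 = [[5.0, 6.0], [7.0, 8.0]]
print(merge_column_parallel([rank0, rank1]))
print(merge_row_parallel([rank0, rank1]))
```

Shard order matters: the merge has to follow tensor-parallel rank order, since each rank owns a contiguous slice of the split dimension.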

# What does this PR do?

This PR adds support for IBM's upcoming 3B and 8B LLMs.

- text models: @ArthurZucker and @younesbelkada

This PR replaces https://github.com/pytorch/pytorch/pull/133085 to push a quick fix for RMSNorm. The original author is @kkontny. Previous PR summary: Since FP16 has quite a small dynamic range it...

oncall: distributed
module: cpu
triaged
module: mkldnn
open source
ciflow/trunk
release notes: quantization
topic: not user facing
module: inductor
module: dynamo
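The RMSNorm summary above is truncated, so the PR's actual fix is not shown here. As a hedged illustration of the general FP16 dynamic-range problem it mentions, the sketch below simulates half precision with the `'e'` format of Python's `struct` module. Squaring an activation of 300 already exceeds FP16's largest finite value of 65504, so a fully FP16 RMSNorm returns zeros. Accumulating the mean of squares in higher precision avoids this. Upcasting the reduction is a common mitigation, but it is an assumption that this is what the PR does. The helpers `to_fp16`, `rms_norm_fp16_naive`, and `rms_norm_upcast` are hypothetical. Python floats (float64) stand in for an FP32 upcast, and RMSNorm's learnable scale is omitted.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision value; values too large
    for FP16 become +/-inf, as an overflowing FP16 op would produce."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def rms_norm_fp16_naive(xs, eps=1e-6):
    """RMSNorm (no learnable scale) with every intermediate rounded to FP16."""
    sq_sum = 0.0
    for x in xs:
        sq_sum = to_fp16(sq_sum + to_fp16(x * x))  # x * x can overflow here
    mean_sq = to_fp16(sq_sum / len(xs))
    rms = to_fp16(math.sqrt(mean_sq + eps))
    return [to_fp16(x / rms) for x in xs]


def rms_norm_upcast(xs, eps=1e-6):
    """Same computation, but the reduction runs in higher precision;
    only the final outputs are rounded back to FP16."""
    mean_sq = sum(x * x for x in xs) / len(xs)
    rms = math.sqrt(mean_sq + eps)
    return [to_fp16(x / rms) for x in xs]


acts = [300.0, 300.0]  # exactly representable in FP16, but 300**2 = 90000 > FP16_MAX
print(rms_norm_fp16_naive(acts))
print(rms_norm_upcast(acts))
```

The naive version divides by an infinite RMS and returns `[0.0, 0.0]`. The upcast version returns `[1.0, 1.0]`, which is the correct normalization of two equal activations.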

```json
{
  "pipe-parallel-size": 1,
  "model-parallel-size": 1,
  "num-layers": 16,
  "hidden-size": 2048,
  "num-attention-heads": 8,
  "seq-length": 2048,
  "max-position-embeddings": 2048,
  "pos-emb": "rotary",
  "rotary-pct": 0.25,
  "no-weight-tying": true,
  "gpt-j-residual": true,
  "output-layer-parallelism": "column",
  "scaled-upper-triang-masked-softmax-fusion": false,
  "bias-gelu-fusion": false,
  ...
```