Mayank Mishra issues

Results 18 issues of Mayank Mishra

use new methods for communication

drop redundant check

This check is already done by the above condition. Can we drop this, @deepakn94?

[Icon Request]: Add icon for safetensors

### Housekeeping
- [X] I'm sure this issue is _not_ a duplicate
### Icon Type
- [X] File Icon
- [ ] Folder Icon
### Icon Type
- [X] Extension...

Unsharding scripts for megablocks models

The base Megatron-LM repo provides unsharding scripts for the models which can be used after training. I didn't find any such scripts in the repo. Would it be possible to...

Granite language models [WIP]

# What does this PR do?
This PR adds support for IBM's upcoming 3B and 8B LLMs.
- text models: @ArthurZucker and @younesbelkada

This PR is a replacement for https://github.com/pytorch/pytorch/pull/133085 for pushing a quick fix for RMSNorm. The original author is @kkontny. Previous PR summary: Since FP16 has quite a small dynamic range, it...

oncall: distributed

module: cpu

triaged

module: mkldnn

open source

ciflow/trunk

release notes: quantization

topic: not user facing

module: inductor

module: dynamo

inputs_ids cast to fp16 in deeperspeed bug

```json
{
  "pipe-parallel-size": 1,
  "model-parallel-size": 1,
  "num-layers": 16,
  "hidden-size": 2048,
  "num-attention-heads": 8,
  "seq-length": 2048,
  "max-position-embeddings": 2048,
  "pos-emb": "rotary",
  "rotary-pct": 0.25,
  "no-weight-tying": true,
  "gpt-j-residual": true,
  "output-layer-parallelism": "column",
  "scaled-upper-triang-masked-softmax-fusion": false,
  "bias-gelu-fusion": false,
  ...
```

cleanup code in tile iterator example

As the title suggests.