Jiahao Li

As @ramiro050 requested in https://github.com/llvm/torch-mlir/pull/1747, this PR moves the shape code for the stack op from torch-mlir to PyTorch upstream.
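
A minimal sketch of what such a shape function looks like (illustrative only, not the exact code moved upstream): every input to `stack` must share the same shape, and the result inserts a new dimension of size `len(tensors)` at `dim`.

```python
from typing import List

def stack_shape(tensors: List[List[int]], dim: int = 0) -> List[int]:
    # Sketch of a shape function for aten::stack: all input shapes must match.
    assert len(tensors) > 0, "stack expects a non-empty tensor list"
    first = tensors[0]
    for shape in tensors:
        assert shape == first, "all tensors must have the same shape"
    # Normalize a negative dim against the output rank (input rank + 1).
    out_rank = len(first) + 1
    if dim < 0:
        dim += out_rank
    assert 0 <= dim < out_rank
    return first[:dim] + [len(tensors)] + first[dim:]

# Stacking three (2, 4) shapes along dim=1 yields (2, 3, 4).
print(stack_shape([[2, 4], [2, 4], [2, 4]], dim=1))  # [2, 3, 4]
```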

This PR fixes:
1. the actor/critic mean loss calculation
2. the step-3 training script for the 1.3B model on a single GPU
3. some typos

This PR fixes two issues. 1. Ensure weights and biases passed into `LinearLayer` and `LinearAllreduce` are `torch.nn.Parameter`s. This avoids `state_dict()` being empty and `bias.requires_grad` being true after replacing linear...
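
A hedged sketch of the idea (class name and signature are illustrative, not DeepSpeed's actual code): register the injected weight and bias as `torch.nn.Parameter` so they show up in `state_dict()`, and keep the replaced bias from requiring gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearLayerSketch(nn.Module):
    """Illustrative stand-in for an injected linear layer."""

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor = None):
        super().__init__()
        # Plain tensor attributes are not registered by nn.Module, so
        # state_dict() would come back empty; nn.Parameter fixes that.
        self.weight = nn.Parameter(weight, requires_grad=False)
        self.bias = nn.Parameter(bias, requires_grad=False) if bias is not None else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight, self.bias)
```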

This PR adds an `n_ctx` argument to the `ggml_rope` function to support ChatGLM-6B, because the `position_ids` in ChatGLM-style RoPE depend on the context length. The C++ implementation of ChatGLM can...
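
A rough illustration (in Python, with illustrative offsets rather than the actual C code) of why the context length matters: GLM-style RoPE uses 2D positions, where the regular position is frozen once generation starts and a block position counts tokens generated after the prompt.

```python
def chatglm_2d_positions(p: int, n_ctx: int):
    """Return (position_id, block_position_id) for absolute token index p,
    given a prompt of length n_ctx. The exact offsets here are illustrative."""
    position_id = min(p, n_ctx - 1)              # frozen once generation starts
    block_position_id = max(p - (n_ctx - 1), 0)  # counts generated tokens
    return position_id, block_position_id

# With n_ctx = 4, tokens 0..6 get positions:
# [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print([chatglm_2d_positions(p, 4) for p in range(7)])
```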

Added the [chatglm.cpp](https://github.com/li-plus/chatglm.cpp) project to the `README`. It is a pure C++ implementation similar to llama.cpp that supports int4/int8 quantization and can run real-time inference on a MacBook CPU.

The `overlap_comm` and `contiguous_gradients` options have been ignored in ZeRO stage 1 since https://github.com/microsoft/DeepSpeed/pull/1246. Back then, ZeRO 1 and ZeRO 2 were implemented separately (see https://github.com/microsoft/DeepSpeed/tree/6ae756c03f12674f17aef90622e7664a8af9d2af/deepspeed/runtime/zero). ZeRO 1 does...
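
For reference, a minimal configuration sketch with the two flags under ZeRO stage 1 (the keys follow DeepSpeed's documented config schema; the surrounding values are just placeholders):

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,
        "overlap_comm": True,          # overlap gradient reduction with the backward pass
        "contiguous_gradients": True,  # reduce gradients into a contiguous buffer
    },
}
```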

Fix error messages that were missing the `f` prefix.
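
An illustration of the class of bug (the message text is made up): without the `f` prefix the placeholder is printed literally instead of being interpolated.

```python
name = "stage3"
print("missing config section: {name}")   # -> missing config section: {name}
print(f"missing config section: {name}")  # -> missing config section: stage3
```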

This PR fixes commands in the README and removes `alpaca_data.jsonl`, since it is generated from `alpaca_data.json`.

The current `logprobs_of_labels` computes logprobs using a `log_softmax` followed by a `gather`. When the input logits are not contiguous, `log_softmax` makes a copy of the logits, which is...
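
A hedged sketch of the issue and one way to avoid the extra copy (not necessarily the exact change made here): since `log_softmax(x)[y] = x[y] - logsumexp(x)`, the label log-probs can be computed without materializing the full log-softmax tensor.

```python
import torch
import torch.nn.functional as F

def logprobs_of_labels_copying(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # log_softmax materializes a full (batch, seq, vocab) tensor before the
    # gather discards most of it.
    logprobs = F.log_softmax(logits, dim=-1)
    return torch.gather(logprobs, dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)

def logprobs_of_labels_lean(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Same math, but only the gathered label logits and a per-token normalizer
    # are materialized.
    label_logits = torch.gather(logits, dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)
    return label_logits - torch.logsumexp(logits, dim=-1)
```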

### 🐛 Describe the bug

Would it be more reasonable to calculate the average loss over the batch dimension, instead of over all tokens? Now it seems that sequences of...
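
A small sketch of the two averaging schemes in question: a mean over all valid tokens weights long sequences more heavily, while averaging within each sequence first gives every sequence equal weight.

```python
import torch

losses = torch.tensor([[1.0, 1.0, 1.0, 1.0],   # long sequence (4 valid tokens)
                       [4.0, 0.0, 0.0, 0.0]])  # short sequence (1 valid token)
mask = torch.tensor([[1.0, 1.0, 1.0, 1.0],
                     [1.0, 0.0, 0.0, 0.0]])

# Mean over all tokens: (4 + 4) / 5 = 1.6 -- dominated by the long sequence.
loss_all_tokens = (losses * mask).sum() / mask.sum()

# Mean per sequence, then over the batch: (1 + 4) / 2 = 2.5 -- equal weight per sequence.
loss_per_sequence = ((losses * mask).sum(dim=1) / mask.sum(dim=1)).mean()

print(loss_all_tokens.item(), loss_per_sequence.item())
```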
