Jiahao Li

As @ramiro050 requested in https://github.com/llvm/torch-mlir/pull/1747, this PR moves the shape code for the stack op from torch-mlir to PyTorch upstream.
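
A minimal sketch of what such a shape function looks like (illustrative only, not the exact code moved upstream): every input to `stack` must share the same shape, and the result inserts a new dimension of size `len(tensors)` at `dim`.

```python
from typing import List

def stack_shape(tensors: List[List[int]], dim: int = 0) -> List[int]:
    # Sketch of a shape function for aten::stack: all input shapes must match.
    assert len(tensors) > 0, "stack expects a non-empty tensor list"
    first = tensors[0]
    for shape in tensors:
        assert shape == first, "all tensors must have the same shape"
    # Normalize a negative dim against the output rank (input rank + 1).
    out_rank = len(first) + 1
    if dim < 0:
        dim += out_rank
    assert 0 <= dim < out_rank
    return first[:dim] + [len(tensors)] + first[dim:]

# Stacking three (2, 4) shapes along dim=1 yields (2, 3, 4).
print(stack_shape([[2, 4], [2, 4], [2, 4]], dim=1))  # [2, 3, 4]
```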

This PR fixes:
1. the actor/critic mean loss calculation
2. the step-3 training script for the 1.3B model on a single GPU
3. some typos

This PR fixes two issues. 1. Ensure weights and biases passed into `LinearLayer` and `LinearAllreduce` are `torch.nn.Parameter`s. This avoids `state_dict()` being empty and `bias.requires_grad` being true after replacing linear...
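
A hedged sketch of the idea (class name and signature are illustrative, not DeepSpeed's actual code): register the injected weight and bias as `torch.nn.Parameter` so they show up in `state_dict()`, and keep the replaced bias from requiring gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearLayerSketch(nn.Module):
    """Illustrative stand-in for an injected linear layer."""

    def __init__(self, weight: torch.Tensor, bias: torch.Tensor = None):
        super().__init__()
        # Plain tensor attributes are not registered by nn.Module, so
        # state_dict() would come back empty; nn.Parameter fixes that.
        self.weight = nn.Parameter(weight, requires_grad=False)
        self.bias = nn.Parameter(bias, requires_grad=False) if bias is not None else None

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.weight, self.bias)
```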

This PR adds an `n_ctx` argument to the `ggml_rope` function to support ChatGLM-6B, because the `position_ids` in ChatGLM-style RoPE depend on the context length. The C++ implementation of ChatGLM can...
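
A rough illustration (in Python, with illustrative offsets rather than the actual C code) of why the context length matters: GLM-style RoPE uses 2D positions, where the regular position is frozen once generation starts and a block position counts tokens generated after the prompt.

```python
def chatglm_2d_positions(p: int, n_ctx: int):
    """Return (position_id, block_position_id) for absolute token index p,
    given a prompt of length n_ctx. The exact offsets here are illustrative."""
    position_id = min(p, n_ctx - 1)              # frozen once generation starts
    block_position_id = max(p - (n_ctx - 1), 0)  # counts generated tokens
    return position_id, block_position_id

# With n_ctx = 4, tokens 0..6 get positions:
# [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
print([chatglm_2d_positions(p, 4) for p in range(7)])
```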

Added the [chatglm.cpp](https://github.com/li-plus/chatglm.cpp) project to the `README`. It is a pure C++ implementation similar to llama.cpp that supports int4/int8 quantization and can run real-time inference on a MacBook CPU.

The `overlap_comm` and `contiguous_gradients` options have been ignored in ZeRO stage 1 since https://github.com/microsoft/DeepSpeed/pull/1246. Back then, ZeRO 1 and ZeRO 2 were implemented separately (see https://github.com/microsoft/DeepSpeed/tree/6ae756c03f12674f17aef90622e7664a8af9d2af/deepspeed/runtime/zero). ZeRO 1 does...
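
For reference, a minimal configuration sketch with the two flags under ZeRO stage 1 (the keys follow DeepSpeed's documented config schema; the surrounding values are just placeholders):

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 1,
        "overlap_comm": True,          # overlap gradient reduction with the backward pass
        "contiguous_gradients": True,  # reduce gradients into a contiguous buffer
    },
}
```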

Fix error messages that were missing the `f` prefix.
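
An illustration of the class of bug (the message text is made up): without the `f` prefix the placeholder is printed literally instead of being interpolated.

```python
name = "stage3"
print("missing config section: {name}")   # -> missing config section: {name}
print(f"missing config section: {name}")  # -> missing config section: stage3
```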

This PR fixes commands in the README and removes `alpaca_data.jsonl`, since it is generated from `alpaca_data.json`.

The current `logprobs_of_labels` computes logprobs using a `log_softmax` followed by a `gather`. When the input logits are not contiguous, `log_softmax` makes a copy of the logits, which is...
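
A hedged sketch of the issue and one way to avoid the extra copy (not necessarily the exact change made here): since `log_softmax(x)[y] = x[y] - logsumexp(x)`, the label log-probs can be computed without materializing the full log-softmax tensor.

```python
import torch
import torch.nn.functional as F

def logprobs_of_labels_copying(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # log_softmax materializes a full (batch, seq, vocab) tensor before the
    # gather discards most of it.
    logprobs = F.log_softmax(logits, dim=-1)
    return torch.gather(logprobs, dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)

def logprobs_of_labels_lean(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Same math, but only the gathered label logits and a per-token normalizer
    # are materialized.
    label_logits = torch.gather(logits, dim=-1, index=labels.unsqueeze(-1)).squeeze(-1)
    return label_logits - torch.logsumexp(logits, dim=-1)
```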

### 🐛 Describe the bug

Would it be more reasonable to calculate the average loss over the batch dimension, instead of over all tokens? Now it seems that sequences of...
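
A small sketch of the two averaging schemes in question: a mean over all valid tokens weights long sequences more heavily, while averaging within each sequence first gives every sequence equal weight.

```python
import torch

losses = torch.tensor([[1.0, 1.0, 1.0, 1.0],   # long sequence (4 valid tokens)
                       [4.0, 0.0, 0.0, 0.0]])  # short sequence (1 valid token)
mask = torch.tensor([[1.0, 1.0, 1.0, 1.0],
                     [1.0, 0.0, 0.0, 0.0]])

# Mean over all tokens: (4 + 4) / 5 = 1.6 -- dominated by the long sequence.
loss_all_tokens = (losses * mask).sum() / mask.sum()

# Mean per sequence, then over the batch: (1 + 4) / 2 = 2.5 -- equal weight per sequence.
loss_per_sequence = ((losses * mask).sum(dim=1) / mask.sum(dim=1)).mean()

print(loss_all_tokens.item(), loss_per_sequence.item())
```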
