Jiarui Fang（方佳瑞） issues

Results: 63 issues by Jiarui Fang（方佳瑞）

[PROPOSAL]: Gemini Decouples ChunkManager with the Model.

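### Proposal ## Motivation Currently the ChunkManager is attached to a single PyTorch model. This has two limitations. 1. It cannot handle training multiple models with Gemini, because the models' use of heterogeneous memory interferes with one another, so the information each model collects during warmup is no longer a useful guide. 2. More importantly, it diverges from how PyTorch is normally used. As shown below, the optim definition must be passed a model as an argument. ``` model = zero_model_wrapper(model, zero_stage, gemini_config) optimizer = zero_optim_wrapper(**model**, optimizer, optim_config=optim_config) ``` PyTorch Optimizer initialization, by contrast, has nothing to do with the model (https://pytorch.org/docs/stable/optim.html). Although in most use cases the optimizer is built with model.parameters(), Gemini currently cannot support, for example, the second form in the code below. ``` optimizer = optim.SGD(model.parameters(), lr=0.01,...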

enhancement
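The decoupling the proposal asks for can be illustrated with a toy sketch in plain Python. This is not ColossalAI or PyTorch code: `Param`, `ToySGD`, and `ToyModel` are hypothetical names invented here, and the free-standing-parameter case is only an assumption about the kind of usage the truncated snippet refers to. The point is that the optimizer receives an iterable of parameters and never needs a model object.

```python
class Param:
    """Hypothetical stand-in for a trainable tensor: a value and its gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class ToySGD:
    """Toy optimizer that only ever sees an iterable of parameters."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def step(self):
        # Plain gradient descent update on every parameter it was given.
        for p in self.params:
            p.value -= self.lr * p.grad


class ToyModel:
    """Hypothetical model-like container that owns two parameters."""

    def __init__(self):
        self.w = Param(1.0)
        self.b = Param(0.5)

    def parameters(self):
        return [self.w, self.b]


# Form 1: the parameters happen to come from a model-like object.
model = ToyModel()
opt_from_model = ToySGD(model.parameters(), lr=0.1)

# Form 2: free-standing parameters, with no model object anywhere.
var1, var2 = Param(2.0), Param(-1.0)
opt_from_vars = ToySGD([var1, var2], lr=0.1)

var1.grad, var2.grad = 1.0, -2.0
opt_from_vars.step()
```

In both forms the optimizer's constructor has the same signature, which is the property an optimizer wrapper loses when it requires the model itself as an argument.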

[example] add bert and albert

## 📌 Checklist before creating the PR - [ ] I have created an issue for this PR for traceability - [ ] The title follows the standard format: `[doc/gemini/tensor/...]:...

Details about backward hooks in stage3, why detach outputs?

Dear authors, thank you for the awesome work. I am trying to learn some implementation details and came across a small question. I am unsure about the meaning of the following two lines....

fixed some errors in Makefile for lm preparation

1. Install sentencepiece from the GitHub repo; I cannot run the .zip version on macOS. 2. Create some necessary directories during make. 3. Cache the wiki json.gz if has...

add missing files to run cc_net with a given config

In this PR, we can run `python -m cc_net --config config/test_segment.json` successfully in the following directory. data_prep/cc/cc_net/cc_net depends on #36

How to auto parsing?

I ran benchmark.py and got the following warning. `python benchmark.py --arch resnet18 --device cuda:0` Parsing Computation Graph with torch.jit failed, revert to manual parse_graph function

Prepared a demo dataset for GPT performance benchmarking

### 🐛 Describe the bug As a place to show the best practice for users, I believe it is necessary to help users to skip the annoying dataset preparation stage....

Align benchmark with the others

Hello, thanks for the wonderful project. Have you considered aligning the results with some commonly used benchmarks? https://github.com/mlcommons/training https://github.com/Oneflow-Inc/DLPerf

What is the plan to support beam search

I have noticed that LightLLM currently seems to only support decoding through **sampling**. Additional decoding methods such as **BeamSearch** and **GreedySearch** are not yet supported. I would like to know...
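For readers unfamiliar with the decoding methods named above, here is a minimal pure-Python sketch of greedy search and beam search. It is not LightLLM code: `toy_model`, `greedy_search`, and `beam_search` are hypothetical names, the "model" is a hand-written probability table, and end-of-sequence handling and length normalization are left out for brevity.

```python
import math


def toy_model(seq):
    """Hypothetical next-token distribution that depends only on the last token."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"x": 0.55, "y": 0.45},
        "b": {"x": 0.9, "y": 0.1},
    }
    return table[seq[-1]]


def greedy_search(step_fn, start, max_len):
    """Pick the single most likely next token at every step."""
    seq = [start]
    for _ in range(max_len):
        probs = step_fn(seq)
        seq.append(max(probs, key=probs.get))
    return seq


def beam_search(step_fn, start, max_len, beam_width):
    """Keep the beam_width highest-scoring partial sequences at every step."""
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, p in step_fn(seq).items():
                candidates.append((score + math.log(p), seq + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]


greedy_result = greedy_search(toy_model, "<s>", 2)
beam_result = beam_search(toy_model, "<s>", 2, beam_width=2)
```

On this table greedy search commits to `a` first and ends with probability 0.6 × 0.55 = 0.33, while a beam of width 2 keeps `b` alive and finds the sequence with probability 0.4 × 0.9 = 0.36. With `beam_width=1` the two functions return the same sequence.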

runtime error

I fixed the gym error. However, another error occurs. ```` [ERROR:640844 training:471 2022-10-12 11:16:25,954] Exception in worker process 0 Traceback (most recent call last): File "/home/lcfjr/codes/autoshard/autoshard/training.py", line 437, in act...