PENG Bo issues

Results: 30 issues of PENG Bo

[REQUEST] When training an FP16 model, the ability to set some of the layers to FP32

When training an FP16 model, I wonder if it's possible to set some of the layers to FP32. I can add .to(torch.float16) and .to(torch.float32) to do the conversion between layers....

enhancement

Bad performance when there are lots of optim_groups (for example, using layer-wise learning rate)

DeepSpeed is much slower when there are lots of optim_groups in FusedAdam, for example, when you are using layer-wise learning rates for a model with 100+ layers. In that case,...

bug
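To illustrate how layer-wise learning rates lead to many optim_groups, here is a minimal sketch in plain Python. It builds one group per layer in the list-of-dicts shape (`params`, `lr`) that PyTorch-style optimizers accept. The helper name `layerwise_lr_groups`, the decay rule, the parameter names, and the 120-layer model are all assumptions for illustration; they are not taken from the issue.

```python
def layerwise_lr_groups(params_by_layer, base_lr=1e-3, decay=0.95):
    """Return one optimizer group per layer, each with its own learning rate.

    Hypothetical helper: the last layer gets base_lr, and each earlier
    layer is scaled down by a further factor of `decay`.
    """
    num_layers = len(params_by_layer)
    groups = []
    for depth, params in enumerate(params_by_layer):
        scale = decay ** (num_layers - 1 - depth)
        groups.append({"params": list(params), "lr": base_lr * scale})
    return groups


# Assumed 120-layer model; strings stand in for real parameter tensors.
params_by_layer = [
    [f"blocks.{i}.weight", f"blocks.{i}.bias"] for i in range(120)
]
optim_groups = layerwise_lr_groups(params_by_layer)

# One group per layer, so the optimizer sees 120 separate groups.
print(len(optim_groups))
```

With real tensors in place of the strings, a list like this would be passed as the first argument to the optimizer, which is the situation the issue describes.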

asm.js and weblas performance for example_policy_network

On MacBook Pro 15-inch Late 2013 (i7 2.3G, GT 750M), Firefox 58: Let me know your results :-)

minGPT-tuned : some tricks and tweaks to improve model performance

Thanks for the great work. I played around with minGPT and found some tricks that might increase model performance, while keeping the number of operations and parameters mostly unchanged. [https://github.com/BlinkDL/minGPT-tuned](https://github.com/BlinkDL/minGPT-tuned) Let...

NaN with mock data

Hi lucidrains, Try this and it will NaN within 100 steps (latest GitHub code). The loss looks fine before NaN.

```
import torch
torch.backends.cudnn.allow_tf32 = True
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.benchmark...
```

Sharing the 1.3B-Pile@300B model

The 1.3B-Pile@300B model is quite strong: https://docs.google.com/spreadsheets/d/1CI8Q9RCblLRzUOPJ6ViqBmo284-8ojluQ-CmaEuhuv0/edit#gid=1295801165 lambada 0.6088 piqa 0.7160 hellaswag 0.5209 --> these are all better than gpt-neo 1.3B. Could you share the model? Thank you.

Supporting RWKV (an RNN that can match transformer LM & zero-shot performance at 1B+ params)

Hi guys. I am working on RWKV, which might be the only RNN (no attention!) that can match transformer LM & zero-shot performance at 1B+ params: https://www.reddit.com/r/MachineLearning/comments/vzr6ie/r_rwkv3_scaling_rnn_to_15b_and_reach_transformer/ I am using...

[REQUEST] An option to only save the model state_dict when save_checkpoint(), and how to manually save & load the model state_dict when using ZERO3

In my training code, I only save & load the model state_dict (no optimizer states). I find this is good enough with a few steps of warmup, and saves lots...

enhancement

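[Suggestion] A restrict mode for FP16/BF16: run only matmul in FP16/BF16, and keep all other operators in FP32

Currently the main purpose of FP16/BF16 is speed, and matmul is usually the only operator that speeds up noticeably. If too many operators run in FP16/BF16, the computation graph tends to contain a large number of f2h and h2f conversions and ends up slower instead. I therefore suggest adding a restrict mode that runs only matmul in FP16/BF16 and keeps all other operators in FP32. That way, f2h and h2f are needed only at the start and end of each matmul, and nowhere else.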
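The data flow of such a restrict mode can be sketched in pure Python. This is only a simulation under stated assumptions: `struct`'s `"e"` format is used to round values to FP16, Python floats stand in for FP32, and the names `f2h`, `matmul_restrict`, and `add_bias` are hypothetical helpers, not the API of any framework.

```python
import struct


def f2h(x):
    """Round a Python float to the nearest FP16 value (simulated f2h)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def matmul_restrict(a, b):
    """Matrix product with FP16 rounding only at the matmul boundary."""
    a16 = [[f2h(v) for v in row] for row in a]  # f2h on entry
    b16 = [[f2h(v) for v in row] for row in b]  # f2h on entry
    cols = list(zip(*b16))
    out = [
        [f2h(sum(x * y for x, y in zip(row, col))) for col in cols]
        for row in a16
    ]
    # h2f on exit: every FP16 value is exactly representable in a wider
    # float, so the simulated conversion back is the identity.
    return out


def add_bias(m, bias):
    """A non-matmul operator, kept in full precision (no f2h/h2f)."""
    return [[v + bias for v in row] for row in m]


a = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
y = add_bias(matmul_restrict(a, identity), 1e-4)
print(y[0][0])
```

In this sketch the conversions appear only inside `matmul_restrict`; `add_bias` never touches FP16, which mirrors the placement of f2h and h2f that the suggestion asks for.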

Python 3.8 compatibility for api/models.py

### Is there an existing issue for this? - [X] I have searched the existing issues and checked the recent builds/commits ### What happened? api/models.py will produce error under python...

bug-report