LSC527

Results: 10 issues of LSC527

Code from https://github.com/NVIDIA/DeepLearningExamples/tree/master/MxNet/Classification/RN50v1.5, running with the env var MXNET_ENABLE_CUDA_GRAPHS=1:

```
[1,5]: _DaliBaseIterator.__init__(self,
[1,5]:2022-02-24 04:23:12,251:WARNING: DALI iterator does not support resetting while epoch is not finished. Ignoring...
[1,5]:2022-02-24 04:23:12,251:INFO: Starting epoch 0...
```
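
For reference, a minimal sketch of setting the same flag from Python before MXNet is imported (equivalent to exporting it in the shell before launching the training script); this is an illustration, not part of the report:

```python
import os

# Set the flag before MXNet is imported so the backend sees it when it
# initializes (same MXNET_ENABLE_CUDA_GRAPHS=1 flag as in the report above).
os.environ["MXNET_ENABLE_CUDA_GRAPHS"] = "1"

import mxnet as mx  # imported after setting the flag on purpose

print(mx.__version__)
```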

help wanted

It seems that MXNet now supports CUDA graphs.

enhancement

I implemented this paper with [torch.autograd.forward_ad](https://pytorch.org/tutorials/intermediate/forward_ad_usage.html). However, the forward gradient showed no speed-up compared to forward+backward.

![Dingtalk_20210830173306](https://user-images.githubusercontent.com/34333110/131319083-f28366bf-c3cb-4c4c-a334-9282d146c948.jpg)
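
For context, a minimal sketch of the forward-mode AD pattern from the linked tutorial; the toy function, shapes, and tangent below are illustrative, not the paper's model:

```python
import torch
import torch.autograd.forward_ad as fwAD

# Toy stand-in for the model's forward pass.
def f(w, x):
    return torch.tanh(x @ w)

x = torch.randn(32, 128)
w = torch.randn(128, 64)
v = torch.randn_like(w)  # tangent: direction of the perturbation

with fwAD.dual_level():
    dual_w = fwAD.make_dual(w, v)   # bundle primal value and tangent
    out = f(dual_w, x)              # one forward pass computes value and JVP together
    y, jvp = fwAD.unpack_dual(out)  # jvp = J_f(w) @ v, with no backward pass
```

One possible reason for the timings above: the dual arithmetic roughly doubles the cost of each forward pass, so a single forward-gradient evaluation is not necessarily cheaper than one forward+backward pass.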

### 🐛 Describe the bug

```
File "/workspace/ColossalAI-Examples/image/detr/models/transformer.py", line 10, in <module>
    from titans.layer.attention import DeTrAttention
ImportError: cannot import name 'DeTrAttention' from 'titans.layer.attention' (/opt/conda/lib/python3.8/site-packages/titans/layer/attention/__init__.py)
```

### Environment

_No response_
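
A quick way to check what the installed titans build actually exports (assuming the package itself imports, which the traceback suggests it does); this is a diagnostic sketch, not from the issue:

```python
# List the public names exposed by the installed titans attention module to
# see whether DeTrAttention is missing or just named differently.
import titans.layer.attention as attn

print([name for name in dir(attn) if not name.startswith("_")])
```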

The training configuration is as follows:

```
--ref_num_nodes 1 --ref_num_gpus_per_node 2
--reward_num_nodes 1 --reward_num_gpus_per_node 2
--critic_num_nodes 1 --critic_num_gpus_per_node 4
--actor_num_nodes 2 --actor_num_gpus_per_node 8
--vllm_num_engines 2 --vllm_tensor_parallel_size 4
--micro_train_batch_size 4 --train_batch_size 64
--micro_rollout_batch_size 4 --rollout_batch_size 64
...
```
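
A back-of-the-envelope tally of the GPUs this configuration asks for, assuming the vLLM engines run on their own GPUs rather than colocated with the actors (my own arithmetic from the flags above, not output of the script):

```python
# Each product is nodes-per-role * GPUs-per-node, plus the vLLM engines.
ref    = 1 * 2   # --ref_num_nodes * --ref_num_gpus_per_node
reward = 1 * 2   # --reward_num_nodes * --reward_num_gpus_per_node
critic = 1 * 4   # --critic_num_nodes * --critic_num_gpus_per_node
actor  = 2 * 8   # --actor_num_nodes * --actor_num_gpus_per_node
vllm   = 2 * 4   # --vllm_num_engines * --vllm_tensor_parallel_size
print(ref + reward + critic + actor + vllm)  # 32 GPUs in total
```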

enhancement
help wanted

I have some thoughts about using vLLM for generation. Feel free to correct me if I'm wrong.

1. Batching: it seems that prompts are still passed to the vLLM engines...
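
On point 1, the pattern in question is roughly the following (a minimal sketch; the model name and sampling settings are illustrative, not from the issue). Passing the whole prompt list to `LLM.generate` lets vLLM's continuous batching schedule the requests together:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                        # illustrative model
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

prompts = ["prompt 0 ...", "prompt 1 ...", "prompt 2 ..."]  # a rollout batch

# Handing the whole list to generate() lets the engine batch the requests,
# instead of calling it once per prompt.
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```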

help wanted

### Your current environment

```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC...
```

bug

**What's the issue, what's expected?**

Error when using MS-AMP to do LLM SFT. MS-AMP DeepSpeed config:

```
"msamp": {
    "enabled": true,
    "opt_level": "O1|O2|O3",  # all tried
    "use_te": false
}
```

**How to...
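
For concreteness, a minimal sketch of a DeepSpeed config dict carrying the msamp block from the report; the batch-size key is a placeholder, and only one opt_level is set at a time:

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder, not from the report
    "msamp": {
        "enabled": True,
        "opt_level": "O2",   # the report tried O1, O2 and O3 in turn
        "use_te": False,
    },
}
# This dict would then be handed to deepspeed.initialize(model=..., config=ds_config).
```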

Which one should we use for generation?