LSC527

Results: 10 issues of LSC527

Code from https://github.com/NVIDIA/DeepLearningExamples/tree/master/MxNet/Classification/RN50v1.5, running with the env var MXNET_ENABLE_CUDA_GRAPHS=1:

```
[1,5]: _DaliBaseIterator.__init__(self,
[1,5]:2022-02-24 04:23:12,251:WARNING: DALI iterator does not support resetting while epoch is not finished. Ignoring...
[1,5]:2022-02-24 04:23:12,251:INFO: Starting epoch 0...
```
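
For reference, a minimal sketch of setting the same flag from Python before MXNet is imported (equivalent to exporting it in the shell before launching the training script); this is an illustration, not part of the report:

```python
import os

# Set the flag before MXNet is imported so the backend sees it when it
# initializes (same MXNET_ENABLE_CUDA_GRAPHS=1 flag as in the report above).
os.environ["MXNET_ENABLE_CUDA_GRAPHS"] = "1"

import mxnet as mx  # imported after setting the flag on purpose

print(mx.__version__)
```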

help wanted

It seems that MXNet now supports CUDA graphs.

enhancement

I implemented this paper with [torch.autograd.forward_ad](https://pytorch.org/tutorials/intermediate/forward_ad_usage.html). However, the forward gradient showed no speed-up compared to forward+backward.

![Dingtalk_20210830173306](https://user-images.githubusercontent.com/34333110/131319083-f28366bf-c3cb-4c4c-a334-9282d146c948.jpg)
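
For context, a minimal sketch of the forward-mode AD pattern from the linked tutorial; the toy function, shapes, and tangent below are illustrative, not the paper's model:

```python
import torch
import torch.autograd.forward_ad as fwAD

# Toy stand-in for the model's forward pass.
def f(w, x):
    return torch.tanh(x @ w)

x = torch.randn(32, 128)
w = torch.randn(128, 64)
v = torch.randn_like(w)  # tangent: direction of the perturbation

with fwAD.dual_level():
    dual_w = fwAD.make_dual(w, v)   # bundle primal value and tangent
    out = f(dual_w, x)              # one forward pass computes value and JVP together
    y, jvp = fwAD.unpack_dual(out)  # jvp = J_f(w) @ v, with no backward pass
```

One possible reason for the timings above: the dual arithmetic roughly doubles the cost of each forward pass, so a single forward-gradient evaluation is not necessarily cheaper than one forward+backward pass.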

### 🐛 Describe the bug

```
File "/workspace/ColossalAI-Examples/image/detr/models/transformer.py", line 10, in <module>
    from titans.layer.attention import DeTrAttention
ImportError: cannot import name 'DeTrAttention' from 'titans.layer.attention' (/opt/conda/lib/python3.8/site-packages/titans/layer/attention/__init__.py)
```

### Environment

_No response_
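
A quick way to check what the installed titans build actually exports (assuming the package itself imports, which the traceback suggests it does); this is a diagnostic sketch, not from the issue:

```python
# List the public names exposed by the installed titans attention module to
# see whether DeTrAttention is missing or just named differently.
import titans.layer.attention as attn

print([name for name in dir(attn) if not name.startswith("_")])
```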

The training configuration is as follows:

```
--ref_num_nodes 1 --ref_num_gpus_per_node 2
--reward_num_nodes 1 --reward_num_gpus_per_node 2
--critic_num_nodes 1 --critic_num_gpus_per_node 4
--actor_num_nodes 2 --actor_num_gpus_per_node 8
--vllm_num_engines 2 --vllm_tensor_parallel_size 4
--micro_train_batch_size 4 --train_batch_size 64
--micro_rollout_batch_size 4 --rollout_batch_size 64
...
```
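
A back-of-the-envelope tally of the GPUs this configuration asks for, assuming the vLLM engines run on their own GPUs rather than colocated with the actors (my own arithmetic from the flags above, not output of the script):

```python
# Each product is nodes-per-role * GPUs-per-node, plus the vLLM engines.
ref    = 1 * 2   # --ref_num_nodes * --ref_num_gpus_per_node
reward = 1 * 2   # --reward_num_nodes * --reward_num_gpus_per_node
critic = 1 * 4   # --critic_num_nodes * --critic_num_gpus_per_node
actor  = 2 * 8   # --actor_num_nodes * --actor_num_gpus_per_node
vllm   = 2 * 4   # --vllm_num_engines * --vllm_tensor_parallel_size
print(ref + reward + critic + actor + vllm)  # 32 GPUs in total
```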

enhancement
help wanted

I have some thoughts about using vLLM for generation. Feel free to correct me if I'm wrong.

1. Batching: it seems that prompts are still passed to the vLLM engines...
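
On point 1, the pattern in question is roughly the following (a minimal sketch; the model name and sampling settings are illustrative, not from the issue). Passing the whole prompt list to `LLM.generate` lets vLLM's continuous batching schedule the requests together:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                        # illustrative model
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

prompts = ["prompt 0 ...", "prompt 1 ...", "prompt 2 ..."]  # a rollout batch

# Handing the whole list to generate() lets the engine batch the requests,
# instead of calling it once per prompt.
outputs = llm.generate(prompts, sampling_params)
for out in outputs:
    print(out.outputs[0].text)
```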

help wanted

### Your current environment

```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC...
```

bug

**What's the issue, what's expected?**

Error when using MS-AMP to do LLM SFT. MS-AMP DeepSpeed config:

```
"msamp": {
    "enabled": true,
    "opt_level": "O1|O2|O3",  # all tried
    "use_te": false
}
```

**How to...
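
For concreteness, a minimal sketch of a DeepSpeed config dict carrying the msamp block from the report; the batch-size key is a placeholder, and only one opt_level is set at a time:

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder, not from the report
    "msamp": {
        "enabled": True,
        "opt_level": "O2",   # the report tried O1, O2 and O3 in turn
        "use_te": False,
    },
}
# This dict would then be handed to deepspeed.initialize(model=..., config=ds_config).
```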

Which one should we use for generation?