Jack Chen
Maybe it's not caused by using multi-GPU. You can use the cuda-memcheck tool to find out more details about this error: run `cuda-memcheck python your-program.py` and it will log more...
Any updates on this? Missing your ViT example~ @Taka152 @godweiyang
I use gcc 7.2, torch 1.8, CUDA 11.2. Hope it helps.
Max sequence length: 8836. I patched these lines of code and it seems to work, in `void launch_attn_softmax_bw`:

```cuda
} else if (to_len
```
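For readers following along, here is a minimal, self-contained sketch (not the actual lightseq source; the kernel/launcher names, branch boundaries, and `ITERATIONS` values below are assumptions) of the dispatch pattern such a patch extends: the launcher branches on `to_len` and picks a kernel instantiation whose compile-time per-thread iteration count covers that length, so supporting 8836 tokens means adding a branch whose `ITERATIONS * WARP_SIZE` is at least 8836.

```cuda
// Illustrative sketch only, not lightseq's real kernels. It shows the
// "branch on to_len, pick a templated instantiation" pattern that the
// patch above extends with one more branch for longer sequences.
#include <cuda_runtime.h>
#include <stdexcept>
#include <string>

constexpr int WARP_SIZE = 32;

// Dummy kernel: ITERATIONS is the number of elements each thread of a warp
// covers along the to_len dimension (a real kernel would do the warp-level
// softmax backward here).
template <typename T, int ITERATIONS>
__global__ void ker_attn_softmax_bw_sketch(T *grad, const T *inp, int to_len) {}

template <typename T>
void launch_attn_softmax_bw_sketch(T *grad, const T *inp, int rows, int to_len,
                                   cudaStream_t stream) {
  const int warps_per_block = 4;
  dim3 grid_dim((rows + warps_per_block - 1) / warps_per_block);
  // 32 * 4 = 128 threads per block, well under the hardware limit of 1024
  dim3 block_dim(WARP_SIZE, warps_per_block);

  if (to_len <= 1024) {
    ker_attn_softmax_bw_sketch<T, 32>
        <<<grid_dim, block_dim, 0, stream>>>(grad, inp, to_len);
  } else if (to_len <= 2048) {
    ker_attn_softmax_bw_sketch<T, 64>
        <<<grid_dim, block_dim, 0, stream>>>(grad, inp, to_len);
  } else if (to_len <= 9216) {
    // The kind of extra branch a patch like the one above adds:
    // 288 * 32 = 9216 >= 8836, so a sequence of 8836 fits here.
    ker_attn_softmax_bw_sketch<T, 288>
        <<<grid_dim, block_dim, 0, stream>>>(grad, inp, to_len);
  } else {
    throw std::runtime_error("to_len " + std::to_string(to_len) +
                             " exceeds the supported maximum");
  }
}
```

In this pattern the block shape stays fixed and only the compile-time iteration count grows, so the larger instantiations mainly cost extra registers per thread rather than a bigger launch configuration.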
> Yes, that's the place to modify the length limit, and it can be tested [here](https://github.com/bytedance/lightseq/blob/aabce486f34bec28bfe0efbbda1a183d5a6a37ba/tests/test_ls_kernels.py#L729-L730).

Thanks for pointing that out. I will file a pull request to support longer sequences.
> @Jack47 It's great. Did you test that it works well?

The code has some bugs around block_dims; test_ls_op.py should be used to validate it before use.
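One generic way such block_dims bugs show up (this is a general CUDA sanity check, not lightseq code): if a patched branch scales the block dimensions with `to_len` instead of the per-thread iteration count, the launch configuration can exceed the device limit of 1024 threads per block and the kernel silently fails to run. A small hypothetical helper like the one below catches that before the numerical tests do:

```cuda
// Hypothetical sanity check for a launch configuration; not part of lightseq.
#include <cuda_runtime.h>
#include <cstdio>

bool block_dim_is_valid(dim3 block_dim, int device = 0) {
  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, device);
  const int threads = block_dim.x * block_dim.y * block_dim.z;
  const bool ok = threads <= prop.maxThreadsPerBlock &&
                  (int)block_dim.x <= prop.maxThreadsDim[0] &&
                  (int)block_dim.y <= prop.maxThreadsDim[1] &&
                  (int)block_dim.z <= prop.maxThreadsDim[2];
  if (!ok) {
    std::printf("bad block_dim (%u, %u, %u): %d threads, device max %d\n",
                block_dim.x, block_dim.y, block_dim.z, threads,
                prop.maxThreadsPerBlock);
  }
  return ok;
}
```

Pairing a check like this with `cudaGetLastError()` right after the launch, plus the test_ls_op.py comparison, covers both the launch configuration and the numerics.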
> You can wrap part of the model with the `torch_scope` interface; the part inside the torch scope will be trained in fp32, for example in the MoE example:
>
> https://github.com/Tencent/PatrickStar/blob/0731c6ed2065e62d0cd489813b4e162880a5ab51/examples/moe/moe_bert.py#L53-L64
>
> Note, however, that if you only want to set a single layer to fp32, `do_allreduce` here should be set to `True`.

Nice! Does that mean this part is managed by torch itself and PatrickStar doesn't need to be involved?
> @Jack47 @liaojianjin We are currently doing a full refactoring of PatrickStar... so these features may change later. For example, we may end up reusing PyTorch autocast directly instead of implementing our own version of mixed-precision training, in which case the problem raised in this issue of setting layernorm to fp32 would be solved naturally, and there would be no need to re-align precision after migration. So the interface exposed right now may be rather rough, sorry about that...

Got it, 👍
> 1. Can you post the detailed configurations, including PyTorch, CUDA, g++, etc.?

Please see https://github.com/utsaslab/MONeT/blob/master/install.sh#L11:

> conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=10.1 -c pytorch -y
Seems it's already supported in this MR: https://github.com/bytedance/lightseq/pull/299/files