Qing Lan

Results: 80 comments by Qing Lan

This is also reproducible on the GPT-J-6B model if you simply switch it

@jeffra Here is the pip list:
```
root@b963730b133d:/# pip3 list
Package            Version
------------------ ------------
certifi            2022.5.18.1
charset-normalizer 2.0.12
deepspeed          0.6.5
filelock           3.7.1
hjson              3.0.2
huggingface-hub    0.7.0
idna               3.3
ninja              1.10.2.3...
```

@jeffra I just tested on an NVIDIA V100 series machine with 4 GPUs onboard, and the code works fine, which matches what you have tested. So I think we can just narrow down...

Just raised a separate issue here: https://github.com/microsoft/DeepSpeed/issues/2113. It is still reproducible on 0.6.7. @jayargo did you manage to fix it by any chance?

Currently we are already using the official torch 1.12.1 macOS arm64 wheel to build the JNI. So it has the capability to extend to GPU, but we didn't implement the MPS...

@jeffra Just tested again with a single GPU, and the error is gone. I also tested multi-GPU and am facing some NCCL issues. I think this might lead to the machine is not...

Since we are moving towards graduation, I would also like to check with everyone whether we would like to promote all of our committers to PMC as we graduate....

Yes, there should be a way to achieve that. Which DL framework are you looking for? Currently we support MXNet transfer learning and experimental PyTorch transfer learning. We have MaskRCNN...

So here are the steps. 1) Get the MaskRCNN pretrained model: http://docs.djl.ai/mxnet/mxnet-model-zoo/index.html, where you can choose one of the backbones. Try to get it running with our instance segmentation example http://docs.djl.ai/mxnet/mxnet-model-zoo/index.html....

Do you have any code to share to reproduce your issues?