caixiiaoyang

Results 18 comments of caixiiaoyang

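> """To build with debug information, add the flag --bazel_options='--copt=/Z7'.""" FROM https://jax.readthedocs.io/en/latest/developer.html#id1 However, that answer is based on the latest openxla, and I'm not sure whether it works with 0.3.22. btw, could we add each other on WeChat to talk it over? Sure, my WeChat ID is 13140163867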

> I also met this issue when trying to use `alpa.ShardParallel()` or `alpa.PipeshardParallel()` to auto parallelize my llama model. > > ![image](https://private-user-images.githubusercontent.com/8370601/286101881-14749d05-9b5c-4f52-ab3b-7566908f3b7a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTEiLCJleHAiOjE3MDE5MTQwNzUsIm5iZiI6MTcwMTkxMzc3NSwicGF0aCI6Ii84MzcwNjAxLzI4NjEwMTg4MS0xNDc0OWQwNS05YjVjLTRmNTItYWIzYi03NTY2OTA4ZjNiN2EucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQUlXTkpZQVg0Q1NWRUg1M0ElMkYyMDIzMTIwNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyMzEyMDdUMDE0OTM1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MTk1NDU1OTllMjAyYmI3N2RlZmMzYzNjMDM3MWUyZWJmNjFhZmJjZDZjZTc4NzgwOTQ5YjY1NTJmMDMyZWJjOCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.VucGPq6WLTOZO7S9U_N866Pfr4690sPXRNFd6IABzzQ) I also encountered this problem in the process...

I encountered the same problem. Has yours been solved?

Has your problem been solved? I ran into the same issue.

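How many GPUs do you have? num_stages defaults to 2, and this problem occurs when num_devices is not evenly divisible by num_stages. It is best to make sure num_devices is even.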
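
The divisibility condition can be checked before launching anything. This is only a minimal sketch in plain Python: `check_stage_split` is a hypothetical helper name, not part of alpa, and the default of 2 stages is taken from the comment above rather than verified against the library.

```python
def check_stage_split(num_devices: int, num_stages: int = 2) -> int:
    """Return the number of devices per stage, or raise if the split is uneven.

    Hypothetical helper for illustration; it does not call alpa.
    """
    if num_devices <= 0 or num_stages <= 0:
        raise ValueError("num_devices and num_stages must be positive")
    if num_devices % num_stages != 0:
        raise ValueError(
            f"num_devices={num_devices} is not divisible by "
            f"num_stages={num_stages}"
        )
    return num_devices // num_stages


if __name__ == "__main__":
    for n in (4, 5):
        try:
            print(n, "devices ->", check_stage_split(n), "per stage")
        except ValueError as err:
            print(n, "devices ->", err)
```

With 5 GPUs and 2 stages the check fails, which matches the situation described in this thread.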

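> > How many GPUs do you have? num_stages defaults to 2, and this problem occurs when num_devices is not evenly divisible by num_stages. It is best to make sure num_devices is even. > > My server has 5 GPUs, as shown below ![image](https://user-images.githubusercontent.com/48151874/276118962-d4777b10-e090-4c5a-961b-1f9db404343e.png) > > How should I change num_devices? When I try running CUDA_VISIBLE_DEVICES=0,1 python3 -m alpa.test_install, I still get the same problem. I don't know how to change the number of GPUs. You have five GPUs, and they are not all the same model. If you compiled on an A100, running on a 3090 may cause problems.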

Have you solved this? I'm getting the same error.

> I cannot train yolov5 with your code. It reports the error `ValueError: Target size (torch.Size([16, 1200, 1])) must be the same as input size (torch.Size([16, 17000, 1]))`. It seems that...