LarryZhangy
@jershi425, thanks! I tried it with generated data, and it works.
Hi @jershi425, I use nvcr.io/nvidia/merlin/merlin-training:22.05. My Docker version is 18.09, so I start the container with docker run --runtime nvidia. My host uses CUDA version 11.1, driver version...
@jershi425 Have you reproduced this error?
@jershi425, thanks! But line 39 is just the same as you describe. Has this bug been fixed in the new version?
@jershi425, when I try to generate the keyset for the first day of data of Criteo 1TB, I get an OOM error. [image] [image]
@jershi425, when will this bug be fixed?
Yes, I had the same problem.
> @LarryZhangy Have you resolved this problem?

Yes, use v0.2.8 and the example code from that version.
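@sharejing, I had the ColossalAI version wrong earlier. The container image with the official tag 0.2.7 actually ships ColossalAI 0.2.0, but the example code I was trying is from v0.2.7. Updating the package version inside the container fixes it, because in the latest code ColoInitContext is located here: from colossalai.utils.model.colo_init_context import ColoInitContext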
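For anyone hitting the same mismatch, a rough way to check it before running the examples is to compare the installed package version with the tag the example code comes from. This is only a sketch: it assumes the pip distribution name is `colossalai`, and the helper names (`parse_version`, `installed_version`, `matches_examples`) are made up for illustration.

```python
from importlib import metadata


def parse_version(text):
    """Turn '0.2.7' into (0, 2, 7); anything after the leading digits of a part is ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_examples(installed, expected):
    """True when the installed version is at least the version the examples target."""
    if installed is None:
        return False
    return parse_version(installed) >= parse_version(expected)


if __name__ == "__main__":
    # Assumption: the distribution is named "colossalai" and the examples are from tag 0.2.7.
    found = installed_version("colossalai")
    if matches_examples(found, "0.2.7"):
        print("installed version looks new enough:", found)
    else:
        print("version mismatch, found:", found)
```

If the check reports a mismatch inside the container, upgrading the package there (as described above) is the fix that worked for me.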
> Launch it directly with the deepspeed CLI

How does deepspeed know which nodes to use? Do I need to prepare an IP list file?