LarryZhangy

Results 10 comments of LarryZhangy

@jershi425, thanks! I tried it with generated data, and it works.

Hi @jershi425, I use nvcr.io/nvidia/merlin/merlin-training:22.05. My Docker version is 18.09, so I start the container with `docker run --runtime nvidia`. My host uses CUDA version 11.1, driver version...

@jershi425, have you reproduced this error?

@jershi425, thanks! But line 39 is exactly as you describe. Has this bug been fixed in the new version?

@jershi425, when I try to generate the keyset for the first day of data from Criteo 1TB, I get an OOM error. ![image](https://user-images.githubusercontent.com/1331144/182280567-0f12feea-b1ae-4276-b092-a43145411858.png) ![image](https://user-images.githubusercontent.com/1331144/182280858-2b9ac749-4050-4d8e-8616-7e73d5afd1a3.png)

@jershi425, when can this bug be fixed?

> @LarryZhangy Have you resolved this problem?

Yes, use v0.2.8 and the example code from that version.

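@sharejing, I had the colossalai version wrong earlier. The official container image tagged 0.2.7 actually ships colossalai 0.2.0, but the example code I was trying came from v0.2.7. Updating the software version inside the container is enough, because in the latest code ColoInitContext lives here: `from colossalai.utils.model.colo_init_context import ColoInitContext`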
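A quick way to catch this kind of mismatch between the image and the example code is to check the installed package version and whether the import path resolves. The following is only a minimal sketch: the helper names `installed_version` and `can_import` are hypothetical, and it assumes the package is registered under the distribution name `colossalai`.

```python
import importlib
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def can_import(module_path: str, attr: str) -> bool:
    """Return True if `attr` can be found in the module at `module_path`."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # Covers a missing package as well as a module path that has moved.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Assumed distribution name; adjust if the package is installed differently.
    print("colossalai version:", installed_version("colossalai"))
    print(
        "ColoInitContext importable:",
        can_import("colossalai.utils.model.colo_init_context", "ColoInitContext"),
    )
```

Running this inside the container before trying the examples shows whether the installed version matches the tag of the example code being used.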

> Launch it directly with the deepspeed CLI

How does deepspeed know which nodes to use? Do I need to prepare an IP list file?
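For context on this question: as far as I know, the deepspeed launcher reads a hostfile with one `hostname slots=N` entry per line, passed via `--hostfile`, where `slots` is the number of GPUs on that node. The sketch below only renders such a file from a host-to-GPU-count mapping; the helper name `render_hostfile` and the IP addresses are hypothetical and not taken from the thread.

```python
def render_hostfile(hosts: dict) -> str:
    """Render hostfile text: one 'hostname slots=N' line per node."""
    lines = []
    for host, slots in hosts.items():
        if slots < 1:
            raise ValueError(f"slots for {host} must be at least 1")
        lines.append(f"{host} slots={slots}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-node cluster with 8 GPUs each.
    text = render_hostfile({"10.0.0.1": 8, "10.0.0.2": 8})
    with open("hostfile", "w") as f:
        f.write(text)
    print(text, end="")
```

The resulting file would then be passed as `deepspeed --hostfile=hostfile ...`; the launcher typically also expects passwordless SSH between the listed nodes.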