Zhao Changmin
I see, I'm working on it.
> You may use the data: hdfs://172.16.0.105:8020/user/root/jwang/wnd_twitter_2

I will take a look. Since it's a private repo, would you mind giving me permission?
`tensorflow.python.framework.errors_impl.InvalidArgumentError: Received a label value of 1 which is outside the valid range of [0, 1). Label values: 1 1 0 1 1 1` Am I missing something to do with...
After changing the loss function to binary_crossentropy, I could not reproduce this issue. ` 9/13632 [..............................] - ETA: 5:47:25 - loss: 0.6880 - accuracy: 0.5741`
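For context on the error quoted above, here is a minimal sketch in plain Python (not TensorFlow's actual implementation) of why a label of 1 can be rejected with a valid range of `[0, 1)` while binary cross-entropy accepts it. It assumes the original model had a single output unit and used a sparse categorical loss, which is only inferred from the range in the message. The helper names and the sample probability `p` are hypothetical.

```python
import math

def sparse_categorical_crossentropy(probs, label):
    """Loss for one sample; probs holds one probability per class."""
    num_classes = len(probs)
    # Labels are class indices, so they must lie in [0, num_classes).
    if not 0 <= label < num_classes:
        raise ValueError(
            f"Received a label value of {label} which is outside "
            f"the valid range of [0, {num_classes})"
        )
    return -math.log(probs[label])

def binary_crossentropy(p, label):
    """Loss for one sample; p is the predicted probability of label 1."""
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

labels = [1, 1, 0, 1, 1, 1]  # label values from the error message
p = 0.6                       # hypothetical sigmoid output, for illustration

# One output unit means num_classes == 1, so a label of 1 is out of range.
try:
    sparse_categorical_crossentropy([p], labels[0])
    sparse_error = None
except ValueError as exc:
    sparse_error = str(exc)
print(sparse_error)

# Binary cross-entropy reads the same single output as P(label == 1).
bce_losses = [binary_crossentropy(p, y) for y in labels]
print([round(v, 4) for v in bce_losses])
```

If that assumption holds, the alternative to switching the loss would be to keep the sparse categorical loss and give the final layer two output units.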
```
(pid=157408, ip=172.16.0.146) Global rank: 6
(pid=157408, ip=172.16.0.146) Total workers: 8
(pid=157408, ip=172.16.0.146) Number of files for worker: 8
(pid=157408, ip=172.16.0.146) Data size for worker: 671325
(pid=157408, ip=172.16.0.146) Loading hdfs://172.16.0.105:8020/user/root/jwang/wnd_twitter_2/train_parquet/part-00042-cbd17f77-8da4-45c7-9031-919a6d619098-c000.snappy.parquet...
```
Sorry for the late reply. I will reproduce the installation process.
```
# Python 3.7 required for pyarrow
conda install -y cmake==3.16.0 -c conda-forge
conda install cxx-compiler==1.0 -c conda-forge
conda install openmpi
conda install tensorflow==2.3.0
# Set the build flags on the same command line so that pip actually sees them
HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITH_GLOO=1 pip install --no-cache-dir horovod
pip...
```
> > ```
> > # python 37 required for pyarrow
> >
> > conda install -y pytorch torchvision cpuonly -c pytorch
> > conda install -y cmake==3.16.0 -c...
> > ```
Hi @lin1061991611, maybe your extensions were compiled against a mismatched version of CUDA or cuDNN? I ran into this issue with a TensorFlow model once.