MAxx8371
What causes this error? Is it that the bin_size of a categorical feature is bigger than max_bin? Or is it because there is not enough memory? And the model...
Full-parameter fine-tuning with ZeRO3 and output_router_logits=True. Training suddenly hangs partway through, and GPU utilization jumps to 100%.
> Fixed on master I installed PyTorch by running "pip install torch", and you said "Fixed on master" on GitHub. Would you please explain how to update it...
> I have a similar problem. My cluster has a relatively slow shared storage system, so I want to copy the dataset to the compute node's temporary storage. However, I found...