Junlei Zhang
CUDA version: 10.1, PyTorch version: 1.4, torchvision version: 0.5, platform: Ubuntu. Traceback (most recent call last): File "train.py", line 6, in <module> from common.train import * File "/home/westlake/zhangjunlei/code/CSI-master/common/train.py", line 110, in...
Hi @peteriz, it seems there is an issue if the line global_rank = 0 is deleted. When different workers read different shards, the total number of iterations for each worker in...
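The snippet above is cut off before its conclusion. The following is a general illustration only and is not the issue's code or finding. It is a minimal pure-Python sketch that assumes each worker iterates over ceil(shard_size / batch_size) batches of its own shard. Under that assumption, per-worker iteration counts can diverge when shards differ in size. The helper names and shard sizes are hypothetical. One common mitigation is to cap every worker at the minimum count, which keeps collective operations such as all-reduce matched across workers.

```python
import math


def iterations_per_worker(shard_sizes, batch_size):
    """Hypothetical helper: batches each worker would run over its own shard,
    assuming the last partial batch is kept (ceil division)."""
    return [math.ceil(n / batch_size) for n in shard_sizes]


def synchronized_iterations(shard_sizes, batch_size):
    """Hypothetical mitigation: every worker stops after the smallest count,
    so no worker waits on a collective op the others never reach."""
    return min(iterations_per_worker(shard_sizes, batch_size))


if __name__ == "__main__":
    shards = [1000, 1000, 950, 1020]  # hypothetical sample counts per shard
    print(iterations_per_worker(shards, 32))
    print(synchronized_iterations(shards, 32))
```

Dropping the extra batches is one design choice. Padding shorter shards is another.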
Hello, I processed the Wikipedia and BookCorpus datasets using your scripts. The total size of the processed Wikipedia dataset is around 106 GB (~2650 HDF5 files). Could you please tell me whether...
Hello, thank you for your code. I tried to run it with the following command: aim=pretraining_experiment-bert-mlm--23000 deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 64000 run_pretraining.py \ --model_type bert-mlm --tokenizer_name bert-base-uncased \ --hidden_act gelu...
I think the resnet50 baseline from torchvision (23.85% top-1 error) is trained for 100 epochs instead of 90.
Hi, Table 1 of your paper includes a random search baseline. Were these architectures obtained by randomly sampling architectures? Could you please tell me how you...
Hello, thank you for your work. I am interested in your sparse ResNet, but I am not quite sure how you implement it. Do you just replace the "concat" operations...
Hello, could you please answer a question about your original paper? In your paper, you said "the pitfalls of dense feature aggregation in both densenet and resnet are caused...
Hello, thank you for your code. I installed the environment following your README. The transformers version is 4.2.1, but I got the above error. Could you please tell me the...