Todor Mihaylov

4 comments by Todor Mihaylov

> can you specify --max-tokens?

It does not help! Same error at inference:

```
nb_few_shot_samples=0
expected_max_tgt_len=285, max_positions=1024
Average number of train samples: 0.00
Predicting 56 samples with 168 prompts..
Before...
```
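The log above pairs `expected_max_tgt_len=285` with `max_positions=1024`. As a hedged sketch of the kind of length check an eval harness performs before generation (function and parameter names here are hypothetical, mirroring the log rather than any real API):

```python
# Hypothetical sketch of the length check suggested by the log line
# "expected_max_tgt_len=285, max_positions=1024". Not a real metaseq API.
def fits_in_window(prompt_len: int, expected_max_tgt_len: int, max_positions: int) -> bool:
    """True if the prompt plus the expected generated tokens fit the model window."""
    return prompt_len + expected_max_tgt_len <= max_positions

# With the values from the log, prompts longer than 1024 - 285 = 739 tokens
# cannot fit, regardless of --max-tokens.
print(fits_in_window(700, 285, 1024))  # True
print(fits_in_window(800, 285, 1024))  # False
```

If the check fails for some prompts, raising `--max-tokens` alone would not help, which is consistent with the comment above; the prompt itself would need truncation.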

Model config:

```
"1.3B_gptz_model_parallel": gptz_sharded_config(
    "/large_experiments/xlmg/models/1.3B_gptz_from_azure/1.3B/checkpoint_last.pt",
    model_parallel_size=8,
),
```

Alloc:

```
srun --gpus=8 --nodes 1 --ntasks-per-node 1 --cpus-per-task 10 --mem-per-gpu 58G \
    --constraint volta32gb --time 1440 --partition xlmg,devaccel,learnaccel --pty bash...
```

The model seems to pass when setting model_parallel to 2 with gpus=8.

Model setting (model_parallel is 2):

```
"1.3B_gptz_model_parallel": gptz_sharded_config(
    "/large_experiments/xlmg/models/1.3B_gptz_from_azure/1.3B/checkpoint_last.pt",
    model_parallel_size=2,
),
```

Allocation (8 GPUs): ...
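A plausible reason the run passes with model_parallel 2 but not 8 is the usual Megatron-style constraint that the GPU count must be divisible by the model-parallel size, with the quotient becoming the data-parallel size; this is an assumption about the failure, not something the thread confirms. A minimal sketch of that sanity check (the function name is hypothetical):

```python
# Hedged sketch of a Megatron-style model-parallel sanity check.
# The divisibility rule is a general convention, not confirmed by this thread.
def data_parallel_size(world_size: int, model_parallel_size: int) -> int:
    """Number of data-parallel replicas given total GPUs and MP degree."""
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"model_parallel_size={model_parallel_size}"
        )
    return world_size // model_parallel_size

# 8 GPUs with model_parallel_size=2 gives 4 data-parallel replicas.
print(data_parallel_size(8, 2))  # 4
```

Note that 8 GPUs with model_parallel_size=8 also divides cleanly, so if that setting fails the mismatch is more likely between the checkpoint's saved parallel size and the requested one.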

I am trying to adapt it. The old API is still active, but they changed the strategy for assigning cookies, and I am still figuring it out. @ArcturusB feel free...