Ganesh Krishnan
I ran this on a lower batch count and I can see the trainer never uses more than 1 GPU.
I used the example provided and also tried the accelerator, but both of them fail to use more than 1 GPU. Any suggestions?
This is my Python code: I experimented with the accelerator, then torch distributed, and also added `to(device)`. I will try with your method and see if it works out with 4...
The shell script worked and I got the checkpoint with multiple GPUs as well. The Python code didn't use multiple GPUs, though.
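For reference, the kind of launch that actually fans out across GPUs looks roughly like this; the script name `train_angle.py` and the GPU count are placeholders, not my exact command:

```bash
# Sketch only: script name and GPU count are placeholders.
torchrun --nproc_per_node=4 train_angle.py

# or, via HF Accelerate:
accelerate launch --multi_gpu --num_processes 4 train_angle.py
```

A plain `python train_angle.py` starts a single process, which would explain why only one GPU gets used unless the script itself spawns workers.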
Thanks for the tip about the w. I am using DataFormat C, e.g. `{"text": "Cool Spot 11x11 Pop-Up Instant Gazebo Tent with Mosquito Netting Outdoor Canopy Shelter with 121 Square...
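Concretely, the file is JSON Lines with one `{"text": ...}` record per line. A minimal sketch of reading it, assuming the HF `datasets` loader and a placeholder file name:

```python
from datasets import load_dataset

# One JSON record per line, each with only a "text" field (DataFormat C).
# "products.jsonl" is a placeholder file name.
ds = load_dataset("json", data_files="products.jsonl", split="train")
print(ds[0]["text"])
```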
Negatives are very hard to generate from unlabelled text for DataSet B. We have "product title" -> "search term" as a positive correlation, but there is no correct way to generate...
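The obvious fallback is to pair a title with a random other product's search term, roughly as in the sketch below, but nothing guarantees that term is actually irrelevant to the product, which is exactly the problem:

```python
import random

# Naive sketch: use a random other product's search term as the "negative".
# The pairs below are illustrative only.
pairs = [
    ("Cool Spot 11x11 Pop-Up Instant Gazebo Tent", "pop up gazebo"),
    ("Stainless Steel Insulated Water Bottle 32oz", "insulated water bottle"),
    ("Wireless Noise Cancelling Headphones", "bluetooth headphones"),
]

def random_negatives(pairs):
    negatives = []
    for i, (title, _) in enumerate(pairs):
        j = random.choice([k for k in range(len(pairs)) if k != i])
        # Nothing guarantees pairs[j][1] is truly irrelevant to this title,
        # so some of these "negatives" may actually be valid matches.
        negatives.append((title, pairs[j][1]))
    return negatives

print(random_negatives(pairs))
```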
I don't mind catastrophic forgetting. I could even train from scratch with the amount of data we have. The learning rate is currently set to 3e-6. It took 8 hours...
I will ask someone from our team to look into it. Right now it's easier for me to use this for generating vectors and training a different sentence transformers for...
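What I have in mind is roughly the following: wrap the trained checkpoint as a plain transformer plus a pooling layer so sentence-transformers can encode with it. The checkpoint path and the pooling choice here are placeholders/assumptions:

```python
from sentence_transformers import SentenceTransformer, models

# "my-angle-checkpoint" is a placeholder for the checkpoint produced by training.
word_model = models.Transformer("my-angle-checkpoint", max_seq_length=512)
# CLS pooling is an assumption; swap for mean pooling if that matches training.
pooling = models.Pooling(word_model.get_word_embedding_dimension(), pooling_mode="cls")
model = SentenceTransformer(modules=[word_model, pooling])

vecs = model.encode(["Cool Spot 11x11 Pop-Up Instant Gazebo Tent"])
print(vecs.shape)
```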
btw, can my team member reach out to you by email to get some help with adding angle_emb support to sentence-transformers?
any plans for this yet?