Zhang Junda
Thanks for your answer! I'll give it a try.
Hi, I tried what you suggested in this way:

Old code:
```python3
for b in range(bs):
    neg_idx = torch.multinomial(weights_t2i[b], 1).item()
    image_embeds_neg.append(image_embeds[neg_idx])
```

Current code: ```python3 for b in range(bs): nan_idx...
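For context, `torch.multinomial` raises an error when the probability tensor contains `nan`, `inf`, or negative entries, so one common guard is to clean the weights before sampling. The sketch below illustrates that idea in plain Python so it runs without PyTorch. It is not the code from the truncated snippet above. The helper names `sanitize_weights` and `sample_negative_index` are hypothetical, and the uniform fallback is an assumption rather than anything stated in this thread.

```python
import math
import random


def sanitize_weights(weights):
    """Replace NaN, infinite, and non-positive weights with 0.0 (hypothetical helper)."""
    return [w if math.isfinite(w) and w > 0.0 else 0.0 for w in weights]


def sample_negative_index(weights, rng=random):
    """Draw one index in proportion to the sanitized weights (hypothetical helper).

    Assumption: if no usable weight remains, fall back to a uniform draw
    instead of raising, so the training loop can continue.
    """
    clean = sanitize_weights(weights)
    if sum(clean) == 0.0:
        return rng.randrange(len(clean))
    return rng.choices(range(len(clean)), weights=clean, k=1)[0]
```

In the loop above, the same two steps (zero out unusable weights, then sample) would presumably be applied to `weights_t2i[b]` before the `torch.multinomial` call. Whether a uniform fallback is acceptable depends on how much the hard-negative selection matters for the loss.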
Sorry for the late reply. Our data is a little sensitive... I'll try to find time to run experiments on some public datasets and let you know...
A EuroSys '20 paper, "Balancing Efficiency and Fairness in Heterogeneous GPU Clusters for Deep Learning", says they implemented this using CRIU.
> Indeed, thanks for pointing out that paper.
>
> I just had a look and they write that they do not checkpoint the GPU part, only the CPU part....
> > > Indeed, thanks for pointing out that paper.
> > > I just had a look and they write that they do not checkpoint the GPU part only...