Not setting random state for torch in dataset _add_frame_to_target
Hi, in the _add_frame_to_target function in the dataset file (mot.py), the current code only restores the Python random state with random.setstate(random_state), but not the torch random state. This causes the sample images in the same batch to potentially be cropped from different areas of the original image, which I assume is not the desired behavior, since a drastic drift between training image pairs can contaminate the training data and teach the model the wrong thing.
By changing random_state = random.getstate() in __getitem__ to random_state = np.random.randint(some number) followed by random.seed(random_state) and torch.manual_seed(random_state), and changing random.setstate(random_state) to random.seed(random_state) and torch.manual_seed(random_state), the drift between image pairs disappears, and I was able to improve the MOTA and IDF1 scores by 2~3 points when training at resolution 600 (compared to the original code). The idea is sketched below. Can you please try this at full resolution and see if it also boosts performance? Thanks!
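Roughly what the change looks like (a paraphrased sketch of the idea, not the exact mot.py code):

```python
import random

import numpy as np
import torch

# In __getitem__: instead of capturing the Python random state, draw one
# integer seed and use it to seed both the Python and the torch RNGs.
random_state = np.random.randint(2**31 - 1)
random.seed(random_state)
torch.manual_seed(random_state)

# ... later, in _add_frame_to_target, re-seed with the same value instead of
# calling random.setstate(random_state), so transforms that draw from either
# RNG produce identical crop parameters for both frames of the pair.
random.seed(random_state)
torch.manual_seed(random_state)
```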
Hello, in my version there is no drift between image pairs observable in the visdom visualization. What torchvision version are you using? I am only setting the state of the Python random package, as all data augmentations rely on this package and not on torch.random. However, more recent torchvision versions introduced some changes, for example to the RandomCrop.get_params method, which now uses torch.randint instead of random.randint. If you are working with a more recent version than recommended in the README, it might be necessary to adapt the code and also set the torch random state.
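For newer torchvision versions, one way to adapt the existing state-capture approach would be to save and restore both RNG states (a minimal sketch, assuming the same capture/restore points as the current random.getstate()/random.setstate() calls):

```python
import random

import torch

# Capture both RNG states before augmenting the first frame of the pair ...
random_state = random.getstate()
torch_state = torch.get_rng_state()

# ... and restore both before augmenting the second frame, so transforms that
# draw from torch.random (e.g. RandomCrop.get_params via torch.randint in
# newer torchvision) produce the same crop parameters for both frames.
random.setstate(random_state)
torch.set_rng_state(torch_state)
```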
Oh, I was using a later torchvision version. However, when I switch to torchvision 0.6 and torch 1.5 (as required) I still have a reproducibility issue: I used distributed training and the eval outcome differs between runs. Do you know what might be the cause of this? I see that in train.py you set the random seed for both torch and random, so I am confused why the code produces different outcomes. Thanks!
How different is the outcome? In distributed training mode the training is not perfectly deterministic. Try running on a single GPU to check whether it is deterministic there.
Hi, I've tried running on one GPU with the required torch/torchvision versions, with only the resolution changed (1920 to 600) because of memory limits. The result is still different between individual runs. The difference is not large, but I would still like to know whether the code is deterministic, since I am doing some ablation studies and reproducibility would be great. When running with one GPU did you observe deterministic behavior? Thanks!
During testing the code is deterministic, but not during training: there is a small drift in the loss values. The Deformable DETR codebase has the same drift, and I think this is due to the deformable attention module not being deterministic. Ideally, the noise introduced by this drift should be smaller than the changes you are trying to ablate.
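For reference, a minimal sketch of the usual determinism settings one might add on top of the seeding already done in train.py. Note that this is an assumption about what helps here; it does not make custom CUDA extensions such as the deformable attention kernel deterministic, so a small drift in training can remain:

```python
import random

import numpy as np
import torch

def seed_everything(seed: int = 42):
    """Seed the common RNGs and request deterministic cuDNN kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Ask cuDNN for deterministic algorithms and disable autotuning,
    # which can otherwise pick different kernels between runs.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```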