GPU Parallel Sim fails with torch.use_deterministic_algorithms set to True
There is a longstanding PyTorch bug where, if torch.use_deterministic_algorithms(True) is set and a CUDA advanced index_put involves broadcasting, an error occurs: https://github.com/pytorch/pytorch/issues/79987
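For reference, here is a minimal standalone sketch of the failure mode (the shapes are just illustrative, the exact behavior depends on the PyTorch/CUDA versions, and a CUDA device is assumed):

import torch

torch.use_deterministic_algorithms(True)

data = torch.zeros(10, 7, device="cuda")
indices = torch.tensor([1, 3, 5], device="cuda")  # advanced (integer) index
value = torch.zeros(1, 7, device="cuda")          # front dim must broadcast from 1 to 3

# Advanced index_put with broadcasting: fine with deterministic algorithms off,
# but on affected versions it trips the INTERNAL ASSERT from the linked issue.
data[indices, :7] = value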
In ManiSkill, many of the setters rely on this kind of broadcasting, so they all fail. Taking the actor pose setter as an example:
# arg1 is shape (1, 7) and therefore the front dimension is broadcast depending on the size of the mask
self.px.cuda_rigid_body_data.torch()[
    self._body_data_index[self.scene._reset_mask[self._scene_idxs]], :7
] = arg1
The code above works with torch.use_deterministic_algorithms(False), but with torch.use_deterministic_algorithms(True) we get the following error:
File "/home/asjchoi/Desktop/Hobot/ManiSkill/mani_skill/utils/structs/actor.py", line 406, in set_pose
self.pose = arg1
File "/home/asjchoi/Desktop/Hobot/ManiSkill/mani_skill/utils/structs/actor.py", line 379, in pose
self.px.cuda_rigid_body_data.torch()[
File "/home/asjchoi/venvs/horizon310/lib/python3.10/site-packages/torch/utils/_device.py", line 104, in __torch_function__
return func(*args, **kwargs)
RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == expandedValue.numel() INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Indexing.cu":548, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor: 28 vs 49
A workaround is to avoid broadcasting by expanding the value tensor explicitly. torch.Tensor.expand returns a view, so there should be no noticeable increase in cost (see the small check after the snippet below).
assert arg1.shape == (1, 7)
# Indices of the bodies being reset; expand arg1 to match them instead of relying on broadcasting
mask = self._body_data_index[self.scene._reset_mask[self._scene_idxs]]
self.px.cuda_rigid_body_data.torch()[mask, :7] = arg1.expand(mask.shape[0], 7)
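As a quick standalone sanity check (not ManiSkill code) that the expansion really is free: expand returns a view whose broadcast dimension has stride 0, so no data is copied.

import torch

arg1 = torch.randn(1, 7)
expanded = arg1.expand(28, 7)

# Same underlying storage and a stride-0 broadcast dimension, i.e. no copy was made
assert expanded.data_ptr() == arg1.data_ptr()
assert expanded.stride() == (0, 1)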
I've made these changes for all setters in my personal fork of ManiSkill. Let me know if this is something you're interested in having PRed.
Crazy that this is still an issue after 3 years; I would never have caught this (our RL code defaults to not using it).
Is there a strong use case for use_deterministic_algorithms? I guess this primarily affects things like conv nets, which, if I recall correctly, have some runtime optimizations that are a bit random.
I don't recall seeing many code snippets using expand, or this issue, until now. I personally like to avoid expand if possible; are there any alternatives? (If this is something torch can't fix, I think it would be very easy to break the code again without knowing it.)
I will also discuss with some maintainers and see what they think about just updating all the code to use expand.
The only use case is really to maximize reproducibility. In addition to conv nets, it also seems to eliminate nondeterminism from things like tensor addition and indexing: https://docs.pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms
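For reference, enabling it usually looks something like this (a generic sketch, not ManiSkill-specific; the CUBLAS_WORKSPACE_CONFIG variable is what PyTorch asks for to make cuBLAS ops deterministic):

import os
import torch

# Required by PyTorch for deterministic cuBLAS operations
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Raise an error (or warn, with warn_only=True) whenever a nondeterministic op runs;
# this is the mode under which the broadcasting index_put above fails
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.benchmark = False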
For alternatives, I'm not sure. Given that this has been a bug (reported by numerous users) for more than 3 years, I don't think we can expect a fix from torch anytime soon.
Perhaps a better alternative is to simply state in ManiSkill's README that torch.use_deterministic_algorithms(True) is not supported. A reasonable level of determinism should be reachable just by setting a consistent seed, for example as sketched below.
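For completeness, the consistent-seed approach would be something along these lines (seed_everything is just an illustrative helper name, not an existing ManiSkill or torch API):

import random

import numpy as np
import torch

def seed_everything(seed: int) -> None:
    # Seed the Python, NumPy, and PyTorch (CPU and all CUDA devices) RNGs
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

seed_everything(0)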
If the ManiSkill team does decide they want to support this, though, I can submit a PR that fixes these issues for stepping. I'm not sure whether any learning-related code has the same problem.