bhack
bhack
> Sorry, I don't understand what you mean. Is it clear now?
Thanks, I've checked your changes and they are the same as the ones I've made locally over the past few days. What do you think now about this (https://github.com/pytorch/pytorch/issues/37444)? https://github.com/yoxu515/aot-benchmark/blob/d5bd73fd09a82ed6c9dd8d516ab23b6a1e3f8045/tools/train.py#L78-L79
Other than my previous comment, I think we still have an issue with the `keys` call when `DIST_ENABLE=False`: https://github.com/yoxu515/aot-benchmark/blob/d5bd73fd09a82ed6c9dd8d516ab23b6a1e3f8045/networks/managers/trainer.py#L677-L682 ```python for key in boards['image'].keys(): AttributeError: 'list' object has no...
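A minimal sketch of one way to guard against this kind of error, going only by the traceback above, where `boards['image']` is a list at the point `.keys()` is called. The helper name `iter_board_items` and the sample data are hypothetical and not part of the aot-benchmark code; the actual fix in the trainer may well differ.

```python
def iter_board_items(entry):
    """Yield (key, value) pairs from a dict, or (index, value) pairs from a list/tuple.

    Hypothetical helper: it lets the logging loop run whether the
    container is a dict or a plain list.
    """
    if isinstance(entry, dict):
        yield from entry.items()
    elif isinstance(entry, (list, tuple)):
        yield from enumerate(entry)
    else:
        raise TypeError(f"unsupported board container: {type(entry).__name__}")


# Illustrative data only; the real `boards` structure is an assumption here.
boards = {"image": ["frame_a", "frame_b"]}
pairs = list(iter_board_items(boards["image"]))
```

With a helper like this, the loop would iterate over `iter_board_items(boards['image'])` instead of calling `.keys()` directly, so both container types take the same path.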
It seems that we have two issues: - The first is that the trainer seems to deadlock "randomly" on different runs of the same code - images in `img_logs` seem...
For the first issue, this is the stack trace of one of the deadlocks, and it could be related to https://github.com/yoxu515/aot-benchmark/issues/36#issuecomment-1460151286: ```python File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn return start_processes(fn, args,...
> I guess the first problem is due to torch.spawn, and I have modified the code related to [it](https://github.com/yoxu515/aot-benchmark/blob/f9a62b60e12218019c0617efa1315dcc73147cdb/tools/train.py#L79). Please take a try, and hope this will work for you....
The same needs to be fixed in the `PAOT` branch.
I have a PR to support `bfloat16` in the third-party PyTorch correlation ops: https://github.com/ClementPinard/Pytorch-Correlation-extension/pull/106 But we still don't know whether https://github.com/pytorch/pytorch/issues/104157 has the same problem with `bfloat16`. See...
What is the perf gap without the spatial correlation sampler?
> Very soon, this repo will also host the PIPs++ model from our ICCV 2023 paper @aharley Any roadmap to release PIPs++?