lcmeng
Has anyone been able to reproduce the results of larger architectures? I contacted the authors about two weeks ago, but it seems like a dead end.
Thank you. **But why are all the configurations intentionally removed from the log files**? For example, in the previously shared log file [here](https://github.com/microsoft/Swin-Transformer/files/6328720/log_rank0.txt), one can find the configurations of the...
Hi @zeliu98, thank you for the detailed reply. I've listed the installed dependencies for the Swin experiments. They seem to fully agree with the requirements. Can you spot any inconsistencies?...
And regarding the recommended NVIDIA Docker image nvcr-21.05: doesn't it conflict with the recommended dependencies? For example, it contains CUDA 11.3 (vs. the recommended 10.1) and PyTorch 1.9.0 (vs....
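For anyone else comparing environments, this is a quick way to confirm what a given container actually ships (a minimal check I've been using, nothing Swin-specific):

```python
# Minimal environment check; run inside the container to see the actual
# PyTorch/CUDA/cuDNN versions it was built with.
import torch

print(torch.__version__)                # PyTorch version, e.g. 1.9.0
print(torch.version.cuda)               # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())   # cuDNN build version
```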
@zeliu98 In the newly released logs for the larger Swin archs, the typical AMP loss-scaling message, i.e. ` Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to xyz`, is...
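(For context: as far as I can tell, that line is printed by NVIDIA apex's dynamic loss scaler, which the Swin training loop relies on. A minimal sketch of the pattern that produces it, with a placeholder model and dummy data rather than Swin's actual code:)

```python
# Sketch of the apex dynamic loss-scaling loop that emits the
# "Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to ..."
# message. Model, optimizer, and data are placeholders.
import torch
from apex import amp  # NVIDIA apex; assumed installed

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for step in range(100):
    x = torch.randn(32, 10, device="cuda")   # dummy batch
    y = torch.randn(32, 10, device="cuda")
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    # If the scaled gradients overflow (inf/NaN), apex skips this step,
    # halves the loss scale, and prints the "Gradient overflow" line.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```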
@zeliu98, thank you for the explanation. I've added some TensorBoard code to Swin to generate visualizations of the training. It seems the drop in accuracy near the peak LR is...
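In case it's useful to others, the logging I added boils down to this pattern (a minimal sketch; the tag names, log dir, and dummy training loop are my own placeholders, not the repo's code):

```python
# Minimal TensorBoard logging sketch around a placeholder training loop.
import torch
from torch.utils.tensorboard import SummaryWriter

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
writer = SummaryWriter(log_dir="runs/swin")  # arbitrary log dir

for global_step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # dummy batch
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Logging the LR alongside the loss makes it easy to line up the
    # warmup/peak/decay schedule with the accuracy curve.
    writer.add_scalar("train/loss", loss.item(), global_step)
    writer.add_scalar("train/lr", optimizer.param_groups[0]["lr"], global_step)

writer.close()
```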
A quick adjacent question: multi-GPU acceleration is currently not supported by the implementation, correct?
It's quite unfortunate that the main novelty claimed by the paper, i.e., the use of direct hardware feedback, is conveniently missing in this repo. In fact, even the paper failed...
Can you please point to the part where the direct HW feedback is used? Thanks. Without that, the repo is still quite limited in significance.
Thank you. I'll give it a spin as soon as you think it's in good shape. BTW, the README uses `--param_goal`, which is no longer a valid argument for model_generator.py.