Li Dong
```bash
#####################
#
# Use this with or without the .gitattributes snippet with this Gist
# create a fixle.sh file, paste this in and run it.
# Why do you...
```
I see. The error might be caused by using WSL. I am unsure whether Gradio is supported under WSL.
Could you also attach the command that produces the error?
I found a blog post (in Japanese) that might be useful: https://zenn.dev/selllous/articles/retnet_tutorial.
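```bash
# Take the location column (3rd field) that `pip list -v` reports for fairseq
FAIRSEQ_DIR=$(pip list -v | grep 'fairseq' | awk '{print $3}')
# Append that directory to Python's module search path
export PYTHONPATH=$PYTHONPATH:$FAIRSEQ_DIR
```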
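If the one-liner picks up the wrong row or leaves a stray leading `:` when `PYTHONPATH` was unset, a slightly more defensive variant might look like the sketch below. It assumes, as the snippet above does, that the third column of `pip list -v` is the location you want. It matches the package name exactly instead of grepping, so other packages whose names contain "fairseq" are skipped, and it only touches `PYTHONPATH` when a directory was actually found. The helper name `append_pythonpath` is hypothetical, not part of fairseq or pip.

```bash
# Hypothetical helper: append a directory to PYTHONPATH without creating an
# empty entry when PYTHONPATH is unset or empty. Returns 1 for an empty argument.
append_pythonpath() {
  dir="$1"
  [ -n "$dir" ] || return 1
  # ${PYTHONPATH:+...} expands to "$PYTHONPATH:" only if PYTHONPATH is non-empty
  export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$dir"
}

# Same lookup as above (3rd column of `pip list -v`), but on an exact name match.
# `|| true` keeps a missing pip from aborting scripts that use set -e / pipefail.
FAIRSEQ_DIR=$(pip list -v 2>/dev/null | awk '$1 == "fairseq" {print $3; exit}' || true)
append_pythonpath "$FAIRSEQ_DIR" || echo "fairseq not found in 'pip list -v'" >&2
```

Source the script (`. ./script.sh`) rather than executing it if you want the exported `PYTHONPATH` to persist in your current shell.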
> Hi, Is there any resolution to this question for the initialization and recommended training configs to reproduce the paper results? I am also seeing some instability with the default...
The code and pre-trained models of BEiT-3 can be found at [aka.ms/beit3](https://aka.ms/beit3).
@jinxixiang Could you also post the loss curves (such as TensorBoard screenshots) of the run `using MIM + MLM + contrastive loss: (does not converge)`?
You could try https://github.com/microsoft/torchscale if the issue is training stability (i.e., loss divergence). The Multiway architecture can be enabled by setting `multiway=True`; see https://github.com/microsoft/torchscale#key-features.