Li Dong

47 comments by Li Dong

```bash
#####################
#
# Use this with or without the .gitattributes snippet with this Gist
# create a fixle.sh file, paste this in and run it.
# Why do you...
```

I see. The error might be caused by using WSL. I am unsure whether Gradio is supported under WSL.

I found a blog post (in Japanese) that might be useful: https://zenn.dev/selllous/articles/retnet_tutorial

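```bash
# Read the fairseq location from the third column of the `pip list -v` output
FAIRSEQ_DIR=$(pip list -v | grep 'fairseq' | awk '{print $3}')
# and append it to the Python module search path.
export PYTHONPATH=$PYTHONPATH:$FAIRSEQ_DIR
```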
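The snippet above depends on the column layout of `pip list -v`, and `grep 'fairseq'` also matches any other package whose name contains that string. As a rough alternative sketch, the location can be requested from Python's import machinery instead. The helper name `pkg_location` is hypothetical, and the sketch assumes `python3` is the interpreter that has fairseq installed and that fairseq is a regular package with an `__init__.py`.

```shell
# Hypothetical helper: print the directory that contains an importable
# top-level package, or return non-zero if the package cannot be found.
pkg_location() {
    python3 -c 'import importlib.util, os, sys
spec = importlib.util.find_spec(sys.argv[1])
if spec is None or spec.origin is None:
    sys.exit(1)
# spec.origin is .../<package>/__init__.py, so go up two levels.
print(os.path.dirname(os.path.dirname(spec.origin)))' "$1"
}

# Extend PYTHONPATH only when the lookup succeeds; avoid a leading empty
# entry when PYTHONPATH was previously unset.
if FAIRSEQ_DIR=$(pkg_location fairseq); then
    export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$FAIRSEQ_DIR"
else
    echo "fairseq not found by python3" >&2
fi
```

For an editable install this should resolve to the checkout that contains the `fairseq` package directory, which is the same directory the `pip list -v` approach is meant to pick up.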

> Hi, Is there any resolution to this question for the initialization and recommended training configs to reproduce the paper results? I am also seeing some instability with the default...

The code and pre-trained models of BEiT-3 can be found at [aka.ms/beit3](https://aka.ms/beit3).

@jinxixiang Could you also post the loss curves (such as tensorboard screenshots) of the run `using MIM + MLM + contrastive loss: (does not converge)`?

You could try torchscale (https://github.com/microsoft/torchscale) if the issue is training stability (i.e., loss divergence). The Multiway architecture can be enabled by setting `multiway=True` (see https://github.com/microsoft/torchscale#key-features).
