unilm
Unable to reproduce DiT pre-training
I am trying to reproduce the DiT model described in the paper. I am using the DALL-E encoder as the image tokenizer, without fine-tuning it on the IIT-CDIP dataset. I took 1M documents for training, but the model is not converging: the loss has stagnated at 4.19807. Has anyone tried to reproduce the model, and did you change any of the settings mentioned in the paper?
I have followed the steps in this notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/BEiT/Understanding_BeitForMaskedImageModeling.ipynb
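One way to interpret a stagnated masked-image-modeling loss is to compare it with two baselines: the loss of a uniform guess over the visual vocabulary, and the entropy of the token frequencies in the training data. The sketch below assumes the loss is a natural-log cross-entropy over the 8192-entry DALL-E codebook; the function names are illustrative and not part of any library. If the plateau sits close to the unigram entropy of your own visual tokens, the model may have learned only token frequencies rather than image content.

```python
import math
from collections import Counter

# Assumption: the DALL-E dVAE codebook has 8192 visual tokens.
VOCAB_SIZE = 8192


def uniform_loss(vocab_size):
    """Cross-entropy (in nats) of a model that guesses every token uniformly."""
    return math.log(vocab_size)


def unigram_entropy(token_ids):
    """Entropy (in nats) of the empirical token distribution.

    A model that ignores the image and predicts only token frequencies
    would plateau near this value.
    """
    counts = Counter(token_ids)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def perplexity(loss):
    """Effective number of equally likely tokens implied by a loss in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    stagnated = 4.19807  # value reported above
    print(f"uniform baseline: {uniform_loss(VOCAB_SIZE):.4f}")
    print(f"perplexity at plateau: {perplexity(stagnated):.1f}")
    # Replace this toy list with the flattened visual token ids
    # produced by the tokenizer on a sample of training documents.
    toy_tokens = [0, 0, 1, 1]
    print(f"unigram entropy (toy data): {unigram_entropy(toy_tokens):.4f}")
```

To use it on real data, pass the flattened token ids from a few thousand tokenized pages to `unigram_entropy` and compare the result with the plateau value.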
Hi @senthil-r-10, were you able to reproduce the MIM task?