Aditya Malte
Hi, so do we now have a large dataset? It would be great if it were open-source.
Hi @joewandy, any updates on this? It could be a crucial feature for document retrieval.
@shah-sid-cutshort are you facing this same problem?
Hey @o19s-admin, this seems to be an old issue. Do we have an update/fix for it? If not, then it should at least be mentioned somewhere in the docs that...
@OP I’m working on it, will share when done. Thanks
Check this out, a small example I have created: https://gist.github.com/aditya-malte/2d4f896f471be9c38eb4d723a710768b#file-smallberta_pretraining-ipynb
@julien-c , I have pruned the dataset to the first 200,000 samples so that the notebook may run quickly on Colab, as this is meant to be more like a...
Hi, the easiest solution (which I have also used in my Colab notebook) is just to rename the files using !mv. I know this is a hack but...
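For anyone who prefers not to shell out with !mv, here is a minimal Python sketch of the same rename workaround. The comment above does not say which files are involved, so the `demo-` prefix, the file names, and the `strip_prefix` helper are all hypothetical and only illustrate the idea of dropping an unwanted prefix from saved files.

```python
import tempfile
from pathlib import Path


def strip_prefix(directory: Path, prefix: str) -> list[str]:
    """Rename files in `directory` by dropping a leading `prefix`.

    Returns the new file names, sorted. Hypothetical helper, not part of
    any library mentioned in this thread.
    """
    renamed = []
    # Materialize the listing first so renames do not disturb iteration.
    for path in sorted(directory.iterdir()):
        if path.is_file() and path.name.startswith(prefix) and path.name != prefix:
            target = path.with_name(path.name[len(prefix):])
            if target.exists():
                # Do not silently clobber an existing file.
                raise FileExistsError(f"refusing to overwrite {target}")
            path.rename(target)
            renamed.append(target.name)
    return sorted(renamed)


# Demo in a throwaway directory; the file names below are assumptions.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ("demo-vocab.json", "demo-merges.txt"):
        (folder / name).write_text("{}")
    new_names = strip_prefix(folder, "demo-")
    remaining = sorted(p.name for p in folder.iterdir())
```

In a notebook you would point `strip_prefix` at the real output folder instead of a temporary one. It is still the same hack, just with a guard against overwriting files.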
@julien-c , this is another issue that I wanted to point out. While renaming does work, it is a bit confusing for the programmer and takes some time to figure...
I’m not sure, I’ll have to see your code for that. It could simply be an incorrect path.