Pedro Ortiz Suarez

12 comments by Pedro Ortiz Suarez

I've been having an issue that might be related to this when trying to pre-tokenize a corpus and cache it for later use in the pre-training of a RoBERTa...

Hello, the author of the OSCAR corpus here. After an increased number of downloads in recent weeks and continuous abuse by some users, I had to take the corpus down...

Hello once again! We have managed to bring the corpus back online, but we had to cut each subcorpus into smaller files and we had to put everything behind a...

I am not sure it is a memory error; I am getting the exact same error on a cluster with 3 TB of RAM at my disposal. However, I do agree...

Aren't `-b` and `--baseurl` the same thing?

This works! Thanks! However, shouldn't it be documented somewhere for the people moving?

Hello! I haven't been able to reproduce the exception on Linux, so it might be Windows-related. I'm trying to get a Windows machine in order to try again. In...

OK, we had some problems before with Python 3.6. I honestly don't think that the Python version is the problem, but if you have the time, can you try creating...

Thanks for the info! I have been looking around and apparently the `multiprocessing` library works differently on Windows, so this series of errors you are encountering might be caused by...
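To illustrate the kind of platform difference mentioned in the comment above (not necessarily the cause in that thread): on Windows, `multiprocessing` starts child processes with the "spawn" method, which re-imports the main module in every child. The sketch below is a minimal example under that assumption; `square` and `run_pool` are hypothetical names and are not taken from the project being discussed.

```python
import multiprocessing as mp


def square(x):
    # Worker functions must be defined at module top level so that a
    # spawned child process can import them by name.
    return x * x


def run_pool(values):
    # Request the "spawn" start method explicitly. It is the default on
    # Windows, whereas Linux has traditionally defaulted to "fork".
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Spawned children re-import this module. Without this guard they
    # would try to create the pool again, which fails at runtime.
    print(run_pool([1, 2, 3]))
```

Code that works under "fork" but creates processes at import time, or passes objects that cannot be pickled to the workers, tends to fail only once it is run under "spawn".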

@fortepianissimo I finally got hold of a Windows machine and was able to reproduce the error. Could you please comment out lines 77 and 78 in the file `utilities/Embeddings.py`, that is,...