robertuito
Will the corpus be openly published?
Given how well this model works, I'd be interested in having access to the original corpus it was trained on. Will that be possible? I mean, is there any plan to upload it to huggingface/datasets or publish it in any other form?
Thank you very much in advance :)
@alexvaca0 Thanks for your interest! We will be publishing the original tweets soon, hopefully in datasets. Leave this issue open so we let you know when they are available.
Hi @alexvaca0. I'm having some problems with the original tweets -- that is, the raw tweets prior to any preprocessing and filtering. The machine that contained this data is not turning on; let's hope the disk is OK.
In the meantime, I have access to the preprocessed and filtered tweets (as described in the paper). If that's useful for you, send me an email and I'll give you access to them.
I'll leave this issue open until we are able to publish the original data.
Oh that would be so great, if it is still possible to have access to the tweets... thank you very much :)
Well, this is quite late, but finally, the tweets were released. I could only upload half of them, but I suppose this might be enough (~300M tweets).
Check https://huggingface.co/datasets/pysentimiento/spanish-tweets
In the following days, I will be uploading the rest of them.
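For anyone who wants to peek at the data without downloading everything, here is a minimal sketch using the `datasets` library in streaming mode. The split name `"train"` and the helper names `head` and `stream_tweets` are assumptions for illustration, not something confirmed in this thread; check the dataset card for the actual splits and columns.

```python
from itertools import islice


def head(records, n):
    """Return the first n items of any iterable as a list."""
    return list(islice(records, n))


def stream_tweets(split="train"):
    """Open the released corpus as a stream (hypothetical helper)."""
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    # streaming=True iterates over records instead of downloading the corpus.
    # The split name "train" is an assumption; see the dataset card.
    return load_dataset(
        "pysentimiento/spanish-tweets", split=split, streaming=True
    )


if __name__ == "__main__":
    # Print a few records to inspect which fields they contain.
    for record in head(stream_tweets(), 3):
        print(record)
```

Streaming seems a reasonable default here given the corpus size (~300M tweets so far), since it avoids materializing the whole dataset on disk just to inspect a few rows.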