Juan Manuel Pérez comments

Model fails with `index out of range in self` when used on a text-classification pipeline

I don't understand why this fails, since the tokenizer has the `model_max_len` property set. Please report it on `transformers`.
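One plausible cause, offered as an assumption and not as a diagnosis of this specific report: `index out of range in self` is the message PyTorch raises when an embedding lookup receives an index past the end of its table, for example a position id beyond the model's maximum sequence length. A tokenizer's max-length setting is only a limit. To our understanding, `transformers` tokenizers cut inputs only when truncation is explicitly requested (e.g. `truncation=True`). The library-free sketch below illustrates that distinction. `POSITION_TABLE`, `position_lookup`, `encode`, and the 8-position limit are hypothetical stand-ins, not the `transformers` or PyTorch API.

```python
from typing import List

MAX_POSITIONS = 8  # hypothetical size of the position-embedding table

# Hypothetical position-embedding table: one row per position.
POSITION_TABLE = [[float(i)] for i in range(MAX_POSITIONS)]


def position_lookup(num_tokens: int) -> List[List[float]]:
    """Fetch one embedding row per position, as an embedding layer would."""
    rows = []
    for pos in range(num_tokens):
        if pos >= len(POSITION_TABLE):
            # Mirrors the message PyTorch raises for an out-of-range embedding index.
            raise IndexError("index out of range in self")
        rows.append(POSITION_TABLE[pos])
    return rows


def encode(token_ids: List[int], max_length: int, truncation: bool = False) -> List[int]:
    """Return token ids; max_length is enforced only when truncation is requested."""
    if truncation:
        return token_ids[:max_length]
    return token_ids


long_input = list(range(20))  # 20 tokens, more than the table holds

# The limit is known but not applied, so the lookup runs past the table.
ids = encode(long_input, max_length=MAX_POSITIONS)
try:
    position_lookup(len(ids))
except IndexError as err:
    print("without truncation:", err)

# With truncation requested, the input fits the table.
ids = encode(long_input, max_length=MAX_POSITIONS, truncation=True)
print("with truncation:", len(position_lookup(len(ids))), "positions")
```

The point of the sketch is that knowing the limit and enforcing it are separate steps; the real fix depends on how the pipeline in question passes tokenizer arguments, which should be checked against the installed `transformers` version.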

Will the corpus be openly published?

@alexvaca0 Thanks for your interest! We will be publishing the original tweets soon, hopefully in `datasets`. Let's leave this issue open so we can let you know when they are available.

Will the corpus be openly published?

Hi @alexvaca0. I'm having some problems with the original tweets, that is, the raw tweets prior to any preprocessing and filtering. The machine that contained this data is not...

Will the corpus be openly published?

This is quite late, but the tweets have finally been released. I could only upload half of them (~300M tweets), but I suppose that might be enough. Check https://huggingface.co/datasets/pysentimiento/spanish-tweets In...

no contract dataset

Are you planning to publish the EDGAR instances of the dataset?