Tanmay Laud
@Narsil It is a pretty simple script. I am just passing a text file to the tokenizer.train function. The tokenizer I am using is Unigram. The text file has 25...
Here is the trace with RUST_BACKTRACE=1:
> thread '' panicked at 'called `Result::unwrap()` on an `Err` value: Internal', /__w/tokenizers/tokenizers/tokenizers/src/models/unigram/trainer.rs:203:53
> stack backtrace:
> 0: rust_begin_unwind
>    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
> 1: core::panicking::panic_fmt
>    at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
> 2:...
@Narsil, should I do the rebuild with the steps you mentioned previously or the ones in your latest comment? If it's the latest one, where should I make the change for the overflow?
@Narsil, have you all tested the tokenizer trainer on a really large dataset? Consider a dataset of more than roughly 25M training examples. I have done nothing special but passed a large dataset to...
Here is the full script:
> from tokenizers import Tokenizer
> from tokenizers.models import Unigram, WordPiece
> from tokenizers.trainers import UnigramTrainer, WordPieceTrainer
> from tokenizers.normalizers import NFKC
> from sacremoses...
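For anyone following along, here is a minimal sketch of a comparable setup, not the script quoted above (which is truncated here). It assumes the `tokenizers` Python package and swaps the file-based `train` call for `train_from_iterator`, feeding the corpus in batches of lines. The names `batch_lines` and `train_unigram`, the batch size, and the vocabulary size are hypothetical choices for illustration. Batching only changes how the text is read; it is not claimed to avoid the panic shown in the trace.

```python
import os
import tempfile
from itertools import islice


def batch_lines(path, batch_size=1000):
    """Yield lists of stripped, non-empty lines from a text file."""
    with open(path, encoding="utf-8") as handle:
        stripped = (line.strip() for line in handle)
        non_empty = (line for line in stripped if line)
        while True:
            batch = list(islice(non_empty, batch_size))
            if not batch:
                break
            yield batch


def train_unigram(path, vocab_size=8000):
    """Train a Unigram tokenizer from batches of lines (requires `tokenizers`)."""
    # Imported here so batch_lines stays usable without the package installed.
    from tokenizers import Tokenizer
    from tokenizers.models import Unigram
    from tokenizers.normalizers import NFKC
    from tokenizers.trainers import UnigramTrainer

    tokenizer = Tokenizer(Unigram())
    tokenizer.normalizer = NFKC()
    # vocab_size and the "<unk>" token are illustrative assumptions.
    trainer = UnigramTrainer(
        vocab_size=vocab_size,
        special_tokens=["<unk>"],
        unk_token="<unk>",
    )
    tokenizer.train_from_iterator(batch_lines(path), trainer=trainer)
    return tokenizer


# Tiny stand-in corpus (hypothetical; the file in this thread is far larger).
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("hello world\n\nunigram tokenizers are fun\nhello again\n")
    corpus_path = tmp.name

batches = list(batch_lines(corpus_path, batch_size=2))
os.remove(corpus_path)
```

Calling `train_unigram("corpus.txt")` on a real file would run the training step; the file name is a placeholder.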
I can help with ALBERT or Big Bird
Hi @koaning, I would like to contribute here. I have been looking to learn how to create such IPython visuals/widgets.
The idea would be to provide a visual for any pipeline, especially the annotation-based pipelines, to show what is being annotated and how (NER, Question Answering), similar to displaCy....
For QnA (tagging a span), I have used displaCy's manual annotation function, where you just pass a dict with start and end index params. For Streamlit-based apps, I...
> @tanmaylaud for my understanding, are you familiar with Rasa?
@koaning I am familiar with Rasa, what it is used for, and the methodologies (thanks to the videos on...
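To make the span-tagging idea concrete, here is a small sketch of the dict that displaCy's manual entity mode accepts: a `text` string plus a list of `ents` with `start`, `end`, and `label`. The helper name `qa_span_to_displacy`, the `ANSWER` label, and the sample sentence are hypothetical and only there for illustration.

```python
def qa_span_to_displacy(context, answer_start, answer_end, label="ANSWER"):
    """Build a displaCy manual-mode dict that highlights one answer span."""
    if not 0 <= answer_start < answer_end <= len(context):
        raise ValueError("answer span must lie inside the context")
    return {
        "text": context,
        "ents": [{"start": answer_start, "end": answer_end, "label": label}],
        "title": None,
    }


# Hypothetical sample context and answer.
context = "Rasa is an open source framework for building chat assistants."
answer = "open source framework"
start = context.index(answer)
end = start + len(answer)

doc = qa_span_to_displacy(context, start, end)

# With spaCy installed, the dict can be rendered as HTML:
#   from spacy import displacy
#   html = displacy.render(doc, style="ent", manual=True)
```

Because the dict carries plain character offsets, the same structure could be produced from any QA model's predicted span without building a spaCy `Doc`.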