sobayed
Same issue here: the highlighted text does not show for `show_prediction()` with ELI5 0.10.1 and scikit-learn 0.22.
Similar observation here. I trained a tokenizer (`Tokenizer(vocabulary_size=64000, model=SentencePieceBPE, unk_token=, replacement=▁, add_prefix_space=True, dropout=None)`) on a dataset of 200k records. I want to use the tokenizer in a scikit-learn pipeline so...
I installed `tokenizers` from PyPI, not from source. Unfortunately, I cannot share the data, but here's a slightly different reproducible example:

```python
import pandas as pd
from tokenizers import SentencePieceBPETokenizer
...
```
Hi @Narsil, many thanks for your explanations! I'm actually aware of the differences in the algorithms. My question was mainly whether the method I'm currently using is the **fastest** way...
For my current use case, batch encoding is unfortunately not an option. Still, it's good to know about for the future! Many thanks again for your help 👍
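Since the tokenizer is meant to run inside a scikit-learn pipeline one text at a time (batch encoding being ruled out), here is a minimal sketch of a duck-typed transformer that wraps any per-text tokenizing callable. The class name `TokenizerTransformer` is hypothetical, and the example uses `str.split` as a stand-in; with a trained `tokenizers` tokenizer one would pass something like `lambda text: tokenizer.encode(text).tokens`.

```python
from typing import Callable, Iterable, List


class TokenizerTransformer:
    """Hypothetical scikit-learn-style transformer: raw strings -> token lists.

    Duck-typed (fit/transform/fit_transform only); it does not subclass
    sklearn.base.BaseEstimator, so it has no get_params/set_params.
    """

    def __init__(self, tokenize: Callable[[str], List[str]]):
        # Any callable mapping one string to a list of tokens, e.g.
        # lambda text: tokenizer.encode(text).tokens for a trained tokenizer.
        self.tokenize = tokenize

    def fit(self, X: Iterable[str], y=None) -> "TokenizerTransformer":
        # Nothing to learn: the tokenizer is assumed to be trained already.
        return self

    def transform(self, X: Iterable[str]) -> List[List[str]]:
        # Encode one text at a time (no batch encoding).
        return [self.tokenize(text) for text in X]

    def fit_transform(self, X: Iterable[str], y=None) -> List[List[str]]:
        return self.fit(X, y).transform(X)


if __name__ == "__main__":
    transformer = TokenizerTransformer(str.split)
    print(transformer.fit_transform(["hello world", "a b c"]))
    # [['hello', 'world'], ['a', 'b', 'c']]
```

For cloning or grid search, scikit-learn also expects `get_params`/`set_params`; subclassing `sklearn.base.BaseEstimator` provides them. For a stateless step like this, `sklearn.preprocessing.FunctionTransformer` is another option.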
Here is another alternative solution based on [QueueHandler](https://docs.python.org/3/library/logging.handlers.html#queuehandler) and [QueueListener](https://docs.python.org/3/library/logging.handlers.html#queuelistener). It has the advantage that all configuration (handlers, formatters, etc.) can be done in the main process and the workers...
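A minimal sketch of that pattern using only the standard library, assuming a `multiprocessing.Pool` whose initializer (the hypothetical `worker_init`) attaches a `QueueHandler` to each worker's root logger, while a `QueueListener` in the main process owns the actual handler and formatter. The function names, the example task and the two-worker pool are illustrative, not taken from the original comment.

```python
import logging
import logging.handlers
import multiprocessing


def worker_init(queue):
    # Hypothetical pool initializer: each worker only forwards records to
    # the shared queue; no handlers or formatters are configured here.
    root = logging.getLogger()
    root.handlers[:] = [logging.handlers.QueueHandler(queue)]
    root.setLevel(logging.DEBUG)


def work(n):
    # Example task: logs from inside the worker and returns a result.
    logging.getLogger(__name__).info("processing %d", n)
    return n * n


def main():
    queue = multiprocessing.Queue()

    # All real logging configuration lives in the main process.
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(processName)s %(levelname)s %(message)s")
    )
    listener = logging.handlers.QueueListener(queue, handler)
    listener.start()

    with multiprocessing.Pool(2, initializer=worker_init, initargs=(queue,)) as pool:
        results = pool.map(work, range(4))
        pool.close()  # no more tasks will be submitted
        pool.join()   # let workers exit cleanly so queued records are flushed

    listener.stop()  # handles any remaining records, then stops the thread
    return results


if __name__ == "__main__":
    print(main())
```

The `if __name__ == "__main__":` guard matters with the `spawn` start method, where each worker re-imports the main module.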
Any updates on this? I encountered this bug when trying to use scikit-learn's [LabelEncoder](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder) on pandas DataFrame columns with dtype "object" and missing values.
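A plausible cause, assuming the encoder sorts the unique values of an object column (as scikit-learn's object-dtype code path appears to do), is that Python 3 cannot order a `str` against a float `NaN` or `None`. The sketch below is a plain-Python illustration, not scikit-learn's implementation: it shows the `TypeError` and a common workaround of replacing missing values with a sentinel string before encoding. The `label_encode` and `is_missing` helpers and the `"__missing__"` sentinel are hypothetical.

```python
import math
from typing import Hashable, List, Sequence, Tuple

MISSING = "__missing__"  # hypothetical sentinel for absent values


def is_missing(value) -> bool:
    # Treat None and float NaN (pandas' usual marker in object columns) as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def label_encode(values: Sequence, missing: str = MISSING) -> Tuple[List[Hashable], List[int]]:
    # Replace missing entries so the remaining values (assumed to be strings)
    # are all comparable and can be sorted.
    cleaned = [missing if is_missing(v) else v for v in values]
    classes = sorted(set(cleaned))
    index = {label: i for i, label in enumerate(classes)}
    return classes, [index[label] for label in cleaned]


if __name__ == "__main__":
    column = ["b", None, "a", float("nan"), "b"]

    # Sorting mixed str/None/float values is what fails:
    try:
        sorted(set(column))
    except TypeError as exc:
        print("TypeError:", exc)

    print(label_encode(column))
    # (['__missing__', 'a', 'b'], [2, 0, 1, 0, 2])
```

With pandas, the equivalent step would be `df[col].fillna("__missing__")` before calling `LabelEncoder().fit_transform`.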
Same problem here when running `conda env create -f environment.yml` (it stays stuck at the last line forever):

```
TRACE conda.gateways.disk.delete:rm_rf(160): rm_rf /home/vagrant/.conda/envs/myenv/conda-meta/history.c~
TRACE conda.gateways.disk.delete:rm_rf(166): rm_rf failed. Not a link, file, or...
```