Morgan McGuire
@radekosmulski After a quick look, it seems like it's the parallel tokenization that really slows down for some reason. In your Colab example, for me the progress bar flies for...
Ah understood now...hmmm...I wonder what is going on
Just to put some numbers on this, the decrease in speed is really disproportionate to the increase in dataset size...

## Numbers

### Data sizes, train/val split (lines in .txt...
Just taking another peek at this, the slowness seems to come from the amount of data in the datasets, e.g. `dls.train_ds`. Here is the size of the items in `train_ds`...
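For anyone wanting to reproduce this kind of check, a minimal sketch of measuring per-item sizes with the standard library (the `items` list here is a stand-in for `dls.train_ds`, not the actual fastai object; note `sys.getsizeof` is shallow and undercounts nested objects):

```python
import sys

# Stand-in for dataset items: one large text chunk, one small one
items = ["some tokenized text " * 1000, "short"]

# Shallow per-item size in bytes; nested containers would need a
# recursive walk for an accurate total
sizes = [sys.getsizeof(x) for x in items]
total_mb = sum(sizes) / 1e6
```

Comparing `sizes` across items quickly shows whether a few oversized rows dominate the dataset's memory footprint.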
Tried a few things based on what you said, but first here is a minimal repro:

## Repro

```
import fastai
from fastai.text.all import *
from fastcore.basics import *
path...
```
I spoke too soon: moving `Numericalize` to `item_tfms` **does** speed things up when using a dataframe with large chunks of text in each row. But another issue with @radekosmulski's...
# Potential Solutions

1. Use smaller text files :D **or**
2. After a quick chat with Jeremy on the Discord, a temporary workaround would be to do the numericalization...
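Assuming "do the numericalization" ahead of time means mapping tokens to vocab IDs before the DataLoaders are built, here is a minimal plain-Python sketch of that step (whitespace splitting stands in for fastai's real tokenizer, and the function names are illustrative):

```python
from collections import Counter

def build_vocab(texts, max_vocab=100, min_freq=1):
    # Count token frequencies across the corpus
    counts = Counter(tok for t in texts for tok in t.split())
    # Reserve index 0 for the unknown token, like fastai's "xxunk"
    itos = ["xxunk"] + [tok for tok, c in counts.most_common(max_vocab)
                        if c >= min_freq]
    return {tok: i for i, tok in enumerate(itos)}

def numericalize(text, stoi):
    # Replace each token with its vocab index, falling back to 0 (unknown)
    return [stoi.get(tok, 0) for tok in text.split()]

texts = ["the cat sat", "the dog sat down"]
stoi = build_vocab(texts)
ids = [numericalize(t, stoi) for t in texts]
```

Precomputing `ids` once and feeding integer lists to the pipeline avoids re-tokenizing the large text chunks during DataLoaders setup.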
@ayulockin can probably help here :)
Flagged with Ayush + Soumik
Hey @firezym, @GraesonB, our preferred LangChain integration, W&B Prompts, can be found here: https://docs.wandb.ai/guides/prompts/quickstart The above is an earlier callback that we'll likely be deprecating in the coming...