fast-bert: what's the meaning of the hyperparameter "text_list=texts"?

Hello! Could you explain what the hyperparameter "text_list=texts" means? For example, section "3. Create DataBunch object" of "Language Model Fine-tuning" in the README.md contains this code:

databunch_lm = BertLMDataBunch.from_raw_corpus(
    data_dir=DATA_PATH,
    text_list=texts,
    tokenizer=args.model_name,
    batch_size_per_gpu=args.train_batch_size,
    max_seq_length=args.max_seq_length,
    multi_gpu=args.multi_gpu,
    model_type=args.model_type,
    logger=logger)

So, is the parameter "texts" a list of word sequences containing all the words in one's own corpus, prepared for masked language modeling? Could this list end up too long? Thank you for your help~

Jan 08 '20 15:01 JiangYanting

I am having a similar problem. In my interpretation, text_list is a list where each entry is one of the sentences I am trying to classify.

The two files lm_train.txt and lm_val.txt are created as expected, but then it either:

- takes a really long time to complete, or looks like it has stalled (for hours);
- completes immediately (in less than a minute); or
- crashes with "num_samples should be a positive integeral value, but got num_samples=0" (the traceback points to RandomSampler(train_dataset) in data_lm.py).

But perhaps I am misinterpreting what text_list actually is...

Feb 27 '20 16:02 Q-lds

Ditto. I assumed it was meant to be a list of the texts (loaded from the files into memory), but it fails with "num_samples should be a positive integeral value, but got num_samples=0". Setting it to the training file name also fails with the same error.

Mar 07 '20 11:03 ddofer

text_list is a list of texts (List[str]), where each entry is one text as a string. If you have too few samples (not enough text), you will get the "num_samples should be a positive integeral value, but got num_samples=0" error.

Mar 30 '20 09:03 jkhalsa-arabesque
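One plausible mechanism behind the "too few samples" error described above, sketched under an assumption: if the language-model dataset works like the TextDataset in HuggingFace's older LM fine-tuning example (all text concatenated into one token stream, cut into fixed-size blocks, leftover tail dropped), then a corpus shorter than one block yields zero examples, and RandomSampler(train_dataset) receives an empty dataset. count_lm_blocks is a hypothetical helper, not part of fast-bert, and whitespace splitting stands in for the real BERT tokenizer (which will usually produce somewhat more tokens).

```python
from typing import List

def count_lm_blocks(texts: List[str], block_size: int) -> int:
    """Hypothetical helper: estimate how many training examples a corpus yields.

    Assumes every text is concatenated into one token stream that is cut into
    non-overlapping blocks of block_size tokens, with the leftover tail dropped.
    Whitespace splitting stands in for the real BERT (WordPiece) tokenizer.
    """
    total_tokens = sum(len(text.split()) for text in texts)
    return total_tokens // block_size

small_corpus = ["A short review.", "Another short review."]  # 6 tokens in total
large_corpus = ["word " * 300] * 10                          # 3,000 tokens in total

print(count_lm_blocks(small_corpus, block_size=512))  # 0 -> empty dataset, sampler fails
print(count_lm_blocks(large_corpus, block_size=512))  # 5
```

Under this assumption, a count of zero means the corpus needs more text (or possibly a smaller max_seq_length) before a non-empty training set can be built.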

If you have a text file with one text per line, here is a quick way to create the text_list object to pass to the function:

from numpy import loadtxt

# Read the file so that each line becomes one entry of the resulting array
texts = loadtxt("/content/bert.txt", dtype=str, delimiter="\n", unpack=False)

# Validate that it loaded properly
print(texts[0])
print(len(texts))

Then you can pass texts to BertLMDataBunch.

Apr 03 '20 00:04 trisongz
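A plain-Python alternative to the numpy.loadtxt snippet above, which sidesteps two of its pitfalls: loadtxt treats "#" as a comment marker by default (so any text containing "#" is silently truncated), and newer NumPy releases may reject "\n" as a delimiter outright. Reading the file line by line gives a List[str] with one text per line. load_texts is an illustrative name; the demo writes its own temporary file instead of using "/content/bert.txt".

```python
import os
import tempfile
from typing import List

def load_texts(path: str) -> List[str]:
    """Read a file with one text per line into a list of strings, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Demo on a small temporary file; replace with your own path, e.g. "/content/bert.txt".
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("first text #with a hash\n\nsecond text\n")
    path = tmp.name

texts = load_texts(path)
os.remove(path)

# Validate that it loaded properly
print(texts[0])    # first text #with a hash
print(len(texts))  # 2
```

The resulting texts can then be passed as text_list=texts, exactly like the array from loadtxt.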


@trisongz What if my file is too large (700 MB)? I get an out-of-memory (OOM) error when I run your code. Is there any other way to do this?

Jun 01 '20 11:06 krannnn
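A likely contributor to the OOM: with dtype=str, loadtxt returns a NumPy array with a fixed-width Unicode dtype sized to the longest line, at 4 bytes per character, so every shorter line is padded to that width. Reading the file lazily and materializing a plain Python list (optionally only a subset) keeps memory closer to the actual text size. iter_texts, load_subset and max_lines are hypothetical names for illustration, not part of fast-bert; text_list itself still has to fit in memory as a list, so for a very large corpus a subset may be the practical option.

```python
import itertools
import os
import tempfile
from typing import Iterator, List, Optional

def iter_texts(path: str) -> Iterator[str]:
    """Yield one non-empty, stripped text per line without reading the whole file at once."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            text = line.strip()
            if text:
                yield text

def load_subset(path: str, max_lines: Optional[int] = None) -> List[str]:
    """Materialize at most max_lines texts (all of them when max_lines is None)."""
    return list(itertools.islice(iter_texts(path), max_lines))

# Demo on a small temporary file with five lines.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("\n".join(f"text number {i}" for i in range(5)) + "\n")
    path = tmp.name

subset = load_subset(path, max_lines=3)
everything = load_subset(path)
os.remove(path)

print(len(subset), len(everything))  # 3 5
```

As an illustration of the padding cost (hypothetical numbers): 1,000,000 lines with a longest line of 10,000 characters would need about 1,000,000 × 10,000 × 4 bytes ≈ 40 GB as a fixed-width array, no matter how short the other lines are.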
