Binary file (./small_train.txt.bin) NOT found. Convert text file to binary file.
Hi,
I am having trouble with xlearn when I try the command-prompt option; I get the error below. When I import it as a Python package, the import itself succeeds, but I do not see the hello, SetTrain, or SetValid methods in the imported package.
```
        _
       | |
  __  _| | ___  __ _ _ __ _ __
  \ \/ / | / _ \/ _` | '__| '_ \
   > <| |___| __/ (_| | | | | | |
  /_/\_\_____/\___|\__,_|_| |_| |_|

        xLearn -- 0.40 Version --

[ WARNING    ] Validation file not found, xLearn has already disable early-stopping.
[------------] xLearn uses 8 threads for training task.
[ ACTION     ] Read Problem ...
[------------] First check if the text file has been already converted to binary format.
[------------] Binary file (./small_train.txt.bin) NOT found. Convert text file to binary file.
Aborted (core dumped)
```
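For what it's worth, the "Binary file ... NOT found" line by itself is normal: xLearn builds a `.bin` cache the first time it reads the text file. The abort usually means the text file passed to `setTrain` could not be read at that path. A minimal sketch (the path is an assumption, use whatever you pass to `setTrain`) to check the file before training:

```python
import os

def check_train_file(path):
    """Return True if the text file xLearn will convert to .bin
    exists and is non-empty."""
    return os.path.isfile(path) and os.path.getsize(path) > 0

# Path assumed; substitute the argument you give to setTrain().
print(check_train_file("./small_train.txt"))
```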
@suresh-chinta Hi, can you show me your command line and python code? Thank you!
Hi, please find the above image of the command prompt. I have also run xLearn from Python code; here is the code.
```python
ffm_model = xl.create_ffm()

# set training and validation data
ffm_model.setTrain("train_set.libffm")
ffm_model.setValidate("valid_set.libffm")

# define params
param = {'task':'binary', 'lr':0.2, 'k':4, 'lambda':0.0002, 'metric':'auc', 'epoch': 15}

# train the model
ffm_model.fit(param, 'xl.out')
```
Could there be something wrong with the way the libffm files are created? I am using the code below to create them.
```python
import math
from csv import DictReader

# field_features, num_cols, too_many_vals, categories, categories_index
# and train_path are defined earlier in the script.
max_val = 1
with open('train.libffm', 'a') as the_file:
    for t, row in enumerate(DictReader(open(train_path))):
        if t % 100000 == 0:
            print(t, len(field_features), max_val)
        label = [row['HasDetections']]
        ffeatures = []
        for field in categories:
            if field == 'HasDetections':
                continue
            feature = row[field]
            if feature == '':
                feature = "unk"
            if field not in num_cols:
                ff = field + '_____' + feature
            else:
                if feature == "unk" or float(feature) == -1:
                    ff = field + '_____' + str(0)
                else:
                    if field in too_many_vals:
                        ff = field + '_____' + str(int(round(math.log(1 + float(feature)))))
                    else:
                        ff = field + '_____' + str(int(round(float(feature))))
            if ff not in field_features:
                if len(field_features) == 0:
                    field_features[ff] = 1
                    max_val += 1
                else:
                    field_features[ff] = max_val + 1
                    max_val += 1
            fnum = field_features[ff]
            ffeatures.append('{}:{}:1'.format(categories_index[field], fnum))
        line = label + ffeatures
        the_file.write('{}\n'.format(' '.join(line)))
```
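If the generated file is suspect, a quick format check may help narrow it down. This is just a sketch of my own, not part of xLearn: it verifies that each line looks like `label field:feature:value ...` with integer field and feature indices, assuming a binary 0/1 label as in this data set.

```python
def is_valid_libffm_line(line):
    """Check one line of 'label field:feature:value ...' libffm format,
    assuming a binary 0/1 label."""
    parts = line.strip().split()
    if not parts or parts[0] not in ("0", "1"):
        return False
    for tok in parts[1:]:
        pieces = tok.split(":")
        if len(pieces) != 3:
            return False
        field, feat, val = pieces
        if not (field.isdigit() and feat.isdigit()):
            return False
        try:
            float(val)
        except ValueError:
            return False
    return True

# Spot-check the first lines of the generated file (filename assumed):
# with open("train.libffm") as fh:
#     for i, line in enumerate(fh):
#         if not is_valid_libffm_line(line):
#             print("bad line", i, repr(line))
#         if i > 100:
#             break
```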
```
head -n 8000000 train.libffm > train_set.libffm
tail -n +8000001 train.libffm > valid_set.libffm
```
Has the problem been solved? I met the same problem.
I also ran into this error when running the example code on classification: `Binary file (/home/wxk/Data/FFM_loan_pred/data/small_train.txt.bin) NOT found. Convert text file to binary file.`
In my case, I used wget to download the data. In fact, the downloaded file was not the data file but just a text file containing some URLs. After I figured this out, it worked.
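A quick way to catch this kind of mistake (a hedged sketch, the filename is an assumption): peek at the first few lines of the downloaded file and bail if they look like HTML or a URL list rather than libsvm/libffm rows.

```python
def looks_like_data(path, probe=3):
    """Heuristic check: data rows start with a numeric label, while an
    HTML page starts with '<' and a URL list starts with 'http'."""
    with open(path, errors="replace") as fh:
        for _ in range(probe):
            line = fh.readline()
            if not line:
                break
            tokens = line.split()
            if tokens and (tokens[0].startswith("<") or tokens[0].startswith("http")):
                return False
    return True

# print(looks_like_data("./small_train.txt"))  # path assumed
```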
Just in case anyone like me arrived here looking for a solution:
I was trying to predict over a data set where not all rows have values for all the fields. This made my Python kernel die without any further information.

The problem was the instance-wise normalization. If you disable that default behaviour, it does not crash, at least in my use case:
```python
import xlearn

ffm = xlearn.create_ffm()
ffm.setTest('./data/scoring.ffm')
# Start to predict
ffm.disableNorm()
ffm.setSigmoid()
scores = ffm.predict('./data/model_complete.bin')
```