
Binary file (./small_train.txt.bin) NOT found. Convert text file to binary file.

Open suresh-chinta opened this issue 5 years ago • 6 comments

Hi,

I am having trouble installing xlearn. When I run it from the command prompt, I get the error below. When I import it as a Python package, the import succeeds, but I do not see the hello, SetTrain, or SetValid methods in the imported package.


       _
      | |
 __  _| |     ___  __ _ _ __ _ __
 \ \/ / |    / _ \/ _` | '__| '_ \ 
  >  <| |___|  __/ (_| | |  | | | |
 /_/\_\_____/\___|\__,_|_|  |_| |_|

    xLearn   -- 0.40 Version --

[ WARNING ] Validation file not found, xLearn has already disable early-stopping.
[------------] xLearn uses 8 threads for training task.
[ ACTION ] Read Problem ...
[------------] First check if the text file has been already converted to binary format.
[------------] Binary file (./small_train.txt.bin) NOT found. Convert text file to binary file.
Aborted (core dumped)

suresh-chinta avatar Dec 26 '18 17:12 suresh-chinta

@suresh-chinta Hi, can you show me your command line and python code? Thank you!

aksnzhy avatar Dec 27 '18 05:12 aksnzhy

image

Hi, please find the image of the command prompt above. I also ran xlearn from Python code; here is the code.

import xlearn as xl

ffm_model = xl.create_ffm()

# set training and validation data
ffm_model.setTrain("train_set.libffm")
ffm_model.setValidate("valid_set.libffm")

# define params
param = {'task':'binary', 'lr':0.2, 'k':4, 'lambda':0.0002, 'metric':'auc', 'epoch': 15}

# train the model
ffm_model.fit(param, 'xl.out')

Could there be something wrong with the way the libffm files are created? I am using the code below to create them.

max_val = 1
with open('train.libffm', 'a') as the_file:
    for t, row in enumerate(DictReader(open(train_path))):
        if t % 100000 == 0:
            print(t, len(field_features), max_val)
        label = [row['HasDetections']]
        ffeatures = []

        for field in categories:
            if field == 'HasDetections':
                continue
            feature = row[field]
            if feature == '':
                feature = "unk"
            if field not in num_cols:
                ff = field + '_____' + feature
            else:
                if feature == "unk" or float(feature) == -1:
                    ff = field + '_____' + str(0)
                else:
                    if field in too_many_vals:
                        ff = field + '_____' + str(int(round(math.log(1 + float(feature)))))
                    else:
                        ff = field + '_____' + str(int(round(float(feature))))
            if ff not in field_features:
                if len(field_features) == 0:
                    field_features[ff] = 1
                    max_val += 1
                else:
                    field_features[ff] = max_val + 1
                    max_val += 1

            fnum = field_features[ff]

            ffeatures.append('{}:{}:1'.format(categories_index[field], fnum))
        line = label + ffeatures
        the_file.write('{}\n'.format(' '.join(line)))

head -n 8000000 train.libffm > train_set.libffm
tail -n +8000001 train.libffm > valid_set.libffm

suresh-chinta avatar Dec 28 '18 15:12 suresh-chinta
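Editor's note on the index assignment in the script above: if field_features starts out empty, the first feature gets id 1 and the second gets id 3, so id 2 is never used. That gap is probably harmless, but a simpler assignment avoids it. The helper below is a hypothetical sketch, not the poster's code; the name feature_id is made up for illustration.

```python
def feature_id(field_features, key):
    """Return the id for key, assigning the next unused id on first sight.

    Ids start at 1 and are contiguous: the n-th distinct key gets id n.
    """
    # len(field_features) + 1 is evaluated before the key is inserted,
    # so a new key receives the next free id; a known key keeps its old id.
    return field_features.setdefault(key, len(field_features) + 1)


if __name__ == "__main__":
    ids = {}
    for key in ["Census_____a", "Census_____b", "Census_____a"]:
        print(key, feature_id(ids, key))
```

With this helper, the branch that updates field_features and max_val would collapse to a single call, and max_val would simply be len(field_features).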

Has this problem been solved? I ran into the same issue.

currylym avatar May 21 '19 11:05 currylym

I also ran into this error when running the classification example code: Binary file (/home/wxk/Data/FFM_loan_pred/data/small_train.txt.bin) NOT found. Convert text file to binary file.

wmmxk avatar Jun 24 '19 00:06 wmmxk

In my case, I had used wget to download the data. The file I downloaded turned out not to be the actual data file, just a text file containing some URLs. Once I figured this out and used the real data, it worked.

wmmxk avatar Jun 24 '19 01:06 wmmxk
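Editor's note: a quick way to catch the situation described above is to check that the first lines of the file look like training data before handing it to xLearn. The sketch below assumes each line is an optional numeric label followed by tokens of the form field:feature:value (libffm) or feature:value (libsvm); the function names are hypothetical, and this is not a check that xLearn itself performs.

```python
def is_feature_token(token):
    """True for 'field:feature:value' or 'feature:value' with numeric parts."""
    parts = token.split(":")
    if len(parts) not in (2, 3):
        return False
    *ids, value = parts
    if not all(p.isdigit() for p in ids):
        return False
    try:
        float(value)
    except ValueError:
        return False
    return True


def looks_like_training_text(lines, max_lines=100):
    """Check up to max_lines non-empty lines: numeric label, then feature tokens."""
    checked = 0
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        try:
            float(tokens[0])  # the label must be a number
        except ValueError:
            return False
        if not all(is_feature_token(t) for t in tokens[1:]):
            return False
        checked += 1
        if checked >= max_lines:
            break
    return checked > 0  # an empty file does not count as valid


if __name__ == "__main__":
    print(looks_like_training_text(["1 0:1:0.5 1:7:1"]))
    print(looks_like_training_text(["https://example.com/small_train.txt"]))
```

To check a file on disk, pass an open file object, for example `with open("small_train.txt") as f: looks_like_training_text(f)`.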

Just in case anyone like me arrived here looking for a solution:

I was trying to predict over a data set in which not every row has a value for every field. This made my Python kernel die without any further information.

The problem was the instance-wise normalization. If you disable this default behaviour, it no longer crashes, at least in my use case:

ffm = xlearn.create_ffm()
ffm.setTest('./data/scoring.ffm') 

# Start to predict
ffm.disableNorm()
ffm.setSigmoid()
scores = ffm.predict('./data/model_complete.bin')

nenetto avatar Jun 14 '23 13:06 nenetto
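Editor's note: to find out whether a data set is in the situation described in the last comment, one option is to list the rows that do not cover every field seen in the file. The sketch below assumes libffm-style lines (tokens of the form field:feature:value, with or without a leading label); rows_missing_fields is a hypothetical helper, not part of xLearn.

```python
def rows_missing_fields(lines):
    """Return (line_number, missing_fields) for rows lacking some field.

    The set of expected fields is the union of fields seen across all rows.
    """
    rows = []
    all_fields = set()
    for line_no, line in enumerate(lines, start=1):
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # Tokens without ':' (such as a leading label) are ignored.
        fields = {tok.split(":")[0] for tok in tokens if ":" in tok}
        rows.append((line_no, fields))
        all_fields |= fields
    return [(n, sorted(all_fields - f)) for n, f in rows if f != all_fields]


if __name__ == "__main__":
    sample = [
        "1 0:1:1 1:5:1 2:9:1",
        "0 0:2:1 2:9:1",  # field 1 is absent in this row
    ]
    for line_no, missing in rows_missing_fields(sample):
        print(line_no, missing)
```

The function holds one small set per row in memory, so for very large files it may be better to run it on a sample first.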