Transform data to binary format before training a model

Open · orenov opened this issue 5 years ago · 7 comments

Hi! Could you help me understand the package API? Specifically, can I convert data from libsvm/libffm format to binary format before training a model?

Thanks in advance.

orenov, Sep 25 '18 21:09

@orenov Hi, xLearn converts libsvm/libffm data to binary format automatically. For example:

Suppose you have a TXT file called train.txt. If you run

./xlearn_train ./train.txt

you will find a new file called train.txt.bin in the same path.

Before training, xLearn automatically checks whether that path already contains a .bin file.

aksnzhy, Sep 26 '18 03:09
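To make the cache behaviour described in this reply concrete, here is a minimal Python sketch of the "reuse `<file>.bin` if it exists, otherwise convert and write it" pattern. It is not xLearn's code: the function names are hypothetical, it only parses libsvm lines (`label index:value ...`, not libffm's `field:index:value`), and `pickle` stands in for xLearn's own binary format. It also makes the concurrency concern raised next in the thread visible: two processes that both miss the cache would both write to the same `.bin` path.

```python
import os
import pickle


def parse_libsvm(text_path):
    """Parse a libsvm text file into (labels, rows); each row is a list of (index, value)."""
    labels, rows = [], []
    with open(text_path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            labels.append(float(parts[0]))
            rows.append([(int(i), float(v)) for i, v in (p.split(":") for p in parts[1:])])
    return labels, rows


def load_training_data(text_path):
    """Hypothetical loader: reuse <text_path>.bin if present, else parse the text and cache it.

    Returns (data, from_cache). pickle is only a stand-in for a real binary format.
    """
    bin_path = text_path + ".bin"  # same naming idea as train.txt -> train.txt.bin
    if os.path.exists(bin_path):
        with open(bin_path, "rb") as f:
            return pickle.load(f), True  # cache hit: no text parsing
    data = parse_libsvm(text_path)
    # Not safe for concurrent runs: two processes missing the cache both write here.
    with open(bin_path, "wb") as f:
        pickle.dump(data, f)
    return data, False  # cache miss: converted and cached


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "train.txt")
        with open(path, "w") as f:
            f.write("1 3:0.5 7:1.0\n0 2:2.0\n")
        print(load_training_data(path)[1])  # False: converted, train.txt.bin written
        print(load_training_data(path)[1])  # True: loaded from train.txt.bin
```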

Hi @aksnzhy, thanks for such a quick response.

I'd like to transform the data to binary format before training, i.e. fully separate the data-preparation phase from the training phase.

Can you give me some intuition about what happens if there is no *.bin file yet and I then start 4 separate training runs with different hyperparameters on the same data file? Will the *.bin file be created correctly? None of the 4 scripts would find a *.bin file, so each of them would start creating it.

orenov, Sep 26 '18 07:09

I think you can make 4 separate copies of the data with different file names; xLearn will then create 4 different binary files correctly.

aksnzhy, Sep 26 '18 10:09
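A minimal Python sketch of the suggestion above, assuming xLearn derives the cache name by appending `.bin` to the input path (as with train.txt and train.txt.bin): give each run its own copy of the data, so the four runs write four different cache files instead of racing on one. The helper name and the `_run{i}` naming scheme are hypothetical.

```python
import os
import shutil


def make_per_run_copies(src_path, n_runs):
    """Copy src_path to n_runs distinct files so each run builds its own .bin cache."""
    root, ext = os.path.splitext(src_path)
    paths = []
    for i in range(n_runs):
        dst = f"{root}_run{i}{ext}"  # hypothetical naming, e.g. train_run0.txt
        shutil.copyfile(src_path, dst)
        paths.append(dst)
    return paths


if __name__ == "__main__":
    for p in make_per_run_copies("./train.txt", 4):
        # Each run would later cache to a distinct file: p + ".bin"
        print(p, "->", p + ".bin")
```

Each copy can then be passed to its own run, e.g. `./xlearn_train ./train_run0.txt` plus that run's hyperparameters. The trade-off is disk space: every copy, and later its `.bin`, duplicates the full dataset.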

Also, if you need it, xLearn's cross-validation can automatically split a big data file into 4 smaller files.

aksnzhy, Sep 26 '18 10:09
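xLearn's built-in cross-validation does this splitting for you; the Python sketch below only illustrates what splitting one data file into k smaller files means. It assigns lines round-robin, which is an assumption made for illustration. xLearn's own fold assignment may differ, and the `_fold{i}` file names are hypothetical.

```python
import os
from contextlib import ExitStack


def split_into_folds(src_path, k):
    """Write the lines of src_path round-robin into k fold files and return their paths."""
    root, ext = os.path.splitext(src_path)
    paths = [f"{root}_fold{i}{ext}" for i in range(k)]  # hypothetical file names
    with ExitStack() as stack, open(src_path) as src:
        # ExitStack closes every fold file even if a write fails midway.
        outs = [stack.enter_context(open(p, "w")) for p in paths]
        for n, line in enumerate(src):
            outs[n % k].write(line)  # line n goes to fold n mod k
    return paths
```

Round-robin keeps fold sizes within one line of each other without loading the whole file into memory; a real cross-validation setup would usually shuffle first if the file is sorted by label or time.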

Is training faster with the .bin file?

xxllp, Sep 28 '18 08:09

Once the file gets large, it feels really slow to run. Any suggestions?

xxllp, Sep 28 '18 08:09

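The format conversion is followed by the training itself, so the whole run is fairly slow. Looking at the code, there doesn't seem to be a separate transform step at the moment. For large files, converting the data to binary format ahead of time would be much faster.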

seiyagoo, Dec 03 '20 07:12
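As noted in the last comment, xLearn does not appear to expose the conversion as a separate step. The Python sketch below shows what a standalone preprocessing step could look like: parse the libsvm text once, store it as flat CSR arrays (labels, row offsets, feature indices, values), and let training bulk-read those arrays instead of re-parsing text. The file layout and function names are hypothetical and this is not xLearn's `.bin` format, so xLearn cannot read the output; it only illustrates where the speed-up of a pre-converted binary file typically comes from.

```python
import array
import struct


def convert_libsvm_to_csr_bin(text_path, bin_path):
    """One-off preprocessing (hypothetical format): parse libsvm text once, store CSR arrays."""
    labels = array.array("d")
    row_ptr = array.array("q", [0])  # row i spans indices[row_ptr[i]:row_ptr[i + 1]]
    indices = array.array("q")
    values = array.array("d")
    with open(text_path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            labels.append(float(parts[0]))
            for tok in parts[1:]:
                idx, val = tok.split(":")
                indices.append(int(idx))
                values.append(float(val))
            row_ptr.append(len(indices))
    with open(bin_path, "wb") as out:
        # Header: row count and nonzero count. Arrays are written in native byte
        # order, so the file is meant to be read back on the same machine.
        out.write(struct.pack("=QQ", len(labels), len(indices)))
        for arr in (labels, row_ptr, indices, values):
            arr.tofile(out)


def load_csr_bin(bin_path):
    """Training-time loader: bulk-reads the arrays with no per-token string parsing."""
    with open(bin_path, "rb") as f:
        n_rows, nnz = struct.unpack("=QQ", f.read(16))
        labels = array.array("d")
        labels.fromfile(f, n_rows)
        row_ptr = array.array("q")
        row_ptr.fromfile(f, n_rows + 1)
        indices = array.array("q")
        indices.fromfile(f, nnz)
        values = array.array("d")
        values.fromfile(f, nnz)
    return labels, row_ptr, indices, values
```

Run `convert_libsvm_to_csr_bin` once as the data-preparation phase; every training run then calls only `load_csr_bin`. CSR keeps the whole dataset in four contiguous arrays, which is what makes the bulk read possible.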