Add readme for VW module
Hi vsuthichai!
I wanted to use spotz with VW but did not know where to start and then I found this ticket. Could you please add just one very basic example with VW? Much appreciated.
Hi @mostafa-zefr, apologies for the lack of VW documentation. I will try to get to it as soon as possible. May I ask what you're trying to use the VW integration for? Thanks.
@mostafa-zefr, there is documentation here: https://github.com/eHarmony/spotz/tree/branch-1.0.1/vw
It's on branch-1.0.1.
Let me know if that gets you started with what you're trying to do.
Oh, I see the doc now! I am using it for a classification problem. Let me know what you'd like to know about the application, and I will share what I can under ZEFR policies. I do have a question, though: I need more control over the training and test sets. I can't use k-fold CV because of the nature of my data, where order matters. At first glance, I did not see any way to specify the holdout dataset directly, and it seems k-fold CV decides the test set at each iteration. Is there a way to specify a train and test set directly in the current version of the code?
@mostafa-zefr Have a look at this class:
https://github.com/eHarmony/spotz/blob/branch-1.0.1/vw/src/main/scala/com/eharmony/spotz/objective/vw/VwHoldoutObjective.scala
You can supply the VW dataset through the constructor as an Iterator, an Iterable, or a path. If the VW dataset is being loaded from an RDD, you can call rdd.toLocalIterator.
vwTrainParamsString lets you specify the VW parameters used during training. Note that certain VW arguments, such as -d or anything related to caching, will not work, because spotz manipulates those internally before calling VW.
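Since the data is order-dependent, one way to produce the train and test Iterables to hand to the holdout objective is a chronological split: no shuffling, with the most recent lines held out. The object and method below are hypothetical helpers, not part of spotz, and the sketch assumes the VW-formatted lines are already sorted by time.

```scala
// Hypothetical helper (not part of spotz): order-preserving train/test split
// for VW-formatted lines that are already sorted chronologically.
object ChronologicalSplit {
  /** Returns (train, test): the first `trainFraction` of `lines`, in order, goes to train. */
  def split(lines: Seq[String], trainFraction: Double): (Seq[String], Seq[String]) = {
    require(trainFraction > 0.0 && trainFraction < 1.0, "trainFraction must be in (0, 1)")
    // Index of the first held-out line; everything before it is training data.
    val cut = math.round(lines.size * trainFraction).toInt
    lines.splitAt(cut)
  }
}
```

For example, `val (train, test) = ChronologicalSplit.split(lines, 0.8)` keeps the oldest 80% for training. The exact constructor parameter order of VwHoldoutObjective isn't shown here, so check the linked source before passing `train` and `test` to it.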
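Because spotz manages the data file and caching options itself, it can help to check a parameter string for those flags before using it as vwTrainParamsString. This is a minimal sketch with a hypothetical helper. The flag set (-d/--data, -c/--cache, --cache_file) is an assumption based on the standard VW options for data input and caching, not a list taken from spotz.

```scala
// Hypothetical helper (not part of spotz): reports VW flags that spotz is
// said to manage internally (data file and caching), so they can be removed.
object VwArgCheck {
  // Assumed set of conflicting flags: VW's data-input and caching options.
  private val managedFlags = Set("-d", "--data", "-c", "--cache", "--cache_file")

  /** Returns the tokens in `vwArgs` whose flag name (before any '=') is managed by spotz. */
  def conflictingFlags(vwArgs: String): Seq[String] =
    vwArgs.trim.split("\\s+").toSeq.filter(tok => managedFlags.contains(tok.takeWhile(_ != '=')))
}
```

An empty result means none of the assumed conflicting flags were found. Only the flag tokens are reported, not their values.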
Thanks, Victor!
I figured those out. One thing I noticed is that the documentation at https://github.com/eHarmony/spotz/tree/branch-1.0.1/vw requires version 1.0.1, while the latest version in the Maven repository is 1.0.0. Is there a reason 1.0.1 isn't available in the Maven repository?
@mostafa-zefr After the initial 1.0.0 release, I began working on documentation and wanted to add other important features in a 1.0.1 release. That branch is where the ongoing 1.0.1 work happens, and I haven't released 1.0.1 yet. There's still a lot to be done, but I don't have much time for it right now because I've been allocated to another project.
I'd appreciate any feedback you can provide about your experience using it: good, bad, recommendations for improvement, etc.
@vsuthichai No worries! I'll be sure to get back to you with feedback.