
Add readme for VW module

Open vsuthichai opened this issue 9 years ago • 8 comments

vsuthichai avatar Sep 14 '16 00:09 vsuthichai

Hi vsuthichai!

I wanted to use spotz with VW but did not know where to start and then I found this ticket. Could you please add just one very basic example with VW? Much appreciated.

mostafa-zefr avatar Feb 10 '17 01:02 mostafa-zefr

Hi @mostafa-zefr, apologies for the lack of VW documentation. I will try to get to that ASAP. May I ask what you're trying to use the VW integration for? Thanks.

vsuthichai avatar Feb 13 '17 21:02 vsuthichai

@mostafa-zefr, there is documentation here: https://github.com/eHarmony/spotz/tree/branch-1.0.1/vw

It's on branch-1.0.1

Let me know if that can get you started with what you're trying to do.

vsuthichai avatar Feb 13 '17 21:02 vsuthichai

Oh, I see the doc now! I am using it for a classification problem. Let me know exactly what you need to know about the application and I will share what I can under ZEFR policies. I have a question though: I need more control over the training and test sets (I can't use k-fold CV because the order of my data matters). At first glance, I did not see any way to specify the holdout dataset directly, and it seems k-fold CV decides on the test set at each iteration. Can I directly specify a train and test set in the current version of the code?

mostafa-zefr avatar Feb 13 '17 23:02 mostafa-zefr

@mostafa-zefr Have a look at this class:

https://github.com/eHarmony/spotz/blob/branch-1.0.1/vw/src/main/scala/com/eharmony/spotz/objective/vw/VwHoldoutObjective.scala

You can supply the VW dataset through the constructor as an Iterator, Iterable, or a path. If the VW dataset is being loaded from an RDD, you can call rdd.toLocalIterator.

vwTrainParamsString allows you to specify the VW parameters used during training. Note that certain VW arguments, such as -d or anything related to caching, will not work because spotz manipulates those internally before calling VW.
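
For an order-sensitive holdout like the one asked about above, one option is to cut the VW dataset into a train part and a holdout part before handing either to the objective. Here is a minimal sketch in Python. It is not part of spotz, and the function name `ordered_holdout_split` and the sample rows are hypothetical. It assumes the rows are already sorted in the order that matters and are in VW's plain-text input format:

```python
from typing import Iterable, List, Tuple


def ordered_holdout_split(
    lines: Iterable[str], holdout_fraction: float = 0.2
) -> Tuple[List[str], List[str]]:
    """Split VW-format lines into (train, holdout) without shuffling.

    The last `holdout_fraction` of the rows becomes the holdout set, so the
    original ordering (e.g. chronological) is preserved in both parts.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    # Drop blank lines and trailing newlines so both parts contain only examples.
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    cut = len(rows) - int(round(len(rows) * holdout_fraction))
    return rows[:cut], rows[cut:]


# Hypothetical VW examples: "<label> | <feature>:<value> ..."
examples = [
    "1 | price:0.23 sqft:0.25",
    "-1 | price:0.18 sqft:0.15",
    "1 | price:0.53 sqft:0.32",
    "-1 | price:0.11 sqft:0.09",
    "1 | price:0.61 sqft:0.40",
]

train, holdout = ordered_holdout_split(examples, holdout_fraction=0.4)
```

The two lists could then be written to separate files, or passed along as iterators, depending on how the train and test data are supplied to the objective.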

vsuthichai avatar Feb 13 '17 23:02 vsuthichai

Thanks Victor!

I figured those out. One thing I noticed is that the documentation here https://github.com/eHarmony/spotz/tree/branch-1.0.1/vw requires version 1.0.1, while the latest version in the Maven repo is 1.0.0. Is there a reason 1.0.1 is not available in the Maven repo?

mostafa-zefr avatar Feb 14 '17 18:02 mostafa-zefr

@mostafa-zefr After the initial 1.0.0 release, I began working on some documentation and wanted to integrate other important features into a 1.0.1 release. The branch is where the ongoing 1.0.1 work happens and I haven't released 1.0.1 yet. There's still a lot to be done, but I don't have much time to get to it as my time is being allocated to another project right now.

I appreciate any feedback you can provide about your experiences using it. Good, bad, recommendations for improvement, etc.

vsuthichai avatar Feb 14 '17 18:02 vsuthichai

@vsuthichai No worries! I'll be sure to get back to you with feedback.

mostafa-zefr avatar Feb 14 '17 19:02 mostafa-zefr