vowpal_wabbit

Allow multiple data files as input

Open sharathmalladi opened this issue 5 years ago • 10 comments

Describe the bug

Call vw with a bad argument and notice that vw does not return a non-zero exit code. To detect whether vw rejected the arguments, we would have to read the output and look for a line containing "sailing on!", which is not a robust way to signal an error.

To Reproduce

Steps to reproduce the behavior: run vw with invalid arguments. In the example below, the tokens "bad vw arguments" are not valid vw parameters. VW COMMAND:

E:\sharathm\github\sharathmalladi-mwt-ds\DataScience>vw bad vw arguments -d D:/tmp/124bb2ca-a99f-489e-b29c-bc142baa6f51\6359742a010048a58c1892eabd731d4c\6359742a010048a58c1892eabd731d4c_merged_data_2019-01-03_2019-01-03.json.gz -p D:/tmp/124bb2ca-a99f-489e-b29c-bc142baa6f51\6359742a010048a58c1892eabd731d4c\6359742a010048a58c1892eabd731d4c_merged_data_2019-01-03_2019-01-03.json.gz.Custom Policy 1.pred

predictions = D:/tmp/124bb2ca-a99f-489e-b29c-bc142baa6f51\6359742a010048a58c1892eabd731d4c\6359742a010048a58c1892eabd731d4c_merged_data_2019-01-03_2019-01-03.json.gz.Custom

Num weight bits = 18

learning rate = 0.5

initial_t = 0

power_t = 0.5

using no cache

Reading datafile = bad

can't open 'bad', sailing on!

num sources = 0

average since example example current current current

loss last counter weight label predict features

finished run

number of examples = 0

weighted example sum = 0.000000

weighted label sum = 0.000000

average loss = n.a.

total feature number = 0

E:\sharathm\github\sharathmalladi-mwt-ds\DataScience>echo %ERRORLEVEL%

0

Expected behavior

The error code after invoking vw should be non-zero since vw did not successfully output the predictions.

Observed Behavior

We instead get back an output that has a line that reads: can't open 'bad', sailing on!
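Until the exit code can be relied on, a caller has to scan the output itself. The following is a minimal sketch of that workaround in Python; the helper name `vw_run_failed` is hypothetical and not part of VW, and the matched strings are taken from the output shown above.

```python
def vw_run_failed(returncode, output):
    """Return True if a vw run should be treated as failed.

    Hypothetical helper: treats a non-zero exit code OR a
    "can't open ..., sailing on!" warning in the output as failure.
    """
    if returncode != 0:
        return True
    # Fall back to scanning the text, since vw 8.6.1 exits with 0 here.
    return any(
        "can't open" in line and "sailing on!" in line
        for line in output.splitlines()
    )
```

It could be fed from `subprocess.run(..., capture_output=True, text=True)` by passing `result.returncode` and the concatenation of `result.stdout` and `result.stderr`. This is exactly the fragile string matching the issue complains about, so it is only a stopgap.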

Environment

What version of VW did you use? 8.6.1

What OS or language did you use? Windows command line

Additional context

None

sharathmalladi • May 28 '19 21:05

In this situation, of the tokens "bad vw arguments", only "bad" is used: it is treated as a positional parameter for the --data option. This is a shortcut that has been around for some time. The remaining tokens, "vw" and "arguments", are then ignored as unused values rather than treated as options. The positional parameter actually overrides the value given by --data, and since "bad" is not a file, a warning is printed when it can't be opened. So VW does actually exit successfully, since there was no data to train on.

Yes, this seems counterintuitive. Handling the positional parameter in combination with the named parameter has been kind of tricky. I do agree that this seems like a bug. I'm not 100% sure how to deal with it yet.
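For readers trying to follow the precedence described above, here is a toy model in Python. It is not VW's parsing code; the function name `resolve_data_file` and the small set of recognised options are assumptions made purely for illustration.

```python
def resolve_data_file(argv):
    """Toy model of the described behaviour; NOT VW's real parser.

    The first positional token wins over -d/--data, and any further
    positional tokens are ignored.
    """
    value_options = {"-d", "--data", "-p", "--predictions"}
    named = None
    positionals = []
    i = 0
    while i < len(argv):
        tok = argv[i]
        if tok in value_options:
            if tok in ("-d", "--data") and i + 1 < len(argv):
                named = argv[i + 1]
            i += 2  # skip the option and its value
        elif tok.startswith("-"):
            i += 1  # other flags are irrelevant to this sketch
        else:
            positionals.append(tok)
            i += 1
    chosen = positionals[0] if positionals else named
    return chosen, positionals[1:]
```

With the tokens from the report, `resolve_data_file(["bad", "vw", "arguments", "-d", "data.json.gz"])` picks `"bad"` as the data file and reports `["vw", "arguments"]` as ignored, which matches the "Reading datafile = bad" line in the output.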

jackgerrits • May 29 '19 00:05

So I think what could be done here is to support multiple data files as input; then, if none of the files can be opened, VW will exit with a non-zero return code. You would also need to pass --no_stdin for this to work, though, as stdin is treated as another input file.
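A rough sketch of that proposal, in Python rather than VW's C++, might look like the following. The function name `open_data_sources` and the choice of exit code 1 are assumptions; only the rule itself (fail when no source is usable, with stdin counting as a source unless disabled) comes from the comment above.

```python
import sys


def open_data_sources(paths, no_stdin=True):
    """Sketch of the proposed rule, not an actual VW implementation.

    Returns (openable_paths, exit_code). The exit code is non-zero only
    when no input source at all is available.
    """
    opened = []
    for path in paths:
        try:
            with open(path, "rb"):
                opened.append(path)
        except OSError:
            # Keep the existing warning for each individual failure.
            print(f"can't open '{path}', sailing on!", file=sys.stderr)
    # stdin counts as one more source unless --no_stdin is given.
    num_sources = len(opened) + (0 if no_stdin else 1)
    exit_code = 0 if num_sources > 0 else 1
    return opened, exit_code
```

Under this rule a single unreadable file among several still only produces a warning, while the case from the report (one file, unreadable, no stdin) would fail.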

jackgerrits • May 30 '19 19:05

+1 to supporting multiple data files as inputs. I have wanted this useful feature for a long time.

arielf • Jun 21 '19 07:06

#2355 additionally proposed support for globbing as well as passing a directory to the -d option.
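To make the globbing and directory idea concrete, here is one possible way to expand a single -d value into a list of files, sketched in Python. The function name `expand_data_arg` and the sorted ordering are assumptions; #2355 may specify different semantics.

```python
import glob
import os


def expand_data_arg(arg):
    """Expand one -d value into a list of file paths (illustrative only).

    - a directory yields the regular files directly inside it
    - a glob pattern yields its matches
    - anything else is returned unchanged as a single path
    """
    if os.path.isdir(arg):
        entries = (os.path.join(arg, name) for name in os.listdir(arg))
        return sorted(p for p in entries if os.path.isfile(p))
    if glob.has_magic(arg):
        return sorted(glob.glob(arg))
    return [arg]
```

Sorting gives a deterministic order, which matters for online learning where example order affects the model. A glob that matches nothing returns an empty list, so it would fall under the same "no usable source" failure case discussed earlier in the thread.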

jackgerrits • Mar 19 '20 21:03

@jackgerrits Is anyone working on this issue?

Sharad24 • Mar 19 '20 21:03

No, you're welcome to work on this.

jackgerrits • Mar 19 '20 22:03

Is it still open? It looks interesting to me.

dnabanita7 • Nov 06 '20 13:11

Hi @dnabanita7, yes this is still open. Please feel free to work on it :)

jackgerrits • Nov 06 '20 15:11

Hello, I'm new to the VW codebase! @jackgerrits, could you point me to the file(s) that would need to be changed to support this feature? I'd like to get started on this issue.

hs2361 • Feb 03 '21 09:02

@jackgerrits Is it still open? Can I work on this?

Yash621 • Oct 03 '21 16:10