vowpal_wabbit icon indicating copy to clipboard operation
vowpal_wabbit copied to clipboard

Improve consistency of python and binary logging

Open ataymano opened this issue 2 years ago • 4 comments

Short description

There are mismatches in how python and vw binary logs are working:

  1. There are two logs channels in bin (driver_output/log_output) and only one in python (get_log())
  2. get_log() seems to be not associated with neither driver_output nor log_output: looks like it is having most of driver_output messages, but not having last lines from --explore_evals runs (update_count / final_multiplier) cmd:
>>vw --cb_explore_adf --dsjson -d cb_simple.json --explore_eval --log_output stderr --driver_output stderr
...
weighted label sum = 0.000000
average loss = -1.000000
total feature number = 4
update count = 1
final multiplier = 1.000000

python:

>>from vowpalwabbit import pyvw
>>execution = pyvw.vw('--cb_explore_adf --dsjson -d cb_simple.json --explore_eval --driver_output stderr --log_output stderr', enable_logging=True)
>>execution.finish()
>>execution.get_log()
...
 'average loss = -1.000000',
 'total feature number = 4',
 '']
  1. get_log() is not associated neither with stdout nor with stderr (is not affected by --driver_output / --log_output parameters)

How this suggestion will help you/others

Given 2 and 3 there are messages that are impossible to see from python (explore_eval/audit/etc)

Possible solution/implementation details

Maybe python get_log() method can be replaced with either get_stdout/get_stderr or get_driver/get_logs pairs?

ataymano avatar Oct 05 '22 02:10 ataymano

Ideally, Python could hook into the structured logs that VW produces and forward them to the standard Python logging infrastructure. Then the user can filter or consume as need be

jackgerrits avatar Oct 05 '22 13:10 jackgerrits

Hey @ataymano, I tried reproducing this issue with vw v9.0.0+

pyvw.vw() has been deprecated and now uses Workspace. The outputs create

train.dat

1:2:0.4 | a c 
3:0:0.2 | b d 
4:1:0.5 | a b 
2:1:0.3 | a b c
3:1:0.7 | a d 

Python command

import vowpalwabbit

vw = vowpalwabbit.Workspace("--cb 4 -d train.dat --driver_output stderr --log_output stderr", quiet=False, enable_logging=True)
vw.finish()
for line in vw.get_log():
    print(line)

output:

using no cache
Reading datafile = train.dat
num sources = 1
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
cb_type = mtr
Enabled reductions: gd, scorer-identity, csoaa_ldf-rank, cb_adf, shared_feature_merger, cb_to_cbadf
Input label = CB
Output pred = MULTICLASS
average  since         example        example        current        current  current
loss     last          counter         weight          label        predict features
5.000000 5.000000            1            1.0        0:2:0.4            0:0       12
2.500000 0.000000            2            2.0        2:0:0.2            1:0       12
2.083333 1.666667            4            4.0        1:1:0.3            1:0       16

finished run
number of examples = 5
weighted example sum = 5.000000
weighted label sum = 0.000000
average loss = 1.952381
total feature number = 64

CLI Command

vw --cb 4 -d train.dat

output:

Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = train.dat
num sources = 1
average  since         example        example  current  current  current
loss     last          counter         weight    label  predict features
5.000000 5.000000            1            1.0    known        1        3
2.500000 0.000000            2            2.0    known        2        3
2.083333 1.666667            4            4.0    known        2        4

finished run
number of examples = 5
weighted example sum = 5.000000
weighted label sum = 0.000000
average loss = 1.952381
total feature number = 16

According to my observed output, the difference between Python and CLI is in "cb_type", "Enabled reductions", "Input label" and "Output pred".

I tried to take a look at pylibvw.cc to understand how the logging wrapper is setup but I'm really confused. It states "do not use vw_log". However, vw_log is what's being called during .get_log().

Any insights on this?

Harsh188 avatar Feb 06 '23 05:02 Harsh188

Python could hook into the structured logs that VW produces and forward them to the standard Python logging infrastructure.

@jackgerrits Isn't this what's currently happening with the py_log_wrapper class?

Harsh188 avatar Feb 13 '23 03:02 Harsh188

I implemented my take on this feature in the new set of python bindings (https://github.com/VowpalWabbit/py-vowpal-wabbit-next/blob/f5727409f9674168e7468865aa83b7e6ba8cdccd/src/main.cpp#L266). Essentially any internal log in VW is forwarded to Python's logging module so that the user can control the output using standard methods

jackgerrits avatar Feb 13 '23 15:02 jackgerrits