vowpal_wabbit
vowpal_wabbit copied to clipboard
Improve consistency of python and binary logging
Short description
There are mismatches in how python and vw binary logs are working:
- There are two logs channels in bin (driver_output/log_output) and only one in python (get_log())
- get_log() seems to be not associated with neither driver_output nor log_output: looks like it is having most of driver_output messages, but not having last lines from --explore_evals runs (update_count / final_multiplier) cmd:
>>vw --cb_explore_adf --dsjson -d cb_simple.json --explore_eval --log_output stderr --driver_output stderr
...
weighted label sum = 0.000000
average loss = -1.000000
total feature number = 4
update count = 1
final multiplier = 1.000000
python:
>>from vowpalwabbit import pyvw
>>execution = pyvw.vw('--cb_explore_adf --dsjson -d cb_simple.json --explore_eval --driver_output stderr --log_output stderr', enable_logging=True)
>>execution.finish()
>>execution.get_log()
...
'average loss = -1.000000',
'total feature number = 4',
'']
- get_log() is not associated neither with stdout nor with stderr (is not affected by --driver_output / --log_output parameters)
How this suggestion will help you/others
Given 2 and 3 there are messages that are impossible to see from python (explore_eval/audit/etc)
Possible solution/implementation details
Maybe python get_log() method can be replaced with either get_stdout/get_stderr or get_driver/get_logs pairs?
Ideally, Python could hook into the structured logs that VW produces and forward them to the standard Python logging infrastructure. Then the user can filter or consume as need be
Hey @ataymano, I tried reproducing this issue with vw v9.0.0+
pyvw.vw()
has been deprecated and now uses Workspace
. The outputs create
train.dat
1:2:0.4 | a c
3:0:0.2 | b d
4:1:0.5 | a b
2:1:0.3 | a b c
3:1:0.7 | a d
Python command
import vowpalwabbit
vw = vowpalwabbit.Workspace("--cb 4 -d train.dat --driver_output stderr --log_output stderr", quiet=False, enable_logging=True)
vw.finish()
for line in vw.get_log():
print(line)
output:
using no cache
Reading datafile = train.dat
num sources = 1
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
cb_type = mtr
Enabled reductions: gd, scorer-identity, csoaa_ldf-rank, cb_adf, shared_feature_merger, cb_to_cbadf
Input label = CB
Output pred = MULTICLASS
average since example example current current current
loss last counter weight label predict features
5.000000 5.000000 1 1.0 0:2:0.4 0:0 12
2.500000 0.000000 2 2.0 2:0:0.2 1:0 12
2.083333 1.666667 4 4.0 1:1:0.3 1:0 16
finished run
number of examples = 5
weighted example sum = 5.000000
weighted label sum = 0.000000
average loss = 1.952381
total feature number = 64
CLI Command
vw --cb 4 -d train.dat
output:
Num weight bits = 18
learning rate = 0.5
initial_t = 0
power_t = 0.5
using no cache
Reading datafile = train.dat
num sources = 1
average since example example current current current
loss last counter weight label predict features
5.000000 5.000000 1 1.0 known 1 3
2.500000 0.000000 2 2.0 known 2 3
2.083333 1.666667 4 4.0 known 2 4
finished run
number of examples = 5
weighted example sum = 5.000000
weighted label sum = 0.000000
average loss = 1.952381
total feature number = 16
According to my observed output, the difference between Python and CLI is in "cb_type", "Enabled reductions", "Input label" and "Output pred".
I tried to take a look at pylibvw.cc to understand how the logging wrapper is setup but I'm really confused. It states "do not use vw_log". However, vw_log
is what's being called during .get_log()
.
Any insights on this?
Python could hook into the structured logs that VW produces and forward them to the standard Python logging infrastructure.
@jackgerrits Isn't this what's currently happening with the py_log_wrapper class?
I implemented my take on this feature in the new set of python bindings (https://github.com/VowpalWabbit/py-vowpal-wabbit-next/blob/f5727409f9674168e7468865aa83b7e6ba8cdccd/src/main.cpp#L266). Essentially any internal log in VW is forwarded to Python's logging
module so that the user can control the output using standard methods