SynapseML VowpalWabbit - Train with Synapse on databricks and inference via native VW CLI

Hi,

I am training my VW contextual bandit (CB) model using SynapseML on Databricks.

The code is the same as in: https://microsoft.github.io/SynapseML/docs/features/vw/Vowpal%20Wabbit%20-%20Overview/#vw-contextual-bandit

The model and the featurizer/zipper pipeline are saved once training is complete.

Now I want to run inference using the native VW CLI, because the inference environment does not support Spark.

Can someone please shed light on how this can be done?

Feb 03 '22 15:02 harresbintariq

Adding @jackgerrits and @eisber for visibility here

Feb 24 '22 18:02 mhamilton723

You should be able to save the binary model: https://github.com/microsoft/SynapseML/blob/master/vw/src/main/scala/com/microsoft/azure/synapse/ml/vw/VowpalWabbitBaseModel.scala#L111

The tricky piece is getting the featurization right. Did you use the VW featurizer or the MMLSpark Featurize?

Feb 24 '22 21:02 eisber

I used the VW featurizer as detailed here: https://microsoft.github.io/SynapseML/docs/features/vw/Vowpal%20Wabbit%20-%20Overview/#vw-contextual-bandit

Feb 28 '22 09:02 harresbintariq

I am able to save the binary model, but the issue lies with the featurizer (as mentioned above).

Feb 28 '22 09:02 harresbintariq

The input for the featurizer is the following table (except the target column). The data types are either int or float, so the NumericFeaturizer is used.

age	sex	cp	trestbps	chol	fbs	restecg	thalach	exang	oldpeak	slope	thal	target
63	1	3	145	233	1	0	150	0	2.3	0	1	1
37	1	2	130	250	0	1	187	0	3.5	0	2	1
41	0	1	130	204	0	0	172	0	1.4	2	2	1
56	1	1	120	236	0	1	178	0	0.8	2	2	1
57	0	0	120	354	0	1	163	1	0.6	2	2	1

The VW featurizer's output column name (outputCol) is used as the namespace, column names are used as feature names, and column values are used as feature values. Thus, for the example you reference, this should result in

|features age:63 sex:1 cp:3 trestbps:145 chol:233 ...
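The mapping described above (namespace from the output column, feature names from the column names, feature values from the cell values) can be reproduced outside Spark with a few lines of Python. This is only a minimal sketch: the helper name `to_vw_line` is hypothetical, it covers numeric columns only, and it assumes the native VW parser hashes these names the same way the Spark featurizer does, which should be verified by comparing predictions.

```python
from typing import Mapping, Union

Number = Union[int, float]


def to_vw_line(row: Mapping[str, Number], namespace: str = "features") -> str:
    """Format one row of numeric columns as an unlabeled VW text-format example.

    Hypothetical helper: the namespace plays the role of the featurizer's
    output column, and each column becomes a `name:value` feature.
    """
    parts = [f"{name}:{value}" for name, value in row.items()]
    return f"|{namespace} " + " ".join(parts)


# First row of the table, without the target column (truncated for brevity).
row = {"age": 63, "sex": 1, "cp": 3, "trestbps": 145, "chol": 233}
print(to_vw_line(row))
```

Lines produced this way could be written to a file and scored with something like `vw -t -i model.vw -d examples.txt -p predictions.txt`, where the three file names are placeholders.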

From an ML perspective, this featurization isn't ideal for VW. Especially for low-cardinality categorical columns, I'd suggest stringifying them so that each category value gets its own weight.

Assuming you change the data type of cp to string, you get:

|features age:63 sex:1 cp3:1 trestbps:145 ...
|features age:37 sex:1 cp2:1 trestbps:130 ...
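As a rough illustration of the stringified case, the sketch below emits `name:value` for numeric columns and `namevalue:1` for string columns, matching the two example lines above. The function name `to_vw_line_mixed` is hypothetical, and the assumption that the Spark featurizer concatenates column name and value with no separator is taken from those example lines, not from the SynapseML source.

```python
from typing import Mapping, Union

Value = Union[int, float, str]


def to_vw_line_mixed(row: Mapping[str, Value], namespace: str = "features") -> str:
    """Format a row with numeric and string columns as VW text format.

    Hypothetical helper. String values are treated as categorical: the
    feature name is the column name followed by the value, with weight 1.
    """
    parts = []
    for name, value in row.items():
        if isinstance(value, str):
            parts.append(f"{name}{value}:1")   # e.g. cp="2" becomes cp2:1
        else:
            parts.append(f"{name}:{value}")    # e.g. age=37 becomes age:37
    return f"|{namespace} " + " ".join(parts)


# Second row of the table with cp converted to a string.
print(to_vw_line_mixed({"age": 37, "sex": 1, "cp": "2", "trestbps": 130}))
```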

If you want to make use of VW's interaction feature, you'll have to produce multiple feature vectors using different targetCols.

Mar 01 '22 07:03 eisber

I understand. But how should I port a model trained on Databricks using SynapseML (Python), with featurizers and zippers, to the VW CLI? Please see below for further explanation:

I am training my VW contextual bandit model using SynapseML on Databricks.

The code is the same as in: https://microsoft.github.io/SynapseML/docs/features/vw/Vowpal%20Wabbit%20-%20Overview/#vw-contextual-bandit

The model and the featurizer/zipper pipeline are saved once training is complete.

Now I want to run inference using the native VW CLI, because the inference environment does not support Spark.

Can someone please shed light on how this can be done?

Mar 01 '22 10:03 harresbintariq

I'm not sure I follow how you plan to flow your data to the VW CLI: it only accepts text and JSON as input.

As of today, the VW Spark featurizers don't support a serialization format outside of the Spark ecosystem.
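Since the featurizer itself cannot be exported, one possible workaround is to re-create its output by hand in one of the formats the CLI does accept. The sketch below builds a JSON example for numeric columns, assuming VW's `--json` layout in which a nested object is read as a namespace and its numeric properties as features. The helper name `to_vw_json` is hypothetical, and whether the resulting hashes line up with a model trained through the Spark featurizer is an assumption that needs to be checked against known predictions.

```python
import json
from typing import Mapping, Union

Number = Union[int, float]


def to_vw_json(row: Mapping[str, Number], namespace: str = "features") -> str:
    """Serialize one row of numeric columns as a single-line VW JSON example.

    Hypothetical helper: the nested object acts as the namespace, and each
    numeric property inside it becomes a feature with that value.
    """
    return json.dumps({namespace: dict(row)})


print(to_vw_json({"age": 63, "sex": 1, "cp": 3, "trestbps": 145}))
```

One example per line could then be fed to the CLI with something like `vw --json -t -i model.vw -d examples.json -p predictions.txt`, where the file names are placeholders.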

Mar 01 '22 12:03 eisber
