
Sklearn adapter function `tovw` does not support unsigned integers in features

Open jackgerrits opened this issue 2 years ago • 3 comments

Mitigation

As a workaround, pass signed integer types (for example `int32`) rather than unsigned integer types (for example `uint32`) to the sklearn adapter functions.
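A minimal sketch of that workaround, assuming the features arrive as a pandas DataFrame. The helper name `unsigned_to_signed` is hypothetical and not part of vowpalwabbit; it casts every unsigned integer column to `int64` before the data is handed to the adapter. Note that `uint64` values above the `int64` maximum would not survive this cast.

```python
import numpy as np
import pandas as pd


def unsigned_to_signed(X: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of X with unsigned integer columns cast to int64.

    Hypothetical helper for illustration; other columns are left unchanged.
    """
    casts = {
        col: np.int64
        for col, dtype in X.dtypes.items()
        if pd.api.types.is_unsigned_integer_dtype(dtype)
    }
    # astype with a per-column mapping returns a new DataFrame.
    return X.astype(casts)


X = pd.DataFrame({'a': [1]}, dtype='uint32')
X_signed = unsigned_to_signed(X)  # column 'a' is now int64
```

The result can then be passed to `VWRegressor().fit(X_signed, y)` in place of the original frame.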

Details

The `tovw` function uses scikit-learn's `dump_svmlight_file` to convert the input into a format from which VW text examples are easy to construct.

`dump_svmlight_file` does not accept unsigned integer input; it requires signed types because of the Cython (`.pyx`) code it uses internally in sklearn.

Fails:

from vowpalwabbit.sklearn import VWRegressor
import numpy as np
import pandas as pd

X = pd.DataFrame({'a': [1]}, dtype='uint32')
y = pd.Series(np.zeros(1))

VWRegressor().fit(X, y)

Succeeds:

from vowpalwabbit.sklearn import VWRegressor
import numpy as np
import pandas as pd

X = pd.DataFrame({'a': [1]}, dtype='int32') # <-----
y = pd.Series(np.zeros(1))

VWRegressor().fit(X, y)

The same unsigned input works when passed to scikit-learn itself:

from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd

X = pd.DataFrame({'a': [1]}, dtype='uint32')
y = pd.Series(np.zeros(1))

LinearRegression().fit(X, y)

One way to fix this is to stop using `dump_svmlight_file`. It is currently used as a convenient way to convert the dataframe to VW text format.
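As a rough illustration of that direction, the sketch below builds VW text lines of the form `label | name:value ...` directly from a DataFrame, with no call to `dump_svmlight_file`. The function name `dataframe_to_vw_lines` is hypothetical, this is not how `tovw` is implemented, and it assumes purely numeric columns whose names contain no spaces, colons, or pipe characters.

```python
import numpy as np
import pandas as pd


def dataframe_to_vw_lines(X: pd.DataFrame, y) -> list:
    """Build one VW text example per row (hypothetical helper, sketch only).

    Zero-valued features are omitted, since VW treats missing features as zero.
    """
    names = [str(col) for col in X.columns]
    labels = np.asarray(y).tolist()   # native Python numbers
    rows = X.to_numpy().tolist()      # native Python numbers, any integer dtype
    lines = []
    for label, row in zip(labels, rows):
        features = " ".join(
            f"{name}:{value}" for name, value in zip(names, row) if value != 0
        )
        lines.append(f"{label} | {features}")
    return lines


X = pd.DataFrame({'a': [1]}, dtype='uint32')
y = pd.Series(np.zeros(1))
print(dataframe_to_vw_lines(X, y))  # ['0.0 | a:1']
```

Because the values are converted to native Python numbers before formatting, signed and unsigned integer columns are handled the same way.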

jackgerrits, Jun 07 '23 14:06

Is this issue still open?

mahimairaja, Sep 17 '23 12:09

Yep! Feel free to tackle it if you'd like.

jackgerrits, Sep 18 '23 14:09

Is this issue open? Can I work on it?

manthanindane, Nov 11 '23 16:11