gramex
MLHandler's typecasting is too restrictive
Because MLHandler accepts feature values through URLs, they have to be coerced into the correct types. This can be too restrictive, because the types are inferred from the dataframes that are cached during training. In particular, if a feature is an integer in the training dataframe, MLHandler won't accept a float value for it.
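The coercion described above can be sketched in isolation. This is only an illustration, not MLHandler's actual code path: `train`, `query`, and `cast_like_training` are hypothetical names, and the loop simply mirrors the `data[col].astype(orgdata[col].dtype)` line that appears in the traceback further down.

```python
import pandas as pd

# Hypothetical stand-ins: the cached training features (inferred as integers)
# and the URL query values, which always arrive as strings.
train = pd.DataFrame({"x": [0, 0, 1, 1], "y": [0, 1, 0, 1]})
query = pd.DataFrame({"x": ["1.5"], "y": ["1.5"]})


def cast_like_training(data, orgdata):
    """Cast each column of `data` to the dtype of the same column in `orgdata`."""
    out = data.copy()
    for col in out.columns:
        out[col] = out[col].astype(orgdata[col].dtype)
    return out


# A string that parses as an integer is accepted.
ok = cast_like_training(pd.DataFrame({"x": ["1"], "y": ["0"]}), train)

# A string that only parses as a float is rejected with a ValueError.
try:
    cast_like_training(query, train)
    error = None
except ValueError as exc:
    error = str(exc)

print(ok.dtypes)
print(error)
```

Under these assumptions, the failure comes from casting the string straight to the integer dtype, independently of the model that consumes the values afterwards.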
Steps to reproduce:
- Paste this in a file named xor.csv:
x,y,z
0,0,0
0,1,1
1,0,1
1,1,0
- Use the following gramex config:
url:
  xor:
    pattern: /$YAMLURL/
    handler: MLHandler
    kwargs:
      data: $YAMLPATH/xor.csv
      model:
        class: SVC
        target_col: z
- Run gramex, then request http://localhost:9988/?x=1.5&y=1.5 to get the following error:
ERROR 26-Feb 20:11:37 web Uncaught exception GET /?x=1.5&y=1.5 (::1)
HTTPServerRequest(protocol='http', host='localhost:9988', method='GET', uri='/?x=1.5&y=1.5', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/web.py", line 1592, in _execute
    result = yield result
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/jaidevd/src/gramex/gramex/handlers/mlhandler.py", line 466, in get
    self._predict, to_predict)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/jaidevd/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/home/jaidevd/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/jaidevd/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/jaidevd/src/gramex/gramex/handlers/mlhandler.py", line 336, in _predict
    data = self._transform(data, deduplicate=False)
  File "/home/jaidevd/src/gramex/gramex/handlers/mlhandler.py", line 308, in _transform
    data[col] = data[col].astype(orgdata[col].dtype)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5882, in astype
    dtype=dtype, copy=copy, errors=errors, **kwargs
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 581, in astype
    return self.apply("astype", dtype=dtype, **kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 438, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 559, in astype
    return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 643, in _astype
    values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 707, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 547, in pandas._libs.lib.astype_intsafe
ValueError: invalid literal for int() with base 10: '1.5'
The underlying sklearn model is perfectly capable of classifying x=1.5 and y=1.5.
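To check that claim outside Gramex, here is a minimal sketch that trains scikit-learn's `SVC` with default parameters directly on the XOR table from xor.csv and asks it for a prediction at x=1.5, y=1.5. It assumes default `SVC` settings, which may differ from what MLHandler passes to the model; the variable names are illustrative only.

```python
import pandas as pd
from sklearn.svm import SVC

# The same four rows as xor.csv; x and y are integer features, z is the target.
df = pd.DataFrame({
    "x": [0, 0, 1, 1],
    "y": [0, 1, 0, 1],
    "z": [0, 1, 1, 0],
})

# Assumption: default SVC parameters.
model = SVC().fit(df[["x", "y"]], df["z"])

# Float inputs are accepted even though the model was trained on integers.
pred = model.predict(pd.DataFrame({"x": [1.5], "y": [1.5]}))
print(pred)
```

The call returns one of the two class labels without raising, which suggests the restriction sits in the typecasting step rather than in the model.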
Assuming we stick with pandas for reading the CSV file, could we support read_csv(..., dtype={}, ...)? That should allow users to be more explicit about how types are inferred.
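As a rough illustration of that suggestion, the sketch below reads the same XOR data twice with `pandas.read_csv`, once with inferred dtypes and once with an explicit `dtype` mapping. How such a mapping would be exposed through the Gramex config is not defined anywhere in this thread, so the mapping here is hard-coded and purely hypothetical.

```python
import io

import pandas as pd

CSV = "x,y,z\n0,0,0\n0,1,1\n1,0,1\n1,1,0\n"

# Default behaviour: pandas infers integer columns from the data.
inferred = pd.read_csv(io.StringIO(CSV))

# Hypothetical user-supplied mapping: treat the features as floats.
explicit = pd.read_csv(io.StringIO(CSV), dtype={"x": float, "y": float})

# URL query values arrive as strings; cast them to the training dtypes,
# as the _transform line in the traceback does.
query = pd.DataFrame({"x": ["1.5"], "y": ["1.5"]})
for col in query.columns:
    query[col] = query[col].astype(explicit[col].dtype)

print(inferred.dtypes)
print(explicit.dtypes)
print(query)
```

With float dtypes cached from training, the same cast that failed on '1.5' succeeds, while the target column z keeps its inferred integer type.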
@pratapvardhan We currently don't have a way of letting Gramex users specify dtypes from the yaml or requests. We should, eventually.