MLHandler's typecasting is too restrictive

Open jaidevd opened this issue 3 years ago • 2 comments

Is something not working as expected? Because MLHandler accepts feature values through URLs, they arrive as strings and have to be coerced into the correct types. This can be too restrictive, because the types are inferred from the dataframe that is cached during training. In particular, if a feature is an integer column in that dataframe, MLHandler rejects a float value for it.
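The coercion problem can be reproduced with pandas alone. The sketch below is not Gramex code: `coerce_lenient` is a hypothetical helper that shows one possible way to relax the cast, by trying the dtype seen during training and falling back to whatever numeric dtype pandas infers.

```python
import pandas as pd


def coerce_lenient(values: pd.Series, trained_dtype) -> pd.Series:
    """Cast to the dtype seen during training, widening if that cast fails.

    Hypothetical helper for illustration; not part of Gramex.
    """
    try:
        return values.astype(trained_dtype)
    except (ValueError, TypeError):
        # e.g. "1.5" cannot become int64, so let pandas infer a numeric dtype.
        return pd.to_numeric(values)


# URL query values always arrive as strings.
x = pd.Series(["1.5"])

try:
    x.astype("int64")  # the same kind of cast that fails in MLHandler
    strict_ok = True
except ValueError:
    strict_ok = False

lenient = coerce_lenient(x, "int64")  # falls back to a float column
```

Values that are not numeric at all still raise a `ValueError` from `pd.to_numeric`, so bad input is rejected rather than silently passed to the model.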

Steps to reproduce:

  1. Paste this in a file named xor.csv:
x,y,z
0,0,0
0,1,1
1,0,1
1,1,0
  2. Use the following gramex config:
url:
        xor:
                pattern: /$YAMLURL/
                handler: MLHandler
                kwargs:
                        data: $YAMLPATH/xor.csv
                        model:
                                class: SVC
                                target_col: z

  3. Run gramex, then open http://localhost:9988/?x=1.5&y=1.5 to get the following error:
ERROR   26-Feb 20:11:37 web Uncaught exception GET /?x=1.5&y=1.5 (::1)
HTTPServerRequest(protocol='http', host='localhost:9988', method='GET', uri='/?x=1.5&y=1.5', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/web.py", line 1592, in _execute
    result = yield result
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/jaidevd/src/gramex/gramex/handlers/mlhandler.py", line 466, in get
    self._predict, to_predict)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/jaidevd/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.__get_result()
  File "/home/jaidevd/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/jaidevd/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/jaidevd/src/gramex/gramex/handlers/mlhandler.py", line 336, in _predict
    data = self._transform(data, deduplicate=False)
  File "/home/jaidevd/src/gramex/gramex/handlers/mlhandler.py", line 308, in _transform
    data[col] = data[col].astype(orgdata[col].dtype)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5882, in astype
    dtype=dtype, copy=copy, errors=errors, **kwargs
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 581, in astype
    return self.apply("astype", dtype=dtype, **kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 438, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 559, in astype
    return self._astype(dtype, copy=copy, errors=errors, values=values, **kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 643, in _astype
    values = astype_nansafe(vals1d, dtype, copy=True, **kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 707, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 547, in pandas._libs.lib.astype_intsafe
ValueError: invalid literal for int() with base 10: '1.5'

The underlying sklearn model is perfectly capable of classifying x=1.5 and y=1.5.
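To check this claim independently of MLHandler, here is a minimal sketch that trains scikit-learn's `SVC` with default settings on the same XOR data and asks for a prediction at a fractional point. It assumes the defaults are close to what the config above produces; MLHandler's actual preprocessing is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC

# The XOR data from xor.csv: x and y are features, z is the target.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
z = np.array([0, 1, 1, 0])

model = SVC().fit(X, z)

# A fractional query point is accepted without any integer cast;
# the model answers with one of the labels it was trained on.
prediction = model.predict(np.array([[1.5, 1.5]]))
```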

jaidevd, Feb 26 '21 14:02

Assuming we stick with pandas for reading the CSV file, would supporting read_csv(..., dtype={}, ...) let users be more explicit about how types are inferred?
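As a rough illustration of this suggestion, the sketch below reads the XOR data twice with `pandas.read_csv`, once with inferred types and once with an explicit `dtype` mapping. How such a mapping would be passed in from a Gramex config is left open; the mapping used here is only an assumed example.

```python
import io

import pandas as pd

csv_text = "x,y,z\n0,0,0\n0,1,1\n1,0,1\n1,1,0\n"

# Default behaviour: all three columns are inferred as integers.
inferred = pd.read_csv(io.StringIO(csv_text))

# Explicit dtypes: the features are read as floats, the target stays an integer.
explicit = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"x": "float64", "y": "float64"},
)

# A URL value such as "1.5" now casts cleanly to the training dtype.
casted = pd.Series(["1.5"]).astype(explicit["x"].dtype)
```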

pratapvardhan, Mar 03 '21 10:03

@pratapvardhan We currently don't have a way of letting Gramex users specify dtypes from the yaml or requests. We should, eventually.

jaidevd, Mar 03 '21 12:03