NimbusML
Support pathlib in FileDataStream
pathlib is a built-in module that is very popular in Python. Almost all APIs in the Python standard library, NumPy, and pandas accept path-like objects as arguments for path-related parameters, so it would be better if FileDataStream supported them too.
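For context, here is a small sketch of what "path-like" means in the standard library. It only uses built-in modules; the temporary file and the variable names are made up for illustration and have nothing to do with NimbusML itself.

```python
import os
import tempfile
from pathlib import Path

# Write a tiny CSV into a temporary directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "test.csv"
    csv_path.write_text("a,b\n1,2\n")

    # The built-in open() accepts the Path object directly.
    with open(csv_path) as f:
        header = f.readline().strip()

    # os.fspath() returns the plain file-system form of a path-like object;
    # for a pathlib.Path that is a str.
    as_text = os.fspath(csv_path)

print(header)
print(type(as_text).__name__)
```

Any object that implements `__fspath__` (see `os.PathLike`) works the same way, which is why most libraries can accept `Path` without special-casing it.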
Current behavior:
In [1]: from nimbusml import FileDataStream
In [2]: from pathlib import Path
In [3]: test= FileDataStream.read_csv(Path('test.csv'))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-41a5e889c3ff> in <module>
----> 1 test= FileDataStream.read_csv(Path('test.csv'))
~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py in wrapper(*args, **kwargs)
218 '__qualname__',
219 func.__name__)))
--> 220 params = func(*args, **kwargs)
221 if verbose > 0:
222 logger_trace.info(
~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_stream.py in read_csv(filepath_or_buffer, tool, nrows, **kwargs)
306 if tool == 'pandas':
307 return FileDataStream.read_csv_pandas(
--> 308 filepath_or_buffer, nrows=nrows, **kwargs)
309 elif tool == 'internal':
310 if 'schema' not in kwargs:
~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py in wrapper(*args, **kwargs)
218 '__qualname__',
219 func.__name__)))
--> 220 params = func(*args, **kwargs)
221 if verbose > 0:
222 logger_trace.info(
~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_stream.py in read_csv_pandas(filepath_or_buffer, nrows, collapse, numeric_dtype, **kwargs)
340 """
341 schema = DataSchema.read_schema(filepath_or_buffer, collapse=collapse,
--> 342 numeric_dtype=numeric_dtype, **kwargs)
343 return FileDataStream(filepath_or_buffer, schema)
344
~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_schema.py in read_schema(*data, **options)
855 raise TypeError(
856 "Unable to guess the schema for type '{0}'".format(
--> 857 type(X)))
858 final_schema = sch
859
TypeError: Unable to guess the schema for type '<class 'pathlib.PosixPath'>'
Expected behavior:
FileDataStream.read_csv(Path('test.csv')) is equivalent to FileDataStream.read_csv('test.csv').
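One possible way to get this equivalence, sketched without reference to how NimbusML is actually implemented: convert path-like arguments to their string form before they reach the schema-guessing code. The helper name `coerce_path` is hypothetical and is not part of NimbusML.

```python
import io
import os
from pathlib import Path


def coerce_path(filepath_or_buffer):
    """Hypothetical helper, not a NimbusML API.

    Returns the file-system form of path-like input (a str for pathlib.Path)
    and passes every other value through unchanged.
    """
    if isinstance(filepath_or_buffer, os.PathLike):
        return os.fspath(filepath_or_buffer)
    return filepath_or_buffer


# A Path and the matching string end up as the same value.
from_path = coerce_path(Path("test.csv"))
from_str = coerce_path("test.csv")
print(from_path == from_str)

# Buffers and other non-path objects are left untouched.
buffer = io.StringIO("a,b\n1,2\n")
print(coerce_path(buffer) is buffer)
```

Under this assumption, calling such a helper on filepath_or_buffer at the top of read_csv would let the rest of the code keep working with plain strings.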
Thank you @ianlini, this should be a straightforward change to support. Would you like to take it on?
Hi! I’m new to open source and I’d like to take on this task along with #274 over the next couple of weeks. Is that alright?
Hi @pnshinde! You are very welcome to take this on! Let me know if you need any help, thanks.