
Support pathlib in FileDataStream

Open ianlini opened this issue 5 years ago • 3 comments

pathlib is a standard-library module that is widely used in Python. Almost all path-related parameters in the standard library, numpy, and pandas accept path-like objects. It would therefore be better if FileDataStream supported them as well.

Current behavior:

In [1]: from nimbusml import FileDataStream

In [2]: from pathlib import Path

In [3]: test= FileDataStream.read_csv(Path('test.csv'))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-41a5e889c3ff> in <module>
----> 1 test= FileDataStream.read_csv(Path('test.csv'))

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py in wrapper(*args, **kwargs)
    218                          '__qualname__',
    219                          func.__name__)))
--> 220             params = func(*args, **kwargs)
    221             if verbose > 0:
    222                 logger_trace.info(

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_stream.py in read_csv(filepath_or_buffer, tool, nrows, **kwargs)
    306         if tool == 'pandas':
    307             return FileDataStream.read_csv_pandas(
--> 308                 filepath_or_buffer, nrows=nrows, **kwargs)
    309         elif tool == 'internal':
    310             if 'schema' not in kwargs:

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/utils.py in wrapper(*args, **kwargs)
    218                          '__qualname__',
    219                          func.__name__)))
--> 220             params = func(*args, **kwargs)
    221             if verbose > 0:
    222                 logger_trace.info(

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_stream.py in read_csv_pandas(filepath_or_buffer, nrows, collapse, numeric_dtype, **kwargs)
    340         """
    341         schema = DataSchema.read_schema(filepath_or_buffer, collapse=collapse,
--> 342                                         numeric_dtype=numeric_dtype, **kwargs)
    343         return FileDataStream(filepath_or_buffer, schema)
    344

~/.pyenv/versions/3.7.4/lib/python3.7/site-packages/nimbusml/internal/utils/data_schema.py in read_schema(*data, **options)
    855                 raise TypeError(
    856                     "Unable to guess the schema for type '{0}'".format(
--> 857                         type(X)))
    858             final_schema = sch
    859

TypeError: Unable to guess the schema for type '<class 'pathlib.PosixPath'>'

Expected behavior: FileDataStream.read_csv(Path('test.csv')) is equivalent to FileDataStream.read_csv('test.csv').
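
One possible way to get that equivalence, sketched here as an assumption rather than as the actual nimbusml fix: convert path-like arguments to plain strings before they reach schema inference. `os.fspath` and `os.PathLike` are standard-library features (Python 3.6+). The helper name `normalize_path` is hypothetical and does not exist in nimbusml.

```python
import os
from pathlib import Path


def normalize_path(filepath_or_buffer):
    """Return a plain str for path-like inputs; pass anything else through.

    Hypothetical helper for illustration only, not part of nimbusml.
    """
    # os.PathLike covers pathlib.Path and any object implementing __fspath__.
    if isinstance(filepath_or_buffer, os.PathLike):
        return os.fspath(filepath_or_buffer)
    # Strings, buffers, DataFrames, etc. are left untouched.
    return filepath_or_buffer


print(normalize_path(Path("test.csv")))
print(normalize_path("test.csv"))
```

Until something like this is in place, calling `FileDataStream.read_csv(str(Path('test.csv')))` should sidestep the error, since a plain string path is the case described above as working.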

ianlini, Sep 16 '19 08:09

Thank you @ianlini, it should be a straightforward change to support this. Would you like to take it on?

ganik, Sep 18 '19 16:09

Hi! I’m new to open source and I’d like to take on this task along with #274 over the next couple of weeks. Is that alright?

pnshinde, Nov 18 '19 02:11

Hi @pnshinde! You are very welcome to take this on! Let me know if you need any help, thanks.

ganik, Nov 18 '19 02:11