
Does this module support Windows?

Open wqh17101 opened this issue 2 years ago • 16 comments

System information

  • OS Platform and Distribution:Win11

  • Modin version :0.13.2

  • Python version:3.8.8

  • Code we can use to reproduce:

    import numpy as np
    import modin.pandas as pd
    from modin.config import Engine

    Engine.put("dask")  # or Engine.put("ray"); both fail
    data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')

Describe the problem

It does not work on Windows 11.

Source code / logs

using dask:

UserWarning: Dask execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:

    from distributed import Client

    client = Client()

Traceback (most recent call last):
  File ".\test_read.py", line 45, in <module>
    print((test_modin_pd() == test_pd()).all())
  File ".\test_read.py", line 30, in test_modin_pd
    data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
    return _read(**kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 60, in _read
    pd_obj = FactoryDispatcher.read_csv(**kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\dispatching\factories\dispatcher.py", line 180, in read_csv
    return cls.__factory._read_csv(**kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\dispatching\factories\factories.py", line 207, in _read_csv
    return cls.io_cls.read_csv(**kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\file_dispatcher.py", line 151, in read
    query_compiler = cls._read(*args, **kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 1007, in _read
    splits = cls.partitioned_file(
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 314, in partitioned_file
    outside_quotes = cls.offset(
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 192, in offset
    outside_quotes, _ = cls._read_rows(
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 386, in _read_rows
    for line in iterator:
TypeError: 'LocalFileOpener' object is not iterable

using ray:

UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:

    import ray
    ray.init()

Traceback (most recent call last):
  File ".\test_read.py", line 45, in <module>
    test_modin_pd()
  File ".\test_read.py", line 30, in test_modin_pd
    data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
    return _read(**kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 56, in _read
    Engine.subscribe(_update_engine)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\config\pubsub.py", line 213, in subscribe
    callback(cls)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\__init__.py", line 128, in _update_engine
    initialize_ray()
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\ray\common\utils.py", line 183, in initialize_ray
    ray.init(**ray_init_kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\ray\worker.py", line 918, in init
    _global_node = ray.node.Node(
  File "C:\Users\wqh\anaconda3\lib\site-packages\ray\node.py", line 215, in __init__
    self.start_head_processes()
  File "C:\Users\wqh\anaconda3\lib\site-packages\ray\node.py", line 911, in start_head_processes
    self.start_redis()
  File "C:\Users\wqh\anaconda3\lib\site-packages\ray\node.py", line 715, in start_redis
    self.get_resource_spec(),
  File "C:\Users\wqh\anaconda3\lib\site-packages\ray\node.py", line 344, in get_resource_spec
    self._resource_spec = ResourceSpec(
  File "C:\Users\wqh\anaconda3\lib\site-packages\ray\_private\resource_spec.py", line 149, in resolve
    num_gpus = _autodetect_num_gpus()
  File "C:\Users\wqh\anaconda3\lib\site-packages\ray\_private\resource_spec.py", line 241, in _autodetect_num_gpus
    lines = subprocess.check_output(cmdargs).splitlines()[1:]
  File "C:\Users\wqh\anaconda3\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Users\wqh\anaconda3\lib\subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Users\wqh\anaconda3\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\wqh\anaconda3\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] can not find that file.

wqh17101 avatar Mar 05 '22 16:03 wqh17101

Hi @wqh17101 thanks for the report! Modin works on Windows.

What type of object did you pass as file_path? It seems like that might be causing the issue.

devin-petersohn avatar Mar 06 '22 20:03 devin-petersohn

Hi @devin-petersohn, file_path="./x.csv". The same code works on Linux.

wqh17101 avatar Mar 07 '22 01:03 wqh17101

@wqh17101 is this a test file you can share? I have tried to reproduce this, but I haven't been able to.

devin-petersohn avatar Mar 08 '22 15:03 devin-petersohn

Sure, it is a normal CSV file: test_Y.csv

wqh17101 avatar Mar 08 '22 16:03 wqh17101

@wqh17101 can you share the full script? I wasn't able to reproduce the issue with your code and file. Are you doing os.chdir or something like that? That could be contributing here.


devin-petersohn avatar Mar 10 '22 19:03 devin-petersohn

Sure, it is an ordinary script:

#!/usr/bin/env python
# _*_coding:utf-8_*_
"""
@Time   :  2022/3/5 22:20
@Author :  Qinghua Wang
@Email  :  [email protected]
"""
import time

import numpy as np  # needed for np.float below
import pandas as pd

file_path = r"./test.csv"

def test_modin_pd():
    import modin.pandas as pd
    from modin.config import Engine

    Engine.put("ray")  # Modin will use Ray
    start = time.time()
    data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')
    print(data.shape)
    print(f"modin cost {time.time() - start}s")
    return data

if __name__ == '__main__':
    test_modin_pd()

wqh17101 avatar Mar 10 '22 23:03 wqh17101

Oh, I checked the code of the function that raised the exception, in C:\Users\wqh\anaconda3\Lib\site-packages\ray\_private\resource_spec.py:

def _autodetect_num_gpus():
    """Attempt to detect the number of GPUs on this machine.

    TODO(rkn): This currently assumes NVIDIA GPUs on Linux.
    TODO(mehrdadn): This currently does not work on macOS.
    TODO(mehrdadn): Use a better mechanism for Windows.

    Possibly useful: tensorflow.config.list_physical_devices()

    Returns:
        The number of GPUs if any were detected, otherwise 0.
    """
    result = 0
    if sys.platform.startswith("linux"):
        proc_gpus_path = "/proc/driver/nvidia/gpus"
        if os.path.isdir(proc_gpus_path):
            result = len(os.listdir(proc_gpus_path))
    elif sys.platform == "win32":
        props = "AdapterCompatibility"
        cmdargs = ["WMIC", "PATH", "Win32_VideoController", "GET", props]
        lines = subprocess.check_output(cmdargs).splitlines()[1:]
        result = len([x.rstrip() for x in lines if x.startswith(b"NVIDIA")])
    return result

It seems it wants to run a command. I ran it on the command line and found that WMIC is not a recognized command on my system, so is there anything I need to install first?

wqh17101 avatar Mar 10 '22 23:03 wqh17101
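
The `FileNotFoundError` in the Ray traceback is what `subprocess` raises when the executable itself (here `WMIC`) cannot be found. A minimal sketch of a more defensive check follows, assuming it is acceptable to report zero GPUs when the tool is missing. `count_nvidia_gpus_windows` is a hypothetical helper, not part of Ray, and this is not necessarily how Ray addressed the problem.

```python
import shutil
import subprocess
import sys


def count_nvidia_gpus_windows(tool="WMIC"):
    """Count NVIDIA adapters via WMIC; return 0 if the tool is unavailable.

    Hypothetical helper for illustration only (not Ray's implementation).
    """
    # shutil.which returns None when the executable is not on PATH,
    # which avoids the FileNotFoundError from CreateProcess.
    if sys.platform != "win32" or shutil.which(tool) is None:
        return 0
    cmdargs = [tool, "PATH", "Win32_VideoController", "GET", "AdapterCompatibility"]
    try:
        lines = subprocess.check_output(cmdargs).splitlines()[1:]
    except (OSError, subprocess.CalledProcessError):
        # Assumption: a failed query is treated the same as "no GPUs".
        return 0
    return len([x for x in lines if x.startswith(b"NVIDIA")])


print(count_nvidia_gpus_windows())
```

On a machine without `WMIC` on `PATH`, this returns 0 instead of raising.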

When I comment out that branch, initialization continues, and then I get:

PS G:\naic_csi_semi_final\task2> python .\test_read.py
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:

    import ray
    ray.init()

Traceback (most recent call last):
  File ".\test_read.py", line 53, in <module>
    test_modin_pd()
  File ".\test_read.py", line 30, in test_modin_pd
    data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
    return _read(**kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 60, in _read
    pd_obj = FactoryDispatcher.read_csv(**kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\dispatching\factories\dispatcher.py", line 180, in read_csv
    return cls.__factory._read_csv(**kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\dispatching\factories\factories.py", line 207, in _read_csv
    return cls.io_cls.read_csv(**kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\file_dispatcher.py", line 151, in read
    query_compiler = cls._read(*args, **kwargs)
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 1007, in _read
    splits = cls.partitioned_file(
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 314, in partitioned_file
    outside_quotes = cls.offset(
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 192, in offset
    outside_quotes, _ = cls._read_rows(
  File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 386, in _read_rows
    for line in iterator:
TypeError: 'LocalFileOpener' object is not iterable

wqh17101 avatar Mar 10 '22 23:03 wqh17101

Hi @wqh17101!

Can you reproduce the "TypeError: 'LocalFileOpener' object is not iterable" problem with the following reproducer?

import fsspec

file_path = r"./test.csv"
file = fsspec.open(file_path).open()

for line in file:
    print(line)

anmyachev avatar Mar 22 '22 21:03 anmyachev

For reference: the problem with Ray is reported here.

anmyachev avatar Mar 22 '22 21:03 anmyachev

Yeah, the same error.

wqh17101 avatar Mar 23 '22 10:03 wqh17101

Thanks @wqh17101! This looks like a problem with Windows 11 support on the fsspec side (Modin uses fsspec for reading files, just like pandas). Can you create an issue there?

anmyachev avatar Mar 23 '22 11:03 anmyachev
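
For background on the error message itself: Python's `for` loop looks up `__iter__` on the object's type, so a wrapper that only forwards attributes through `__getattr__` is not iterable even when the wrapped file is. The sketch below shows that general mechanism. Both classes are hypothetical, and whether the affected fsspec version's `LocalFileOpener` was written this way is an assumption, not something confirmed in this thread.

```python
import io


class DelegatingOpener:
    """Hypothetical wrapper that forwards attribute access to a wrapped file."""

    def __init__(self, f):
        self.f = f

    def __getattr__(self, name):
        # Called only for ordinary attribute lookups that fail, e.g. .readline.
        # Implicit special-method lookups (such as __iter__) bypass it.
        return getattr(self.f, name)


class IterableOpener(DelegatingOpener):
    """Same wrapper, but with an explicit __iter__ defined on the class."""

    def __iter__(self):
        return iter(self.f)


def try_iterate(obj):
    """Return the lines of obj, or the TypeError message if it is not iterable."""
    try:
        return [line for line in obj]
    except TypeError as exc:
        return str(exc)


print(try_iterate(DelegatingOpener(io.StringIO("a\nb\n"))))
print(try_iterate(IterableOpener(io.StringIO("a\nb\n"))))
```

The first call reports that the object is not iterable, even though `DelegatingOpener(...).readline()` works through delegation. The second call yields the lines.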

@anmyachev fsspec v0.7.4 does not work, but v2022.5.0 works now, so you should set a minimum version of fsspec.

wqh17101 avatar Jul 25 '22 15:07 wqh17101

@wqh17101 thanks for working on that! We should consider raising the minimum fsspec version, keeping compatibility with pandas in mind.

@jbrockmendel doesn't this problem affect pandas too? Have you run into it?

anmyachev avatar Jul 27 '22 21:07 anmyachev

This doesn't look familiar, but I can confirm that pandas 1.5.0 (coming soon) bumps the fsspec minimum supported version to 2021.5.0

jbrockmendel avatar Jul 27 '22 22:07 jbrockmendel

thanks @jbrockmendel!

@wqh17101 could you check fsspec v2021.5.0 on Windows 11 (I don't have that OS)? If it works, then a future release of Modin that supports pandas 1.5.0 will also work for you.

anmyachev avatar Jul 27 '22 22:07 anmyachev
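
To see which fsspec version an environment actually has, and how it compares with the floor discussed above, a small check along these lines may help. This is only a sketch: `MINIMUM_FSSPEC`, `version_tuple`, `meets_minimum`, and `installed_version` are hypothetical names, the comparison is deliberately simplistic, and 2021.5.0 is the figure quoted in this thread, not one verified on Windows 11.

```python
from importlib import metadata

# Assumption: floor taken from this thread (pandas 1.5.0); unverified on Windows 11.
MINIMUM_FSSPEC = "2021.5.0"


def version_tuple(version):
    """Convert a dotted version such as '2022.5.0' into a tuple of ints.

    Parsing stops at the first component without leading digits (e.g. 'post1').
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_FSSPEC):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package="fsspec"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("fsspec is not installed")
else:
    print(f"fsspec {found}: meets minimum = {meets_minimum(found)}")
```

With the versions reported earlier, `meets_minimum("0.7.4")` is false and `meets_minimum("2022.5.0")` is true. For anything beyond a quick check, a dedicated version parser (for example the one in the `packaging` library) would be more robust than this tuple comparison.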

One piece is a bug in Ray, and the other is handled by https://github.com/modin-project/modin/issues/4855

I'm going to close this tracker in favour of the other two, which are more specific.

vnlitvinov avatar Aug 29 '22 22:08 vnlitvinov