Does this module support Windows?
System information
- OS Platform and Distribution: Win11
- Modin version: 0.13.2
- Python version: 3.8.8
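Code we can use to reproduce:
import numpy as np
import modin.pandas as pd
from modin.config import Engine

Engine.put("dask")  # or: Engine.put("ray")
# Note: np.float was removed in NumPy 1.24; use float or np.float64 on newer versions.
data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')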
Describe the problem
It does not work on Win11.
Source code / logs
using dask:
UserWarning: Dask execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
from distributed import Client
client = Client()
Traceback (most recent call last):
File ".\test_read.py", line 45, in <module>
print((test_modin_pd() == test_pd()).all())
File ".\test_read.py", line 30, in test_modin_pd
data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
return _read(**kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 60, in _read
pd_obj = FactoryDispatcher.read_csv(**kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\dispatching\factories\dispatcher.py", line 180, in read_csv
return cls.__factory._read_csv(**kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\dispatching\factories\factories.py", line 207, in _read_csv
return cls.io_cls.read_csv(**kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\file_dispatcher.py", line 151, in read
query_compiler = cls._read(*args, **kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 1007, in _read
splits = cls.partitioned_file(
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 314, in partitioned_file
outside_quotes = cls.offset(
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 192, in offset
outside_quotes, _ = cls._read_rows(
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 386, in _read_rows
for line in iterator:
TypeError: 'LocalFileOpener' object is not iterable
using ray:
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
import ray
ray.init()
Traceback (most recent call last):
File ".\test_read.py", line 45, in <module>
test_modin_pd()
File ".\test_read.py", line 30, in test_modin_pd
data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
return _read(**kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 56, in _read
Engine.subscribe(_update_engine)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\config\pubsub.py", line 213, in subscribe
callback(cls)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\__init__.py", line 128, in _update_engine
initialize_ray()
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\ray\common\utils.py", line 183, in initialize_ray
ray.init(**ray_init_kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\ray\worker.py", line 918, in init
_global_node = ray.node.Node(
File "C:\Users\wqh\anaconda3\lib\site-packages\ray\node.py", line 215, in __init__
self.start_head_processes()
File "C:\Users\wqh\anaconda3\lib\site-packages\ray\node.py", line 911, in start_head_processes
self.start_redis()
File "C:\Users\wqh\anaconda3\lib\site-packages\ray\node.py", line 715, in start_redis
self.get_resource_spec(),
File "C:\Users\wqh\anaconda3\lib\site-packages\ray\node.py", line 344, in get_resource_spec
self._resource_spec = ResourceSpec(
File "C:\Users\wqh\anaconda3\lib\site-packages\ray\_private\resource_spec.py", line 149, in resolve
num_gpus = _autodetect_num_gpus()
File "C:\Users\wqh\anaconda3\lib\site-packages\ray\_private\resource_spec.py", line 241, in _autodetect_num_gpus
lines = subprocess.check_output(cmdargs).splitlines()[1:]
File "C:\Users\wqh\anaconda3\lib\subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "C:\Users\wqh\anaconda3\lib\subprocess.py", line 489, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Users\wqh\anaconda3\lib\subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\wqh\anaconda3\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] can not find that file.
Hi @wqh17101 thanks for the report! Modin works on Windows.
What type of object did you pass as file_path? It seems like that might be causing the issue.
Hi, file_path="./x.csv". The same code works on Linux. @devin-petersohn
@wqh17101 is this a test file you can share? I have tried to reproduce this, but I haven't been able to.
Sure, it is a normal CSV: test_Y.csv
@wqh17101 can you share the full script? I wasn't able to reproduce the issue with your code and file. Are you calling os.chdir or something like that? That could be contributing here.
Sure, it is an ordinary script.
#!/usr/bin/env python
# _*_coding:utf-8_*_
"""
@Time : 2022/3/5 22:20
@Author : Qinghua Wang
@Email : [email protected]
"""
import time

import numpy as np  # needed for np.float below
import pandas as pd

file_path = r"./test.csv"


def test_modin_pd():
    # Shadow the module-level pandas import with Modin inside this function.
    import modin.pandas as pd
    from modin.config import Engine

    Engine.put("ray")  # Modin will use Ray
    start = time.time()
    data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')
    print(data.shape)
    print(f"modin cost {time.time() - start}s")
    return data


if __name__ == '__main__':
    test_modin_pd()
Oh, I checked the code of the function that raised the exception, in C:\Users\wqh\anaconda3\Lib\site-packages\ray\_private\resource_spec.py:
def _autodetect_num_gpus():
    """Attempt to detect the number of GPUs on this machine.

    TODO(rkn): This currently assumes NVIDIA GPUs on Linux.
    TODO(mehrdadn): This currently does not work on macOS.
    TODO(mehrdadn): Use a better mechanism for Windows.
    Possibly useful: tensorflow.config.list_physical_devices()

    Returns:
        The number of GPUs if any were detected, otherwise 0.
    """
    result = 0
    if sys.platform.startswith("linux"):
        proc_gpus_path = "/proc/driver/nvidia/gpus"
        if os.path.isdir(proc_gpus_path):
            result = len(os.listdir(proc_gpus_path))
    elif sys.platform == "win32":
        props = "AdapterCompatibility"
        cmdargs = ["WMIC", "PATH", "Win32_VideoController", "GET", props]
        lines = subprocess.check_output(cmdargs).splitlines()[1:]
        result = len([x.rstrip() for x in lines if x.startswith(b"NVIDIA")])
    return result
It seems that it wants to run a command. I ran that command on the command line and got:
WMIC is not a command in my system
So is there anything I need to install first?
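The win32 branch quoted above calls WMIC through subprocess.check_output, so a machine without WMIC on PATH raises FileNotFoundError instead of reporting zero GPUs. A minimal sketch of a more defensive variant follows. It is not Ray's actual fix; the function name count_nvidia_adapters and its injectable run and which parameters are hypothetical, introduced only for illustration.

```python
import shutil
import subprocess


def count_nvidia_adapters(run=subprocess.check_output, which=shutil.which):
    """Count NVIDIA video adapters via WMIC, returning 0 if WMIC is unusable.

    Hypothetical helper: `run` and `which` are injectable so the logic can be
    exercised on machines without WMIC.
    """
    if which("WMIC") is None:
        # WMIC is not on PATH, as on the reporter's machine.
        return 0
    cmdargs = ["WMIC", "PATH", "Win32_VideoController", "GET", "AdapterCompatibility"]
    try:
        lines = run(cmdargs).splitlines()[1:]  # skip the header row
    except (OSError, subprocess.CalledProcessError):
        # Missing executable or non-zero exit: treat as "no GPUs detected".
        return 0
    return len([x for x in lines if x.strip().startswith(b"NVIDIA")])
```

Falling back to 0 matches the documented return value of the original function ("otherwise 0"), so callers would not need to change.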
When I commented out that branch, execution continued, and then I got:
PS G:\naic_csi_semi_final\task2> python .\test_read.py
UserWarning: Ray execution environment not yet initialized. Initializing...
To remove this warning, run the following python code before doing dataframe operations:
import ray
ray.init()
Traceback (most recent call last):
File ".\test_read.py", line 53, in <module>
test_modin_pd()
File ".\test_read.py", line 30, in test_modin_pd
data = pd.read_csv(file_path, dtype=np.float, header=None).to_numpy("float32").astype('float32')
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 135, in read_csv
return _read(**kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\pandas\io.py", line 60, in _read
pd_obj = FactoryDispatcher.read_csv(**kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\dispatching\factories\dispatcher.py", line 180, in read_csv
return cls.__factory._read_csv(**kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\execution\dispatching\factories\factories.py", line 207, in _read_csv
return cls.io_cls.read_csv(**kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\file_dispatcher.py", line 151, in read
query_compiler = cls._read(*args, **kwargs)
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 1007, in _read
splits = cls.partitioned_file(
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 314, in partitioned_file
outside_quotes = cls.offset(
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 192, in offset
outside_quotes, _ = cls._read_rows(
File "C:\Users\wqh\anaconda3\lib\site-packages\modin\core\io\text\text_file_dispatcher.py", line 386, in _read_rows
for line in iterator:
TypeError: 'LocalFileOpener' object is not iterable
Hi @wqh17101!
Can you reproduce the TypeError: 'LocalFileOpener' object is not iterable problem with the following reproducer?
import fsspec
file_path = r"./test.csv"
file = fsspec.open(file_path).open()
for line in file:
    print(line)
For reference: problem with Ray is reported here.
Yes, I get the same error.
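The error message means that the for loop could not obtain an iterator from the opened object. One plausible mechanism, not confirmed against fsspec's source, is a wrapper that forwards attribute access to the real file through __getattr__. Python looks up __iter__ on the type when a for loop starts, so that kind of forwarding does not make the wrapper iterable. The sketch below illustrates this with two hypothetical classes, DelegatingOpener and IterableOpener; neither is fsspec code.

```python
import io


class DelegatingOpener:
    """Hypothetical wrapper that forwards unknown attributes to a file object."""

    def __init__(self, f):
        self.f = f

    def __getattr__(self, name):
        # Only consulted for ordinary attribute access, e.g. opener.readline.
        return getattr(self.f, name)


class IterableOpener(DelegatingOpener):
    """Same wrapper, with iteration defined explicitly on the class."""

    def __iter__(self):
        return iter(self.f)


def read_lines(opener):
    """Return all lines, or the TypeError message if iteration is unsupported."""
    try:
        return [line for line in opener]
    except TypeError as exc:
        return str(exc)


broken = read_lines(DelegatingOpener(io.StringIO("1,2\n3,4\n")))
fixed = read_lines(IterableOpener(io.StringIO("1,2\n3,4\n")))
```

Here broken holds the "not iterable" message while fixed holds the two lines, even though both wrappers expose readline through delegation.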
Thanks @wqh17101! This looks like a problem with Windows 11 support on the fsspec side (which Modin uses for reading files, just like pandas). Can you create an issue there?
@anmyachev fsspec v0.7.4 does not work, but v2022.5.0 works now, so you should set a minimum version of fsspec.
@wqh17101 thanks for working on that! We should consider raising the minimum version of fsspec, keeping compatibility with pandas in mind.
@jbrockmendel doesn't the problem affect pandas too? Maybe you have run into this?
This doesn't look familiar, but I can confirm that pandas 1.5.0 (coming soon) bumps the fsspec minimum supported version to 2021.5.0
thanks @jbrockmendel!
@wqh17101 could you check fsspec v2021.5.0 with Windows 11 (I don't have that OS)? If this works, then a future release of Modin that supports pandas 1.5.0 will also work for you.
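Until a minimum fsspec version is enforced by the package metadata, a script can check the installed version itself. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the floor 2021.5.0 is simply the version mentioned above for pandas 1.5.0; this thread only reports 2022.5.0 as working and 0.7.4 as failing on Windows 11.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain release string such as '2021.5.0' into a tuple of ints.

    Simplification: pre-release or local suffixes are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum` (both dotted strings)."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(name="fsspec", minimum="2021.5.0"):
    """Return (installed_version_or_None, ok) for the named distribution."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return None, False
    try:
        return installed, meets_minimum(installed, minimum)
    except ValueError:
        # Version string with a suffix this simple parser does not understand.
        return installed, False
```

Calling check_package() before importing modin.pandas would let a script fail early with a clear message instead of the iteration TypeError.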
One piece is a bug in Ray, and another one is handled by https://github.com/modin-project/modin/issues/4855
I'm going to close this tracker in favour of the other two, which are more specific.