BUG: The csv file cannot be read if there are square brackets in the csv file path or full path.

Open JacobKwon opened this issue 1 year ago • 4 comments

Modin version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest released version of Modin.

  • [ ] I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import modin.pandas as pd

# Both directory names in the path contain square brackets.
df = pd.read_csv(
    "/home/user/[DE]/[PROJECT]/total_data_231127.csv",
    names=['seq', 'A', 'B'],
    usecols=['A', 'B'],
    sep="\t",
    dtype=str,
    na_values=['\\N'],
)

Issue Description

At first I thought this was a Korean (UTF-8) problem, because I am using the Korean version of Ubuntu 22.04 and the file path contains Korean characters, so I ran several tests to isolate the error.

After removing the Korean characters, I called read_csv with the full file path, but the same error occurred. Below are screenshots of the tests with square brackets in the path (with "MODIN_ENGINE" = "dask").

modin1

modin2

I also tried the following variations just in case, but none of them worked.

modin3

modin4

For now, everything works well after removing the square brackets from the path. Still, I thought it would be worth reporting, so I'm filing this bug report. Thank you.
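If renaming the directories is not an option, one possible workaround is to point a bracket-free symlink at the file and pass that path to read_csv instead. The sketch below uses only the Python standard library; `bracket_free_alias` is a hypothetical helper name, the demo file is a stand-in for the real CSV, and whether Modin then reads the alias successfully is an assumption that has not been verified here.

```python
import os
import re
import tempfile


def bracket_free_alias(path, alias_dir=None):
    """Create a symlink to `path` in a directory without glob characters.

    Hypothetical helper: it only builds the alias, it does not call Modin.
    """
    alias_dir = alias_dir or tempfile.mkdtemp(prefix="csv_alias_")
    # Replace characters that glob-style matching treats as special.
    safe_name = re.sub(r"[\[\]*?]", "_", os.path.basename(path))
    alias = os.path.join(alias_dir, safe_name)
    os.symlink(os.path.abspath(path), alias)
    return alias


# Demo with a throwaway file standing in for the real CSV.
src_dir = os.path.join(tempfile.mkdtemp(), "[DE]")
os.makedirs(src_dir)
src = os.path.join(src_dir, "total_data.csv")
with open(src, "w") as f:
    f.write("1\tx\ty\n")

alias = bracket_free_alias(src)
print(alias)
```

The resulting `alias` could then be passed to read_csv in place of the original path. Creating symlinks may require extra privileges on some platforms.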

Expected Behavior

FileNotFoundError: [Errno 2] No such file or directory: '/home/user/[DE]/[PROJECT]/total_data_231127.csv'


Installed Versions

  • Ubuntu 22.04.2 LTS (Korean)

  • conda 22.9.0

  • jupyter notebook 6.5.3

  • pip list output: ... dask 2023.5.0, modin 0.23.1.post0, modin-spreadsheet 0.1.2 ...

JacobKwon, Dec 15 '23 01:12

cc @anmyachev

YarShev, Dec 15 '23 11:12

Hello @JacobKwon! Thanks for your contribution, and sorry for the slow response.

First of all, could you confirm whether paths with square brackets work when you use pandas instead of Modin? (You may have already tried this.)

anmyachev, Dec 19 '23 15:12

Hello, @anmyachev. The late response is no problem. I know the time difference is considerable. 😉

As you guessed, I had already tested pandas read_csv. After seeing your reply, I ran the test again so that I could attach screenshots of the results. 👍

  • Using Pandas 1

  • Using Modin 2

~~Actually, it's my first bug report on Github, so I'm worried that I might have made a mistake. 🙄~~ Thank you.

JacobKwon, Dec 20 '23 01:12

@JacobKwon I can reproduce the problem and I think I have found the cause. Modin uses the fsspec library in cases where pandas doesn't: https://github.com/fsspec/filesystem_spec/issues/1476

anmyachev, Dec 20 '23 22:12
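For readers wondering why square brackets in particular could matter: in glob-style pattern matching, `[DE]` is a character class meaning "a single character, D or E", not a literal directory name. The sketch below demonstrates this with Python's standard `glob` module only. It does not call fsspec or Modin, and the assumption that a similar pattern interpretation is behind the linked fsspec issue is not confirmed here.

```python
import glob
import os
import tempfile


def demo_bracket_glob():
    """Create <tmp>/[DE]/data.csv and look it up as a glob pattern."""
    root = tempfile.mkdtemp()
    bracket_dir = os.path.join(root, "[DE]")
    os.makedirs(bracket_dir)
    path = os.path.join(bracket_dir, "data.csv")
    with open(path, "w") as f:
        f.write("A\tB\n1\t2\n")

    # Treated as a pattern, "[DE]" matches a directory named "D" or "E",
    # so the real "[DE]" directory is not found.
    raw = glob.glob(path)

    # glob.escape() rewrites "[" as "[[]" so it is matched literally.
    escaped = glob.glob(glob.escape(path))
    return raw, escaped, path


raw_matches, escaped_matches, csv_path = demo_bracket_glob()
print(os.path.exists(csv_path))       # the file really exists
print(raw_matches)                    # pattern lookup misses it
print(escaped_matches == [csv_path])  # escaped lookup finds it
```

A path-handling layer that expands paths as patterns before opening them would report an existing file as missing in the same way, which is consistent with the FileNotFoundError in the report.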