Function `load_dataset` can't resolve folder paths containing regex characters like "[]"
Describe the bug
When using the load_dataset function with a folder path containing regex special characters (such as "[]"), the issue occurs due to how the path is handled in the resolve_pattern function. This function passes the unprocessed path directly to AbstractFileSystem.glob, which supports regular expressions. As a result, the globbing mechanism interprets these characters as regex patterns, leading to a traversal of the entire disk partition instead of confining the search to the intended directory.
Steps to reproduce the bug
Create a folder such as E:\[D_DATA]\koch_test, then call load_dataset("parquet", data_dir="E:\[D_DATA]\\test", split="train").
It will keep searching the whole disk.
I added two print statements in glob and resolve_pattern to see the path.
Expected behavior
It should load the dataset as it does from a normal folder.
Environment info
- `datasets` version: 3.3.2
- Platform: Windows-10-10.0.22631-SP0
- Python version: 3.10.16
- `huggingface_hub` version: 0.29.1
- PyArrow version: 19.0.1
- Pandas version: 2.2.3
- `fsspec` version: 2024.12.0
Hi! Have you tried escaping the glob special characters [ and ]?
Btw, note that AbstractFileSystem.glob doesn't support regex; it supports glob patterns, as in the Python glob library.
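A minimal sketch of what escaping could look like, using only the standard library's glob.escape on a temporary folder. The helper name escape_data_dir is hypothetical, and this assumes fsspec's glob treats an escaped bracket ("[[]") the same way the standard library does. It has not been checked against load_dataset on Windows.

```python
import glob
import os
import tempfile


def escape_data_dir(path: str) -> str:
    """Hypothetical helper: escape glob special characters (*, ?, [)
    so that the path is matched literally instead of as a pattern."""
    return glob.escape(path)


with tempfile.TemporaryDirectory() as root:
    # Recreate a folder layout similar to the one in the report.
    data_dir = os.path.join(root, "[D_DATA]", "test")
    os.makedirs(data_dir)
    open(os.path.join(data_dir, "part.parquet"), "w").close()

    # Unescaped: "[D_DATA]" is read as a character class matching ONE
    # character (D, _, A or T), so the folder itself is never matched.
    raw_matches = glob.glob(os.path.join(data_dir, "*.parquet"))

    # Escaped: "[" becomes "[[]", so the folder name is matched literally.
    escaped_matches = glob.glob(
        os.path.join(escape_data_dir(data_dir), "*.parquet")
    )

print(len(raw_matches), len(escaped_matches))
```

If the same escaping carries over to fsspec, passing the escaped path as data_dir (or in data_files) might keep the search confined to the intended directory, but that is an assumption to verify rather than confirmed behavior.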