moabb
Error downloading dataset Sosulski2019
Hi, I'm trying to download the Sosulski2019 dataset for my research, but when I do so I get the following error:
Traceback (most recent call last):
File "C:\Program Files\Python39\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "\\filesrv4\home$\bleuzea\.windows\Application Data\Python\Python39\site-packages\moabb\datasets\base.py", line 169, in download
self.data_path(
File "\\filesrv4\home$\bleuzea\.windows\Application Data\Python\Python39\site-packages\moabb\datasets\sosulski2019.py", line 163, in data_path
path_zip = dl.data_dl(url, "spot")
File "<decorator-gen-579>", line 24, in data_dl
File "\\filesrv4\home$\bleuzea\.windows\Application Data\Python\Python39\site-packages\moabb\datasets\download.py", line 144, in data_dl
os.makedirs(osp.dirname(destination))
File "C:\Program Files\Python39\lib\os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "C:\Program Files\Python39\lib\os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "C:\Program Files\Python39\lib\os.py", line 215, in makedirs
makedirs(head, exist_ok=exist_ok)
File "C:\Program Files\Python39\lib\os.py", line 225, in makedirs
mkdir(name, mode)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'C:\\Users\\bleuzea\\mne_data\\MNE-spot-data\\:'
I think the error comes from the name of the downloaded data: the folder "MNE-spot-data" exists at the location mentioned in the error, but the trailing ":" looks odd. Perhaps it is just a typo in the code, at the point where the subject number should appear and the filename should be concatenated with the folder name?
Once you have deleted the local data for this dataset, a minimal snippet to reproduce the error is:
from moabb.datasets import Sosulski2019
Sosulski2019().download()
I found the cause of the error and a possible correction. The download URL is 'https://freidok.uni-freiburg.de/fedora/objects/freidok:154576/datastreams/FILE4/content'. The URL-to-path translation from MNE therefore gives a local path that looks like ".../mne_data/MNE-spot-data/:/154576/etc...". The error happens when the function tries to create a folder named ":", which is a forbidden character in folder names on Windows. The best correction I found was to modify the file "moabb\datasets\download.py" and add
destination = path+destination.split(path)[1].translate({ord(c): None for c in ':*?"<>|'})
at line 138 just after
destination = _url_to_local_path(url, osp.join(path, key_dest))
It's a bit awkward to handle because the problem comes from a ":", while a path usually starts with something like "C:" (and we don't want to change that). This correction works well and should also fix other errors of this kind, but it could introduce new ones, since it modifies a file used by every dataset download (even though it shouldn't). If you have a better idea, feel free to change it.
I tried upgrading MNE from 0.24.1 to 1.0.3, and this correction no longer works because "path" is no longer exactly a str: it is a path type coming from MNE. However, it behaves enough like a str that replacing "path" with "str(path)" in the correction I gave makes it work. This gives:
destination = str(path)+destination.split(str(path))[1].translate({ord(c): None for c in ':*?"<>|'})
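The one-line correction above can also be written as a small standalone helper, which makes the intent easier to test. This is only a sketch: the function name `strip_forbidden` and the example paths are hypothetical and are not part of moabb or MNE. It slices off the base directory instead of calling `split`, an assumed variation that keeps the result stable even if the base string happened to occur twice in the path.

```python
# Characters taken from the correction proposed in this thread.
FORBIDDEN = ':*?"<>|'


def strip_forbidden(base, destination, forbidden=FORBIDDEN):
    """Drop forbidden characters from the part of `destination` below `base`.

    `base` is left untouched so that a drive prefix such as "C:" survives.
    Both arguments may be str or path-like objects; they are converted to str.
    """
    base = str(base)
    destination = str(destination)
    if not destination.startswith(base):
        raise ValueError("destination is not located under base")
    tail = destination[len(base):]
    return base + tail.translate({ord(c): None for c in forbidden})


# Hypothetical paths, for illustration only.
base = "C:/Users/me/mne_data/MNE-spot-data"
print(strip_forbidden(base, base + "/objects/freidok:154576/content"))
```

Only the part below the base directory is cleaned, which is why the colon in "C:" is preserved.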
Thank you for the in-depth analysis of this bug. I guess this slipped under my radar when implementing the dataset wrapper, as I work with Linux, where : is a valid character in folder and file names.
@sylvchev we could change the current implementation so that characters in the URL that are illegal on at least one OS are automatically replaced by e.g. _. However, users who update to the newer version will then have to re-download these datasets. Alternatively, we could make this OS-specific, which might be more tedious to maintain.
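As a rough illustration of the OS-independent option, the sketch below turns a URL into a list of path components with the problematic characters replaced by "_". The function name `url_to_safe_parts` is hypothetical, and this is not how MNE's `_url_to_local_path` works internally; the character set is the one quoted earlier in this thread.

```python
import re
from urllib.parse import urlparse

# Characters that are illegal in file or folder names on at least one OS
# (the set quoted earlier in this thread).
ILLEGAL_CHARS = re.compile(r'[:*?"<>|]')


def url_to_safe_parts(url, replacement="_"):
    """Split a URL into host and path components, replacing illegal characters."""
    parsed = urlparse(url)
    parts = [parsed.netloc] + [p for p in parsed.path.split("/") if p]
    return [ILLEGAL_CHARS.sub(replacement, part) for part in parts]


url = (
    "https://freidok.uni-freiburg.de/fedora/objects/"
    "freidok:154576/datastreams/FILE4/content"
)
print(url_to_safe_parts(url))
```

The components can then be joined under the download directory with `os.path.join`, giving the same folder layout on every OS.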
I don't think an OS-specific version is a good idea, for the maintenance reason you gave. However, if we want to avoid needlessly re-downloading the data, it is possible to add a few lines such as:
if os.path.exists(destination):
    os.rename(destination, new_destination)
or something like that, with new_destination being the destination variable that has been automatically replaced as you proposed.
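A slightly fuller sketch of this rename step is shown below. The helper name `migrate_download` is hypothetical, and the extra checks (skipping the move when the new location already exists, creating the parent folder first) are assumptions added for safety rather than something taken from moabb.

```python
import os


def migrate_download(destination, new_destination):
    """Move an existing download to its sanitized location, if needed.

    Returns the path that should be used from now on.
    """
    if destination == new_destination:
        return new_destination
    if os.path.exists(destination) and not os.path.exists(new_destination):
        parent = os.path.dirname(new_destination)
        if parent:
            # os.rename does not create missing parent folders.
            os.makedirs(parent, exist_ok=True)
        os.rename(destination, new_destination)
    return new_destination
```

If nothing was downloaded yet, the function does nothing and the regular download simply writes to the new location.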
Thank you for getting to the bottom of this. I looked for a library that handles Windows/Linux paths automatically, but could not find anything suitable. Your solution is efficient and I implemented it in the download.py file in #318. Unfortunately, datasets that were already downloaded need to be moved or re-downloaded if their pathname changes. This is only the case for the Sosulski2019 dataset. But your code supports all OSes, so this is useful.