
Error downloading dataset Sosulski2019

Open AlexandreBleuze opened this issue 3 years ago • 4 comments

Hi, I'm trying to download the Sosulski2019 dataset for my research, but when I do so I get the following error:

Traceback (most recent call last):
  File "C:\Program Files\Python39\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "\\filesrv4\home$\bleuzea\.windows\Application Data\Python\Python39\site-packages\moabb\datasets\base.py", line 169, in download
    self.data_path(
  File "\\filesrv4\home$\bleuzea\.windows\Application Data\Python\Python39\site-packages\moabb\datasets\sosulski2019.py", line 163, in data_path
    path_zip = dl.data_dl(url, "spot")
  File "<decorator-gen-579>", line 24, in data_dl
  File "\\filesrv4\home$\bleuzea\.windows\Application Data\Python\Python39\site-packages\moabb\datasets\download.py", line 144, in data_dl
    os.makedirs(osp.dirname(destination))
  File "C:\Program Files\Python39\lib\os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "C:\Program Files\Python39\lib\os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "C:\Program Files\Python39\lib\os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "C:\Program Files\Python39\lib\os.py", line 225, in makedirs
    mkdir(name, mode)
OSError: [WinError 123] La syntaxe du nom de fichier, de répertoire ou de volume est incorrecte: 'C:\\Users\\bleuzea\\mne_data\\MNE-spot-data\\:'

I think the error comes from the name of the downloaded data: the folder "MNE-spot-data" exists at the location mentioned in the error, but the trailing ":" looks wrong. Perhaps it is just a typo in the code, where the subject number should appear and the filename should be concatenated with the folder's name?

After deleting any locally downloaded copy of the dataset, the following code reproduces the error:

from moabb.datasets import Sosulski2019
Sosulski2019().download()

AlexandreBleuze • Jun 09 '22 09:06

I found the cause and a possible fix. The download URL is 'https://freidok.uni-freiburg.de/fedora/objects/freidok:154576/datastreams/FILE4/content'. When MNE translates this URL into a local path, the result looks like ".../mne_data/MNE-spot-data/:/154576/etc...". The error occurs when the function tries to create a folder named ":", a character that is forbidden in folder names on Windows. The best fix I found was to modify the file "moabb\datasets\download.py" and add

    destination = path+destination.split(path)[1].translate({ord(c): None for c in ':*?"<>|'})

at line 138 just after

    destination = _url_to_local_path(url, osp.join(path, key_dest))

This is a bit awkward to handle, because the problem comes from a ":" while a Windows path usually starts with a drive prefix such as "C:", which must not be changed. The fix works well and should also prevent similar errors with other forbidden characters, but since it modifies a file used by every dataset download, it could introduce errors elsewhere (even though it shouldn't). If you have a better idea, feel free to change it.
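
One way to leave the drive prefix untouched while cleaning the rest of the path is to split the drive off first. The sketch below is only an illustration, not the code in moabb: `sanitize_windows_path` is a hypothetical helper, and it relies on `ntpath` from the standard library so that it behaves the same on any OS. Deleting a component that consists only of ":" leaves two consecutive separators, so the result is passed through `ntpath.normpath` to collapse them.

```python
import ntpath

# Characters that Windows does not allow in file or folder names.
# Path separators are deliberately left out so the structure is kept.
FORBIDDEN = ':*?"<>|'


def sanitize_windows_path(destination):
    """Hypothetical helper: drop forbidden characters after the drive prefix."""
    # splitdrive("C:\\a\\b") returns ("C:", "\\a\\b"), so the drive colon
    # is never seen by the character filter below.
    drive, tail = ntpath.splitdrive(destination)
    cleaned = tail.translate({ord(c): None for c in FORBIDDEN})
    # normpath collapses the empty component left by a removed ":" folder.
    return ntpath.normpath(drive + cleaned)


# Example path (made up) with the same shape as the one in the error message.
bad = "C:\\Users\\me\\mne_data\\MNE-spot-data\\:\\154576\\content"
print(sanitize_windows_path(bad))
```

A path that contains no forbidden characters is returned unchanged, so applying the helper to every download should be harmless under these assumptions.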

AlexandreBleuze • Jun 09 '22 15:06

I tried upgrading MNE from 0.24.1 to 1.0.3, and the fix no longer works because "path" is no longer a plain str: it is a path type coming from MNE. It behaves enough like a str that replacing "path" with "str(path)" in my fix makes it work again. This gives:

    destination = str(path)+destination.split(str(path))[1].translate({ord(c): None for c in ':*?"<>|'})

AlexandreBleuze • Jun 10 '22 08:06

Thank you for the in-depth analysis of this bug. I guess this slipped under my radar when implementing the dataset wrapper, as I work with Linux, where : is a valid character in folder/file-names.

@sylvchev we could change the current implementation so that characters in the URL that are illegal on at least one OS are automatically replaced by, e.g., _. However, users who update to the newer version would then have to re-download these datasets. Alternatively, we could make this OS-specific, which might be more tedious to maintain.
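
As a rough sketch of the first option, the replacement could be applied to each component of the URL path before it is joined to the local data folder, so the drive prefix of the base folder is never touched. `url_to_safe_parts` and the `ILLEGAL` pattern are hypothetical names for illustration, and this is independent of how moabb or MNE actually build the path.

```python
import re
from urllib.parse import urlparse

# Characters rejected in file or folder names on at least one OS (Windows).
# "/" is handled separately by splitting the URL path into components.
ILLEGAL = re.compile(r'[:*?"<>|\\]')


def url_to_safe_parts(url):
    """Hypothetical helper: URL path -> list of portable path components."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Replace every illegal character with "_" instead of deleting it,
    # so "freidok:154576" stays one recognisable folder name.
    return [ILLEGAL.sub("_", p) for p in parts]


url = (
    "https://freidok.uni-freiburg.de/fedora/objects/"
    "freidok:154576/datastreams/FILE4/content"
)
print(url_to_safe_parts(url))
# -> ['fedora', 'objects', 'freidok_154576', 'datastreams', 'FILE4', 'content']
```

The resulting components can then be joined to the base folder with `os.path.join`, giving the same relative layout on every OS.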

jsosulski • Jun 29 '22 06:06

I don't think an OS-specific version is a good idea, for the maintenance reason you gave. However, if we want to avoid needlessly re-downloading the data, it is possible to add a few lines such as:

if os.path.exists(destination):
    os.rename(destination, new_destination)

or something like that, with new_destination being the destination variable that has been automatically replaced as you proposed.
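
A slightly fuller version of this idea might look like the sketch below. `migrate_download` is a hypothetical helper, not something that exists in moabb; it assumes the old and new destinations are both known, creates the new parent folder if needed, and does nothing when there is no old file or when the new file is already present.

```python
import os
import shutil
import tempfile


def migrate_download(old_destination, new_destination):
    """Hypothetical helper: reuse an already downloaded file under its new name.

    Returns True if a file was moved, False if there was nothing to migrate.
    """
    if old_destination == new_destination:
        return False
    if not os.path.exists(old_destination) or os.path.exists(new_destination):
        return False
    # Create the new parent folder first, then move the old download into it.
    os.makedirs(os.path.dirname(new_destination), exist_ok=True)
    shutil.move(old_destination, new_destination)
    return True


# Small demonstration in a temporary folder (folder and file names are made up).
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old_layout", "content")
    new = os.path.join(tmp, "new_layout", "content")
    os.makedirs(os.path.dirname(old))
    with open(old, "w") as f:
        f.write("data")
    moved = migrate_download(old, new)        # first call moves the file
    moved_again = migrate_download(old, new)  # second call finds nothing to move
    new_exists_after = os.path.exists(new)
```

On Windows the old path could never have been created, so the helper would simply return False there; the migration only matters on systems where ":" is a valid folder name.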

AlexandreBleuze • Jul 04 '22 13:07

Thank you for getting to the bottom of this. I looked for a library that handles Windows/Linux paths automatically, but I could not find a suitable one. Your solution is efficient and I implemented it in the download.py file in #318. Unfortunately, datasets that were already downloaded need to be moved or re-downloaded if their pathname changes. This is only the case for the Sosulski2019 dataset. But your code makes it possible to support all OSes, so this is useful.

sylvchev • Jan 04 '23 00:01