GearNet
Dataset not found when I run `python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]`
It seems that the file at "https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar" really doesn't exist anymore. When I entered this URL in my browser, it also reported that the file could not be found.
14:43:55 Downloading https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/UP000006548_3702_ARATH_v2.tar to /home/horace/scratch/protein-datasets/alphafold/UP000006548_3702_ARATH_v2.tar
Traceback (most recent call last):
File "script/pretrain.py", line 50, in <module>
dataset = core.Configurable.load_config_dict(cfg.dataset)
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/core/core.py", line 269, in load_config_dict
return cls(**new_config)
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/core/core.py", line 288, in wrapper
return init(self, *args, **kwargs)
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/datasets/alphafolddb.py", line 122, in __init__
tar_file = utils.download(self.urls[species_id], path, md5=self.md5s[species_id])
File "/home/horace/.conda/envs/drug/lib/python3.7/site-packages/torchdrug/utils/file.py", line 31, in download
urlretrieve(url, save_file)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/home/horace/.conda/envs/drug/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Oh, I found that on the web the dataset's version is now v4 instead of v2. If I just use the v4 dataset, will it affect the experiments? Additionally, how do I use v4?
I think it's okay to use v4 instead of v2. The pre-training dataset doesn't have a large effect on the final performance.
I have downloaded the v4 dataset and put it into the correct directory. However, when I run the command `python script/pretrain.py -c config/pretrain/mc_gearnet_edge.yaml --gpus [0]`, the program still starts to download the v2 dataset. I don't know how to resolve this.
Sorry for the inconvenience! This is because I set the default files to the v2 datasets instead of the v4 datasets. The easiest way to change this is to inherit the `datasets.AlphaFoldDB` class and override the `urls` and `md5s` attributes there. The class checks the downloaded files against the filenames in `urls` and verifies their `md5` values.
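A rough sketch of that workaround, with some assumptions on my part: the subclass name `AlphaFoldDBv4` and the helper `v2_to_v4` are made up for illustration, and only the version suffix in each archive filename is assumed to change between releases. The v4 tarballs have different checksums, so the real `md5s` values would need to be recomputed (e.g. with `md5sum` on each downloaded archive).

```python
# Hypothetical sketch: derive v4 download URLs from the v2 defaults.
# Only the version suffix in the filename is assumed to change; the md5
# checksums of the v4 archives differ and must be recomputed yourself.

def v2_to_v4(url):
    """Rewrite an AlphaFold DB v2 archive URL to its v4 counterpart."""
    return url.replace("_v2.tar", "_v4.tar")

# The URL from the traceback above:
v2_url = ("https://ftp.ebi.ac.uk/pub/databases/alphafold/latest/"
          "UP000006548_3702_ARATH_v2.tar")
print(v2_to_v4(v2_url))

# The subclass described above would then look roughly like:
#
#   from torchdrug import datasets
#
#   class AlphaFoldDBv4(datasets.AlphaFoldDB):
#       urls = [v2_to_v4(u) for u in datasets.AlphaFoldDB.urls]
#       md5s = [...]  # recomputed md5 of each v4 tarball
```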
I think this URL issue is resolved in the updated version (0.2.1).
Installing the updated torchdrug fixed this for me. Use: `pip install torchdrug==0.2.1`