download QM9 dataset

Open Powerd0g opened this issue 2 years ago • 5 comments

🐛 Describe the bug

code:
from torch_geometric.datasets import QM9
path = './dataset/QM9'
dataset = QM9(path)

bug:
Downloading https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/molnet_publish/qm9.zip
Extracting dataset\QM9\raw\qm9.zip
Downloading https://ndownloader.figshare.com/files/3195404
Traceback (most recent call last):
  File "D:\anaconda\envs\py37\lib\urllib\request.py", line 1350, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1036, in _send_output
    self.send(msg)
  File "D:\anaconda\envs\py37\lib\http\client.py", line 976, in send
    self.connect()
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1443, in connect
    super().connect()
  File "D:\anaconda\envs\py37\lib\http\client.py", line 948, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "D:\anaconda\envs\py37\lib\socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "D:\anaconda\envs\py37\lib\socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/Remtan/Desktop/postgraduated/GNN/test.py", line 6, in <module>
    dataset = QM9(path)
  File "D:\anaconda\envs\py37\lib\site-packages\torch_geometric\datasets\qm9.py", line 135, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "D:\anaconda\envs\py37\lib\site-packages\torch_geometric\data\in_memory_dataset.py", line 56, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "D:\anaconda\envs\py37\lib\site-packages\torch_geometric\data\dataset.py", line 84, in __init__
    self._download()
  File "D:\anaconda\envs\py37\lib\site-packages\torch_geometric\data\dataset.py", line 145, in _download
    self.download()
  File "D:\anaconda\envs\py37\lib\site-packages\torch_geometric\datasets\qm9.py", line 172, in download
    file_path = download_url(self.raw_url2, self.raw_dir)
  File "D:\anaconda\envs\py37\lib\site-packages\torch_geometric\data\download.py", line 34, in download_url
    data = urllib.request.urlopen(url, context=context)
  File "D:\anaconda\envs\py37\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "D:\anaconda\envs\py37\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "D:\anaconda\envs\py37\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "D:\anaconda\envs\py37\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "D:\anaconda\envs\py37\lib\urllib\request.py", line 1393, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "D:\anaconda\envs\py37\lib\urllib\request.py", line 1352, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

Process finished with exit code 1

Maybe I cannot reach https://ndownloader.figshare.com/files/3195404. I really want to solve this bug, and I would also like to know what rdkit does in the QM9 dataset. Thanks!
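
The `getaddrinfo failed` error in the traceback is raised during hostname lookup, before any connection is opened, so one quick check is whether the host resolves at all from this machine. A minimal sketch using only the standard library; the helper name `can_resolve` is hypothetical and not part of PyG:

```python
import socket


def can_resolve(host, port=443):
    """Return True if `host` resolves to at least one address, else False."""
    try:
        # Same lookup that urllib performs before opening the connection.
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        # Name resolution failed (on Windows this is reported as Errno 11001).
        return False
```

Calling `can_resolve("ndownloader.figshare.com")` and getting `False` would point to a DNS, proxy, or firewall problem on the local network rather than to a problem in PyG itself.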

Environment

  • PyG version: 2.0.4
  • PyTorch version: 1.11.0
  • OS: Win11
  • Python version: 3.7.13
  • CUDA/cuDNN version: cu11.3 cuDNN8.0
  • How you installed PyTorch and PyG (conda, pip, source): conda
  • Any other relevant information (e.g., version of torch-scatter):
  • rdkit: 2020.09.1.0

Powerd0g avatar Jun 06 '22 07:06 Powerd0g

This is weird, downloading works for me. Does manually downloading https://ndownloader.figshare.com/files/3195404 work? You could then just move this file to ./dataset/QM9/raw/uncharacterized.txt.
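
The manual workaround can be sketched as a small copy step, assuming the file was fetched some other way (browser, another machine) and saved locally. The helper name `place_raw_file` and the source path in the usage note are hypothetical; only the target location comes from the comment above.

```python
import shutil
from pathlib import Path


def place_raw_file(downloaded, root="./dataset/QM9",
                   target_name="uncharacterized.txt"):
    """Copy a manually downloaded file to <root>/raw/<target_name>."""
    raw_dir = Path(root) / "raw"
    # Create the raw directory if the dataset has never been downloaded before.
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / target_name
    shutil.copyfile(downloaded, target)
    return target
```

For example, `place_raw_file("C:/Users/me/Downloads/3195404")` would copy that (hypothetical) download to `./dataset/QM9/raw/uncharacterized.txt`.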

rusty1s avatar Jun 06 '22 08:06 rusty1s

Thanks for replying so fast. I can't access https://ndownloader.figshare.com/files/3195404. I would also like to know what rdkit does to the QM9 dataset.

Powerd0g avatar Jun 06 '22 08:06 Powerd0g

If rdkit is installed, we use it to process the raw data. Otherwise, we load the pre-processed data directly. If you cannot access/download the raw files, consider temporarily disabling rdkit so that the pre-processed data is used.
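
To see which of the two paths would apply in a given environment, one can check whether rdkit is importable. This is a rough sketch and not PyG's actual detection code; the function name `qm9_processing_mode` and the returned labels are made up for illustration.

```python
import importlib.util


def qm9_processing_mode(has_rdkit=None):
    """Describe which QM9 path is expected, based on rdkit availability."""
    if has_rdkit is None:
        # find_spec returns None when the package cannot be imported.
        has_rdkit = importlib.util.find_spec("rdkit") is not None
    # Assumption: these labels only mirror the description above.
    return "raw data processed with rdkit" if has_rdkit else "pre-processed data"
```

If `qm9_processing_mode()` reports the rdkit path, running the script from an environment without rdkit is one way to "temporarily disable" it.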

rusty1s avatar Jun 06 '22 08:06 rusty1s

I checked the raw QM9 data in the file gdb9.sdf and found that each node has three coordinates. How can I add them to each node's features? Thanks!

Powerd0g avatar Jun 08 '22 14:06 Powerd0g

The coordinates are already present in data.pos. You can add them to the node features via

x = torch.cat([data.x, data.pos], dim=-1)
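
The concatenation can also be wrapped in a small function so that it is applied to every sample. The sketch below uses dummy tensors in place of a real QM9 sample; the function name `append_pos_to_x` is hypothetical, and the 11 node features are an assumption about QM9 rather than something stated in this thread.

```python
from types import SimpleNamespace

import torch


def append_pos_to_x(data):
    """Append the 3D coordinates in data.pos to the node features in data.x."""
    data.x = torch.cat([data.x, data.pos], dim=-1)
    return data


# Dummy stand-in for one molecule with 5 atoms (assumed 11 features per atom).
sample = SimpleNamespace(x=torch.zeros(5, 11), pos=torch.rand(5, 3))
sample = append_pos_to_x(sample)
print(sample.x.shape)  # one row per atom, feature columns followed by x, y, z
```

A function of this shape could presumably be passed as the `transform` argument that appears in the `QM9` constructor call in the traceback, so that the coordinates are appended whenever a sample is accessed.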

rusty1s avatar Jun 09 '22 09:06 rusty1s