pytorch_geometric
download QM9 dataset
🐛 Describe the bug
code:
from torch_geometric.datasets import QM9
path = './dataset/QM9'
dataset = QM9(path)
bug:
Downloading https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/molnet_publish/qm9.zip
Extracting dataset\QM9\raw\qm9.zip
Downloading https://ndownloader.figshare.com/files/3195404
Traceback (most recent call last):
  File "D:\anaconda\envs\py37\lib\urllib\request.py", line 1350, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1281, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1327, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1276, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1036, in _send_output
    self.send(msg)
  File "D:\anaconda\envs\py37\lib\http\client.py", line 976, in send
    self.connect()
  File "D:\anaconda\envs\py37\lib\http\client.py", line 1443, in connect
    super().connect()
  File "D:\anaconda\envs\py37\lib\http\client.py", line 948, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "D:\anaconda\envs\py37\lib\socket.py", line 707, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "D:\anaconda\envs\py37\lib\socket.py", line 752, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/Remtan/Desktop/postgraduated/GNN/test.py", line 6, in
Process finished with exit code 1
Maybe I cannot reach https://ndownloader.figshare.com/files/3195404.
I would really like to solve this bug, and I would also like to know what rdkit does for the QM9 dataset.
Thanks!
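The `socket.gaierror: [Errno 11001] getaddrinfo failed` at the bottom of the traceback is raised when a hostname cannot be resolved, so a quick check is whether the figshare host resolves at all from the affected machine. The following is only a minimal sketch using the standard library; the helper name `can_resolve` is hypothetical and not part of PyG.

```python
import socket


def can_resolve(host, port=443):
    """Return True if `host` can be resolved to an address, False otherwise.

    Hypothetical helper for diagnosing the error above; not part of PyG.
    """
    try:
        # Same kind of lookup that fails in the traceback (socket.getaddrinfo).
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        return False


# Example check on the affected machine:
#     can_resolve("ndownloader.figshare.com")
```

If this returns `False` for the figshare host while other hosts resolve, the problem is likely DNS or network filtering on the local side rather than the dataset code.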
Environment
- PyG version: 2.0.4
- PyTorch version: 1.11.0
- OS: Win11
- Python version: 3.7.13
- CUDA/cuDNN version: cu11.3 cuDNN8.0
- How you installed PyTorch and PyG (conda, pip, source): conda
- Any other relevant information (e.g., version of torch-scatter): rdkit 2020.09.1.0
This is weird, downloading works for me. Does manually downloading https://ndownloader.figshare.com/files/3195404 work? You could then just move this file to ./dataset/QM9/raw/uncharacterized.txt.
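If the file can be fetched some other way (a browser, another machine), moving it into place could look like the sketch below. The function name `place_uncharacterized` and the source filename are hypothetical; only the target path `./dataset/QM9/raw/uncharacterized.txt` comes from the reply above.

```python
import shutil
from pathlib import Path


def place_uncharacterized(downloaded_file, root="./dataset/QM9"):
    """Move a manually downloaded file to <root>/raw/uncharacterized.txt.

    Hypothetical helper; `root` should match the path passed to QM9(...).
    """
    raw_dir = Path(root) / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)  # create raw/ if it is missing
    target = raw_dir / "uncharacterized.txt"
    shutil.move(str(downloaded_file), str(target))
    return target


# Example (the source filename is an assumption about what the browser saved):
#     place_uncharacterized("Downloads/3195404")
```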
Thanks for replying so fast.
I can't reach https://ndownloader.figshare.com/files/3195404.
I would also like to know what rdkit does to the QM9 dataset.
(Sent by email in reply to Matthias Fey's comment of Monday, Jun 6, 2022 4:01 PM on pyg-team/pytorch_geometric, Issue #4770.)
If rdkit is installed, we use it to process the raw data. Otherwise, we load the pre-processed data directly. If you cannot access or download the raw files, consider temporarily disabling rdkit so that the pre-processed data is used.
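One way to "temporarily disable" a package without uninstalling it is to make its import fail for the duration of a block. The sketch below relies on a standard Python behavior (an entry of `None` in `sys.modules` makes `import` raise `ImportError`) and assumes that PyG detects rdkit by attempting an import, which is not confirmed in this thread. The name `block_import` is hypothetical.

```python
import sys
from contextlib import contextmanager


@contextmanager
def block_import(name):
    """Make `import <name>` raise ImportError inside the `with` block.

    Hypothetical helper; restores the previous sys.modules entry on exit.
    """
    missing = object()
    saved = sys.modules.get(name, missing)
    sys.modules[name] = None  # a None entry makes the import machinery raise ImportError
    try:
        yield
    finally:
        if saved is missing:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = saved


# Example (assumes PyG falls back to the pre-processed data when the import fails):
#     from torch_geometric.datasets import QM9
#     with block_import("rdkit"):
#         dataset = QM9("./dataset/QM9")
```

Uninstalling rdkit or using an environment without it achieves the same effect; under the same assumption, the block may need to stay in place whenever the dataset is constructed.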
I checked the raw QM9 data in the file gdb9.sdf and found that each node (atom) has three coordinates.
How can I add them to each node's features?
Thanks!
The coordinates are already present in data.pos. You can add them to the node features via
x = torch.cat([data.x, data.pos], dim=-1)
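To illustrate what the concatenation does to the shapes, here is a small sketch with random stand-in tensors instead of a real QM9 sample. The atom count and feature width are illustrative assumptions; only the `torch.cat` call comes from the reply above.

```python
import torch

num_atoms = 5                    # illustrative molecule size
x = torch.randn(num_atoms, 11)   # stand-in for data.x (feature width is an assumption)
pos = torch.randn(num_atoms, 3)  # stand-in for data.pos (3D coordinates per atom)

# Append the coordinates as three extra feature columns per node.
x_with_pos = torch.cat([x, pos], dim=-1)
print(x_with_pos.shape)
```

The result has one row per atom, with the original features first and the three coordinates in the last three columns.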