Code Hanging At Start of Training
Hi, I am running the CDVAE carbon experiment and I have been seeing strange behavior: the code simply hangs after completing three iterations of the first epoch.
I run `python cdvae/run.py data=carbon expname=carbon model.predict_property=True`
The output I see is this:
`[2023-07-13 16:57:36,190][hydra.utils][INFO] - Instantiating <cdvae.pl_data.datamodule.CrystDataModule>
[2023-07-13 16:57:37,161][numexpr.utils][INFO] - Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
[2023-07-13 16:57:37,161][numexpr.utils][INFO] - Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
25%|█████████████████████▍ | 1521/6091 [00:25<01:29, 50.81it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
34%|█████████████████████████████▎ | 2080/6091 [00:34<01:05, 61.51it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
46%|███████████████████████████████████████▊ | 2820/6091 [00:46<01:02, 52.70it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
50%|██████████████████████████████████████████▋ | 3021/6091 [00:49<00:52, 58.26it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
51%|███████████████████████████████████████████▍ | 3079/6091 [00:50<00:54, 55.41it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
51%|████████████████████████████████████████████▏ | 3132/6091 [00:51<00:40, 72.78it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
52%|████████████████████████████████████████████▎ | 3140/6091 [00:51<00:49, 59.77it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
60%|███████████████████████████████████████████████████▊ | 3673/6091 [00:59<00:38, 63.39it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
66%|████████████████████████████████████████████████████████▋ | 4018/6091 [01:05<00:32, 63.75it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
67%|█████████████████████████████████████████████████████████▌ | 4077/6091 [01:06<00:33, 60.74it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
67%|█████████████████████████████████████████████████████████▊ | 4098/6091 [01:06<00:29, 67.92it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
69%|███████████████████████████████████████████████████████████▊ | 4233/6091 [01:08<00:29, 63.53it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
74%|████████████████████████████████████████████████████████████████ | 4536/6091 [01:13<00:23, 67.50it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
80%|████████████████████████████████████████████████████████████████████▋ | 4869/6091 [01:18<00:16, 72.17it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
84%|████████████████████████████████████████████████████████████████████████ | 5106/6091 [01:22<00:18, 53.65it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
91%|██████████████████████████████████████████████████████████████████████████████▌ | 5566/6091 [01:29<00:08, 63.96it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
95%|█████████████████████████████████████████████████████████████████████████████████▋ | 5786/6091 [01:33<00:05, 59.95it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
96%|██████████████████████████████████████████████████████████████████████████████████▏ | 5822/6091 [01:33<00:04, 64.72it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
98%|████████████████████████████████████████████████████████████████████████████████████▎ | 5974/6091 [01:36<00:01, 66.91it/s]/home/.conda/envs/cdvae/lib/python3.8/site-packages/pymatgen/io/cif.py:1120: UserWarning: Issues encountered while parsing CIF: Some fractional coordinates rounded to ideal values to avoid issues with finite precision.
warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
100%|██████████████████████████████████████████████████████████████████████████████████████| 6091/6091 [01:39<00:00, 61.48it/s]
/gpfs/fs1/home/cdvae-old/cdvae/cdvae/common/data_utils.py:644: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /data/miniconda3/envs/opence-1.7/conda-bld/pytorch-base_1663986328871/work/torch/csrc/utils/tensor_new.cpp:201.)
targets = torch.tensor([d[key] for d in data_list])
/gpfs/fs1/home/cdvae-old/cdvae/cdvae/common/data_utils.py:612: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
X = torch.tensor(X, dtype=torch.float)
[2023-07-13 16:59:20,540][hydra.utils][INFO] - Instantiating <cdvae.pl_modules.model.CDVAE>
[2023-07-13 16:59:20,615][torch.distributed.nn.jit.instantiator][INFO] - Created a temporary directory at /tmp/tmpwv1glt9u
[2023-07-13 16:59:20,615][torch.distributed.nn.jit.instantiator][INFO] - Writing /tmp/tmpwv1glt9u/_remote_module_non_scriptable.py
[2023-07-13 16:59:53,346][hydra.utils][INFO] - Passing scaler from datamodule to model <StandardScalerTorch(means: -154.2510223388672, stds: 0.13738815486431122)>
[2023-07-13 16:59:53,348][hydra.utils][INFO] - Adding callback <LearningRateMonitor>
[2023-07-13 16:59:53,349][hydra.utils][INFO] - Adding callback <EarlyStopping>
[2023-07-13 16:59:53,350][hydra.utils][INFO] - Adding callback <ModelCheckpoint>
[2023-07-13 16:59:53,354][hydra.utils][INFO] - Instantiating <WandbLogger>
wandb: Currently logged in as: _. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/cdvae-old/cdvae/wabdb/wandb/run-20230713_165954-u04zv43g
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run carbon
wandb: ⭐️ View project at https://wandb.ai/_/crystal_generation_mit
wandb: 🚀 View run at https://wandb.ai/_/crystal_generation_mit/runs/u04zv43g
[2023-07-13 17:00:07,550][hydra.utils][INFO] - W&B is now watching <{cfg.logging.wandb_watch.log}>!
wandb: logging graph, to disable use `wandb.watch(log_graph=False)`
[2023-07-13 17:00:07,588][hydra.utils][INFO] - Instantiating the Trainer
/home/.conda/envs/cdvae/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py:96: LightningDeprecationWarning: Setting `Trainer(progress_bar_refresh_rate=20)` is deprecated in v1.5 and will be removedin v1.7. Please pass `pytorch_lightning.callbacks.progress.TQDMProgressBar` with `refresh_rate` directly to the Trainer's `callbacks` argument instead. Or, to disable the progress bar pass `enable_progress_bar = False` to the Trainer.
rank_zero_deprecation(
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
[2023-07-13 17:00:07,650][hydra.utils][INFO] - Starting training!
0%| | 2/6091 [00:00<32:19, 3.14it/s`
I am running on MIST HPC, so I have turned off WandB logging.
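For reference, here is a minimal sketch of one way to keep W&B from syncing on a cluster node. It assumes the `WANDB_MODE` environment variable is set before the logger is created. The helper name `force_wandb_offline` is only illustrative and is not part of CDVAE.

```python
import os


def force_wandb_offline(env=os.environ):
    """Ask wandb to store runs locally instead of syncing them.

    Must run before the W&B logger is instantiated.
    `env` is a parameter only so the helper can be tried on a plain dict.
    """
    env["WANDB_MODE"] = "offline"
    return env


if __name__ == "__main__":
    force_wandb_offline()
    print(os.environ["WANDB_MODE"])
```

Exporting the same variable in the job script before launching `run.py` should have the same effect.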
Environment
Package Version Editable project location
------------------------ ----------------- --------------------------------------------------
absl-py 1.4.0
aiofiles 22.1.0
aiohttp 3.8.4
aiosignal 1.3.1
aiosqlite 0.18.0
altair 5.0.1
antlr4-python3-runtime 4.8
anyio 3.5.0
appdirs 1.4.4
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
ase 3.22.0
astor 0.8.1
astroid 2.14.2
asttokens 2.0.5
async-timeout 4.0.2
attrs 22.1.0
autopep8 2.0.2
av 9.2.0
Babel 2.11.0
backcall 0.2.0
backports.zoneinfo 0.2.1
base58 2.1.1
beautifulsoup4 4.12.2
bleach 4.1.0
blinker 1.6.2
Bottleneck 1.3.5
brotlipy 0.7.0
cachetools 5.3.1
cdvae 0.0.1
certifi 2023.5.7
cffi 1.15.1
charset-normalizer 2.0.4
click 8.0.4
colorama 0.4.6
comm 0.1.2
configparser 6.0.0
contourpy 1.0.5
coverage 7.2.2
cryptography 39.0.1
cycler 0.11.0
debugpy 1.5.1
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.6
distlib 0.3.6
dnspython 2.3.0
docker-pycreds 0.4.0
emmet-core 0.60.1
entrypoints 0.4
exceptiongroup 1.0.4
executing 0.8.3
fastjsonschema 2.16.2
filelock 3.12.0
fonttools 4.25.0
frozenlist 1.3.3
fsspec 2023.4.0
future 0.18.3
gitdb 4.0.10
GitPython 3.1.32
google-auth 2.22.0
google-auth-oauthlib 1.0.0
googledrivedownloader 0.4
grpcio 1.48.2
higher 0.2.1
html5lib 1.1
hydra-core 1.1.0
hydra-joblib-launcher 1.1.5
idna 3.4
importlib-metadata 6.0.0
importlib-resources 5.12.0
iniconfig 1.1.1
ipykernel 6.19.2
ipython 8.12.0
ipython-genutils 0.2.0
ipywidgets 8.0.4
isodate 0.6.1
isort 5.9.3
jedi 0.18.1
Jinja2 3.1.2
joblib 1.2.0
json5 0.9.6
jsonschema 4.17.3
jupyter_client 8.1.0
jupyter_core 5.3.0
jupyter-events 0.6.3
jupyter_server 2.5.0
jupyter_server_fileid 0.9.0
jupyter_server_terminals 0.4.4
jupyter_server_ydoc 0.8.0
jupyter-ydoc 0.2.4
jupyterlab 3.6.3
jupyterlab-pygments 0.1.2
jupyterlab_server 2.22.0
jupyterlab-widgets 3.0.5
kiwisolver 1.4.4
latexcodec 2.0.1
lazy-object-proxy 1.6.0
lightning-utilities 0.7.1
lxml 4.9.2
Markdown 3.4.3
MarkupSafe 2.1.1
matminer 0.7.3
matplotlib 3.7.1
matplotlib-inline 0.1.6
mccabe 0.7.0
mistune 0.8.4
monty 2023.5.8
mp-api 0.33.3
mpmath 1.3.0
msgpack 1.0.5
multidict 6.0.4
multiprocess 0.70.14
munkres 1.1.4
nbclassic 0.5.5
nbclient 0.5.13
nbconvert 6.5.4
nbformat 5.7.0
nest-asyncio 1.5.6
networkx 2.8.4
nglview 3.0.6
notebook 6.5.4
notebook_shim 0.2.2
numexpr 2.8.4
numpy 1.23.5
oauthlib 3.2.2
omegaconf 2.1.2
p-tqdm 1.3.3
packaging 23.0
palettable 3.3.3
pandas 1.5.3
pandocfilters 1.5.0
parso 0.8.3
pathos 0.3.0
pathtools 0.1.2
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.4.0
Pint 0.21.1
pip 23.1.2
pkgutil_resolve_name 1.3.10
platformdirs 3.2.0
plotly 5.15.0
pluggy 1.0.0
pox 0.3.2
ppft 1.7.6.6
prometheus-client 0.14.1
promise 2.3
prompt-toolkit 3.0.36
protobuf 3.19.6
psutil 5.9.0
ptyprocess 0.7.0
pure-eval 0.2.2
py 1.11.0
pyarrow 8.0.0
pyasn1 0.5.0
pyasn1-modules 0.3.0
pybtex 0.24.0
pycodestyle 2.10.0
pycparser 2.21
pydantic 1.10.11
pydeck 0.8.1b0
pyDeprecate 0.3.1
pyg-nightly 2.4.0.dev20230711
Pygments 2.15.1
pylint 2.16.2
pymatgen 2023.7.11
pymongo 4.4.0
pyOpenSSL 23.0.0
pyparsing 3.0.9
pyrsistent 0.18.0
PySocks 1.7.1
pytest 7.3.1
pytest-cov 4.0.0
python-dateutil 2.8.2
python-dotenv 1.0.0
python-json-logger 2.0.7
python-louvain 0.15
pytorch-lightning 1.6.5
pytz 2022.7
PyYAML 5.4.1
pyzmq 25.1.0
rdflib 6.1.1
requests 2.29.0
requests-oauthlib 1.3.1
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rsa 4.9
ruamel.yaml 0.17.32
ruamel.yaml.clib 0.2.7
scikit-learn 1.2.2
scipy 1.8.1
Send2Trash 1.8.0
sentencepiece 0.1.96
sentry-sdk 1.28.0
setproctitle 1.3.2
setuptools 67.8.0
shortuuid 1.0.11
six 1.16.0
SMACT 2.2.1
smmap 5.0.0
sniffio 1.2.0
soupsieve 2.4
spglib 2.0.2
stack-data 0.2.0
streamlit 0.79.0
subprocess32 3.5.4
sympy 1.12
tabulate 0.8.10
tenacity 8.2.2
tensorboard 2.13.0
tensorboard-data-server 0.7.1
terminado 0.17.1
threadpoolctl 2.2.0
tinycss2 1.2.1
toml 0.10.2
tomli 2.0.1
tomlkit 0.11.1
toolz 0.12.0
torch 1.12.1
torch-cluster 1.6.1
torch-geometric 1.7.2
torch-scatter 2.0.8
torch-sparse 0.6.10
torch-spline-conv 1.2.2
torchdiffeq 0.0.1
torchmetrics 1.0.0
torchtext 0.13.1a0+35066f2
torchvision 0.13.1
tornado 6.2
tqdm 4.65.0
traitlets 5.7.1
typing_extensions 4.6.3
tzlocal 5.0.1
uncertainties 3.1.7
urllib3 1.26.16
validators 0.20.0
virtualenv 20.22.0
wandb 0.15.5
watchdog 3.0.0
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.58.0
Werkzeug 2.3.6
wheel 0.38.4
widgetsnbextension 4.0.5
wrapt 1.14.1
y-py 0.5.9
yacs 0.1.6
yarl 1.9.2
ypy-websocket 0.8.2
zipp 3.11.0
Any suggestions on how to resolve this? I am not very familiar with Hydra and PyTorch Lightning.
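To help narrow down where the process is stuck, here is a minimal sketch using Python's standard `faulthandler` module, which can periodically print the stack trace of every thread. The helper names `enable_hang_dump` and `disable_hang_dump` and the 60-second interval are illustrative choices of mine, not part of CDVAE. The idea would be to call `enable_hang_dump()` near the top of `cdvae/run.py`.

```python
import faulthandler
import sys


def enable_hang_dump(timeout_s=60.0, file=sys.stderr):
    """Every `timeout_s` seconds, write all thread stack traces to `file`.

    `file` must be a real file object with a file descriptor
    (for example sys.stderr or an opened log file).
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)


def disable_hang_dump():
    """Stop the periodic stack dumps started by enable_hang_dump()."""
    faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    import time

    enable_hang_dump(timeout_s=1.0)
    time.sleep(2.5)  # stands in for a training loop that stops making progress
    disable_hang_dump()
```

If training hangs, the repeated dumps should show which call each thread is blocked in (for example a data-loading or logging call), which would make the report above more actionable.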