Cannot configure environment for qmpy to access MySQL database

Open huzongxiang opened this issue 2 years ago • 15 comments

Neither pip nor conda can find pymatgen=2018.12.12. Other versions installed through conda cannot access the MySQL OQMD database, and running your OQMD script fails with errors. How do I solve these problems?

conda create --name qmpy
conda install -n qmpy scikit-learn matplotlib python=2.7
source activate qmpy
pip install pymatgen==2018.12.12 monty==1.0.3
pip install qmpy==1.2.0 ase==3.17
pip install pydash tqdm joblib

huzongxiang avatar May 10 '22 11:05 huzongxiang

I installed a Miniconda (https://repo.anaconda.com/miniconda/Miniconda3-py37_4.11.0-Linux-x86_64.sh) on a clean Linux machine (actually, I used Colab), and then the qmpy environment was successfully created. I didn't encounter any installation problem.

Tony-Y avatar May 10 '22 15:05 Tony-Y

Errors occur when running pip install pymatgen, and many conda channels cannot find pymatgen=2018.12.12. I want to use conda instead of pip, but I have no idea which version of pymatgen I can use.

huzongxiang avatar May 10 '22 16:05 huzongxiang

Could you please look at the installation log? Pymatgen 2018.12.12 was successfully installed by pip on a clean Linux machine.

Tony-Y avatar May 11 '22 01:05 Tony-Y

Hello, I installed pymatgen on a clean Ubuntu Linux system. When I started the pip installation, the Ubuntu environment had no gcc to build libraries such as spglib. The mysqlclient package also had to be configured for the qmpy installation. I have now installed all libraries successfully.
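The missing build prerequisites described above can be checked before running pip. This is a minimal sketch, assuming Python 3 is available outside the qmpy environment (shutil.which does not exist in Python 2.7) and assuming that a source build of mysqlclient looks for the mysql_config program. The tool list is illustrative and is not taken from the qmpy or CGNN instructions.

```python
import shutil

# Illustrative list (an assumption): gcc compiles C extensions such as spglib,
# and mysql_config is what a source build of mysqlclient typically looks for.
REQUIRED_TOOLS = ["gcc", "mysql_config"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All build tools found.")
```

If a tool is reported missing, install it with the system package manager before retrying the pip commands.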

huzongxiang avatar May 11 '22 02:05 huzongxiang

However, other problems occur when using qmpy :(

huzongxiang avatar May 11 '22 02:05 huzongxiang

I will try it in Colab.

huzongxiang avatar May 11 '22 02:05 huzongxiang

On Ubuntu 20.04 or later, MySQL 8.0 is the default, but the OQMD needs a MySQL 5.7 server. Please check your MySQL version. If you use MySQL 8.0, see the instructions for installing MySQL 5.7 on Ubuntu.
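As a rough illustration of the version check, the sketch below parses the text printed by `mysql --version` and reports whether it is 5.7. The function names are hypothetical, and the two output formats handled here are assumptions based on typical client output, not strings taken from this thread. Note that this reports the client version; running `SELECT VERSION();` inside the mysql shell shows the server version.

```python
import re


def parse_mysql_version(version_output):
    """Extract (major, minor) from the text printed by `mysql --version`."""
    # Assumed formats: 5.x clients print "... Distrib 5.7.38, ...", while 8.0
    # clients print "mysql  Ver 8.0.29 ...". Try the "Distrib" form first,
    # because 5.x output also contains a separate "Ver 14.14" number.
    match = re.search(r"Distrib (\d+)\.(\d+)", version_output)
    if match is None:
        match = re.search(r"Ver (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_output)
    return int(match.group(1)), int(match.group(2))


def is_mysql_57(version_output):
    """Return True when the reported version is 5.7."""
    return parse_mysql_version(version_output) == (5, 7)
```

For example, passing the output captured from `mysql --version` to `is_mysql_57` gives a quick yes/no answer before loading the OQMD dump.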

Tony-Y avatar May 11 '22 03:05 Tony-Y

Thanks. I can now retrieve entries from the MySQL database successfully. I would like to know the specific format of the data that the oqmd_data.py script saves in the .npz files. Is an entry in CIF, POSCAR, or pymatgen.Structure format?

('Total Materials:', )
0%| | 0/914 [00:00<?, ?it/s]
1%|▍ | 10/914 [13:05<18:16:11, 72.76s/it]

huzongxiang avatar May 12 '22 02:05 huzongxiang

The data format is almost the same as that of the Materials Project REST API, so each entry's structure is a pymatgen.Structure object.
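To see what a generated file actually contains, one option is to list the arrays stored in it. This is a minimal sketch using only NumPy; the helper name is hypothetical, the array names inside the files written by oqmd_data.py are not stated in this thread, and the use of allow_pickle=True assumes the arrays may hold Python objects.

```python
import numpy as np


def summarize_npz(path_or_file):
    """Return {array_name: (dtype, shape)} for every array in an .npz file."""
    # allow_pickle=True is needed when arrays hold Python objects; only
    # enable it for files you trust, because unpickling can run code.
    with np.load(path_or_file, allow_pickle=True) as archive:
        return {
            name: (str(archive[name].dtype), archive[name].shape)
            for name in archive.files
        }
```

Calling `summarize_npz` on one of the output files shows the stored array names, from which the individual entries can then be inspected.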

Tony-Y avatar May 12 '22 02:05 Tony-Y

Thanks. I found that oqmd_data.py can consume a large amount of memory. After processing only 21% of the entries, it uses 6.9 GB of the 8 GB available. Will the process terminate when it reaches the device's memory limit?

huzongxiang avatar May 12 '22 06:05 huzongxiang

The process does terminate when memory reaches the maximum. Why does memory consumption keep increasing in the loop that calls np.savez_compressed?

huzongxiang avatar May 12 '22 07:05 huzongxiang

The memory consumption is almost constant because the number of entries processed at each iteration does not exceed chunk_size:

https://github.com/Tony-Y/cgnn/blob/c6c28e6eaf420665a01bd4e5b72b3ac20ff6b138/tools/oqmd_data.py#L105

Please change to chunk_size=250 to reduce the memory consumption.
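The idea behind chunk_size can be sketched as follows. This is a generic illustration of chunked processing, not the code in oqmd_data.py; the function names and the handle_chunk callback are hypothetical.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def process_in_chunks(entry_ids, chunk_size, handle_chunk):
    """Call `handle_chunk(index, chunk)` once per chunk of entry IDs."""
    for index, chunk in enumerate(iter_chunks(entry_ids, chunk_size)):
        # Only the data built for this one chunk needs to be held in memory,
        # for example while it is converted and written to a single file.
        handle_chunk(index, chunk)
```

Because each iteration handles at most chunk_size entries, a smaller chunk_size lowers the peak memory needed per iteration at the cost of producing more output files.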

Tony-Y avatar May 12 '22 07:05 Tony-Y

Could you check innodb_buffer_pool_size of MySQL?

mysql -u root -p
mysql> show variables like 'innodb_buffer_pool_size';

This variable limits the memory consumption of MySQL.

Tony-Y avatar May 12 '22 08:05 Tony-Y

thanks

huzongxiang avatar May 12 '22 09:05 huzongxiang

https://github.com/Tony-Y/cgnn/blob/c6c28e6eaf420665a01bd4e5b72b3ac20ff6b138/tools/oqmd_data.py#L116

If you set chunk_size=250, change 03d to 04d: there are 561,888 entries, so the number of chunks (2,248) no longer fits in three digits.
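The arithmetic behind the format change can be checked with a short sketch. The helper name is hypothetical, and it assumes the 03d/04d field formats a chunk index that starts at 0.

```python
def chunk_count_and_width(n_entries, chunk_size):
    """Return (number of chunks, digits needed for the largest chunk index)."""
    # Ceiling division with integers only, so it behaves the same in
    # Python 2 and Python 3.
    n_chunks = -(-n_entries // chunk_size)
    last_index = max(n_chunks - 1, 0)  # assumes indices start at 0
    return n_chunks, len(str(last_index))


n_chunks, width = chunk_count_and_width(561888, 250)
print(n_chunks, width)      # 2248 4
print("%0*d" % (width, 7))  # 0007
```

With 2,248 chunks the largest index has four digits, so a three-digit zero-padded field would no longer give file names of uniform length.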

Tony-Y avatar May 12 '22 10:05 Tony-Y

I'm sorry, but I have tried several structures and found that only the crystal graph of the example SiO2 was correct. I don't know why.

IRilyDonKnow avatar Dec 02 '22 15:12 IRilyDonKnow

I want to clarify what you did. Did you compare the official graph_data.npz in the OQMD v1.2 dataset with the one that you generated by following the instructions?

Tony-Y avatar Dec 03 '22 08:12 Tony-Y

Well, the data I used are not from the official OQMD v1.2 dataset. They are structures from the Materials Project that were relaxed in VASP and used to calculate the phonon spectrum.

IRilyDonKnow avatar Dec 05 '22 08:12 IRilyDonKnow

Your issue is off-topic for this thread. Please create a new issue by clicking the green New issue button above, and give me enough information to reproduce your problem.

Tony-Y avatar Dec 05 '22 11:12 Tony-Y

OK, thanks. I think I have found some clues to this problem.

IRilyDonKnow avatar Dec 07 '22 10:12 IRilyDonKnow

Thanks. I found that oqmd_data.py can consume a large amount of memory. After processing only 21% of the entries, it uses 6.9 GB of the 8 GB available. Will the process terminate when it reaches the device's memory limit?

Using a qmpy environment configured according to the latest instructions as of Dec 9, 2022, in an Ubuntu 18.04 Docker container, I did not observe high memory consumption when running oqmd_data.py.

I will publish my Docker image for the qmpy environment to Docker Hub soon.

Tony-Y avatar Dec 09 '22 06:12 Tony-Y

I have just published my Docker image. Details are in the CGNN v1.0.3 Release Notes.

Tony-Y avatar Dec 11 '22 07:12 Tony-Y