matminer
Upgrades Living Document - Datasets
Cubic crystal compounds
- paper https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.5.063802
- link: https://materialsdata.nist.gov/handle/11256/994
~~Superconductor temperatures~~
2D Ferromagnets
- 786 materials, 26 with a Curie point above 400 K, for 2D ferromagnets from here
UV/Vis spectra of metal oxides
- Composition/UV-Vis measurements from 179072 metal oxides from here
~~UCSB Thermoelectrics database~~
- done in #639
CMRDB 2D Materials Database
- https://cmrdb.fysik.dtu.dk/c2db/
Vickers load-dependent hardness dataset
- See #765
~~TAATA polymorphs dataset~~ addressed in #794
- https://zenodo.org/record/5530535#.YjJ3ZhDMJLQ
@CompRhys WBM set
- https://archive.materialscloud.org/record/2021.68
See also https://github.com/materialsproject/matbench/issues/2
https://zenodo.org/record/5530535 I was recently able to get the permissions to upload this publicly for some of our work. Potentially of interest - the data is 3 highly sampled phase diagrams including a lot of unstable structures. I uploaded both the initial (from prototyping) and relaxed structures. The authors refer to it as the TAATA data set.
@CompRhys this is great! Can we upload this to our figshare as well, in a different form (one big df with all systems in it?)
If not, will that Zenodo link always be available? If so, we can still add it as a dataset.
Zenodo is a different permanent archive service that will exist as long as CERN exists, so you can use the Zenodo links safely. However, I got permission to share with MIT, so whatever makes sense is allowed. If you want to combine them into a single data set, just note that some materials are duplicated across all 3 phase diagrams and should be removed. The duplicates have the same ht_id.
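For the "one big df" idea, the merge and de-duplication could be sketched roughly like this. This is only a minimal sketch with toy data: the `ht_id` column comes from the comment above, while the `energy` and `system` columns, the `system_a`/`system_b`/`system_c` labels, and the `combine_phase_diagrams` helper are hypothetical and do not reflect the actual layout of the Zenodo files.

```python
import pandas as pd


def combine_phase_diagrams(frames, id_col="ht_id"):
    """Concatenate per-system tables and keep one row per unique id.

    `frames` maps a (hypothetical) system label to its DataFrame.
    """
    combined = pd.concat(
        # Tag each row with the system it came from before stacking.
        [df.assign(system=name) for name, df in frames.items()],
        ignore_index=True,
    )
    # Rows sharing an id are treated as the same material; keep the first.
    deduped = combined.drop_duplicates(subset=id_col, keep="first")
    return deduped.reset_index(drop=True)


# Toy stand-ins for the three phase diagrams (made-up values).
frames = {
    "system_a": pd.DataFrame({"ht_id": ["id-1", "id-2"], "energy": [-1.0, -2.0]}),
    "system_b": pd.DataFrame({"ht_id": ["id-2", "id-3"], "energy": [-2.0, -3.0]}),
    "system_c": pd.DataFrame({"ht_id": ["id-1", "id-4"], "energy": [-1.0, -4.0]}),
}

combined = combine_phase_diagrams(frames)
print(combined)
```

Keeping the first occurrence means the `system` label of a shared material only records the first phase diagram it appeared in. If that provenance matters, grouping by `ht_id` and collecting the labels would be an alternative.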
BTW @CompRhys this is an excellent and somewhat unique dataset. Shame the band gaps and tensors for all the compounds weren't included but still. I think this has the potential to be a difficult and interesting problem in matbench eventually, so I'll be adding it to the living document on that repo as well
Speaking of band gaps, I also have another data set that I think would be good, which I have been calling WBM. The relaxed structures are available already (https://archive.materialscloud.org/record/2021.68), but there are a few issues with matching the dataset to the band gaps in the summary file that took a while to get right. When @janosh attended the MP workshop we asked about having it added to MP, as it is MP compatible, but we never chased it. Let me know if you're interested and I will share the cleaned version with initial structures.
There's also some more data shared by the groups of B and M from WBM that I think might be worth making a true benchmark with, but I haven't found time to dig into it. A super nice thing about WBM is the sequential nature of the batches, which might not be easy to maintain if we add in the extra data. This was the idea I was hinting at here: https://github.com/materialsproject/matbench/issues/104#issuecomment-1030755395