Upgrades Living Document - Datasets

Open ardunn opened this issue 4 years ago • 6 comments

Cubic crystal compounds

  • paper https://journals.aps.org/prmaterials/abstract/10.1103/PhysRevMaterials.5.063802
  • link: https://materialsdata.nist.gov/handle/11256/994

~~Superconductor temperatures~~

  • ~~585 superconductor temperatures ID: 2210 mentioned here~~ Superseded by #732

2D Ferromagnets

  • 786 2D ferromagnetic materials, 26 of them with a Curie point above 400 K, from here

UV/Vis spectra of metal oxides

  • Composition/UV-Vis measurements from 179072 metal oxides from here

~~UCSB Thermoelectrics database~~

  • done in #639

CMRDB 2D Materials Database

  • https://cmrdb.fysik.dtu.dk/c2db/

Vickers load-dependent hardness dataset

  • See #765

~~TAATA polymorphs dataset~~ addressed in #794

  • https://zenodo.org/record/5530535#.YjJ3ZhDMJLQ

@CompRhys WBM set

  • https://archive.materialscloud.org/record/2021.68

ardunn avatar Jun 08 '21 00:06 ardunn

See also https://github.com/materialsproject/matbench/issues/2

ardunn avatar Jan 13 '22 05:01 ardunn

https://zenodo.org/record/5530535 I recently got permission to upload this publicly for some of our work. Potentially of interest: the data covers 3 highly sampled phase diagrams, including a lot of unstable structures. I uploaded both the initial (from prototyping) and relaxed structures. The authors refer to it as the TAATA data set.

CompRhys avatar Mar 15 '22 04:03 CompRhys

@CompRhys this is great! Can we upload this to our figshare as well, in a different form (one big df with all systems in it)?

If not, will that Zenodo link always be available? If so, we can still add it as a dataset.

ardunn avatar Mar 16 '22 23:03 ardunn

Zenodo is a different permanent archive service that will exist as long as CERN exists, so you can use the Zenodo links safely. However, I got permission to share with MIT, so whatever makes sense is allowed. If you want to combine them into a single data set, just note that some duplicated materials are found in all 3 phase diagrams and should be removed; they have the same ht_id.
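
For illustration, combining the three systems while dropping repeated `ht_id` entries could look roughly like the sketch below. It is not code from the dataset or from matminer: the system names and records are placeholder values, `combine_systems` is a hypothetical helper, and only the `ht_id` key comes from the note above.

```python
def combine_systems(systems):
    """Merge per-system records into one list, keeping the first record seen per ht_id.

    `systems` maps a system label (placeholder names here) to a list of
    record dicts, each of which is assumed to carry an "ht_id" key.
    """
    seen_ids = set()
    combined = []
    for system_name, records in systems.items():
        for record in records:
            ht_id = record["ht_id"]
            if ht_id in seen_ids:
                # Same material already added from another phase diagram.
                continue
            seen_ids.add(ht_id)
            # Tag each kept record with the system it was first found in.
            combined.append({**record, "system": system_name})
    return combined


# Toy placeholder data: one material ("shared-1") appears in all three systems.
systems = {
    "system_a": [{"ht_id": "a-1"}, {"ht_id": "shared-1"}],
    "system_b": [{"ht_id": "b-1"}, {"ht_id": "shared-1"}],
    "system_c": [{"ht_id": "c-1"}, {"ht_id": "shared-1"}],
}
combined = combine_systems(systems)
```

If the data is already loaded into pandas DataFrames, `pd.concat(frames).drop_duplicates(subset="ht_id")` should give the same first-occurrence behaviour, assuming each frame has an `ht_id` column.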

CompRhys avatar Mar 16 '22 23:03 CompRhys

BTW @CompRhys this is an excellent and somewhat unique dataset. Shame the band gaps and tensors for all the compounds weren't included, but still. I think this has the potential to be a difficult and interesting problem in matbench eventually, so I'll be adding it to the living document on that repo as well.

ardunn avatar Mar 18 '22 03:03 ardunn

Speaking of bandgaps, I also have another data set that I think would be good, which I have been calling WBM. The relaxed structures are already available (https://archive.materialscloud.org/record/2021.68), but there are a few issues with matching the dataset to the bandgaps in the summary file that took a while to get right. When @janosh attended the MP workshop we asked about having it added to MP, as it is MP compatible, but we never chased it. Lmk if interested and I will share the cleaned version with initial structures.

There's also some more data shared by the groups of B and M from WBM that I think might be worth making a true benchmark with, but I haven't found time to dig into it. A super nice thing about WBM is the sequential nature of the batches, which might not be easy to maintain if we add in the extra data. This was the idea I was hinting at here: https://github.com/materialsproject/matbench/issues/104#issuecomment-1030755395.

CompRhys avatar Mar 18 '22 03:03 CompRhys