
HW 1 general suggestions - `webplotdigitizer` and `MPRester` tips

Open sgbaird opened this issue 3 years ago • 0 comments

Problem 1

  • when the chemical formula contains a variable (e.g. x), pay extra attention to which trace corresponds to which value of x. For example, in some plots the traces run from x=0 at the top to x=1.0 at the bottom, whereas in others x=0 starts at the bottom.

  • pay attention to units, e.g. Kelvin vs. Celsius, or 10^4 S/m vs. S/m vs. S/cm. Make sure values are converted to the units listed on the spreadsheet, e.g. S*cm^-1 for electrical conductivity.

  • I found it useful to add all images (with the figure caption included in each image) to a single session in webplotdigitizer, and to use "Point Groups" corresponding to each of the chemical formulas when grabbing multiple traces from an image. Additionally, if multiple types of data were in the same figure, I made copies of the figure and named them e.g. fig5-electrical-conductivity.png, fig5-thermal-conductivity.png ... even though they're the exact same figure. This makes it easier to retain the caption and to keep a separate calibration for each dataset. Rename each dataset appropriately (e.g. electrical-conductivity) so that it is easier to keep track of and so that the name auto-populates when you export the CSV.

  • a trick to using "Point Groups" (not the crystallographic kind) is to add a group for each composition (e.g. Cu0.98GaTe2, Cu0.985GaTe2, Cu0.99GaTe2, CuGaTe2) and then, at each temperature, select one point per composition in the same order as the groups. For example, click points in the following order:

    1. Cu0.98GaTe2@300K
    2. Cu0.985GaTe2@300K
    3. Cu0.99GaTe2@300K
    4. CuGaTe2@300K
    5. Cu0.98GaTe2@400K
    6. Cu0.985GaTe2@400K
    ...
    ...
    16. CuGaTe2@800K

Then click on your dataset, View Data, and Sort By --> Groups (dropdown). You can also export to CSV from this interface.

  • I suggest saving your images, raw CSV data, and webplotdigitizer project files (JSON and TAR formats) in folders organized by article. At the very least, keep a copy of your data somewhere other than Google Sheets (e.g. your local computer) for data redundancy.
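If you end up with points in click order outside of webplotdigitizer, the round-robin grouping and the unit conversions described above can be sketched in pandas. This is only an illustration: the column names and the numbers are placeholders (not digitized from any paper), and it assumes the figure used Celsius and 10^4 S/m while the spreadsheet wants Kelvin and S/cm.

```python
import pandas as pd

# Compositions in the order the point groups were created (from the example above).
compositions = ["Cu0.98GaTe2", "Cu0.985GaTe2", "Cu0.99GaTe2", "CuGaTe2"]

# Placeholder points in click order (NOT real data; column names are assumptions):
# temperature in Celsius, conductivity in units of 10^4 S/m.
clicked = pd.DataFrame(
    {
        "temp_C": [26.85, 26.85, 26.85, 26.85, 126.85, 126.85, 126.85, 126.85],
        "sigma_1e4_S_per_m": [1.0, 0.8, 0.6, 0.4, 0.9, 0.7, 0.5, 0.3],
    }
)

# One click per composition at each temperature, so the group label
# repeats every len(compositions) clicks.
labels = [compositions[i % len(compositions)] for i in range(len(clicked))]
clicked["formula"] = pd.Categorical(labels, categories=compositions, ordered=True)

# Convert to the spreadsheet's units.
clicked["temp_K"] = clicked["temp_C"] + 273.15
clicked["sigma_S_per_cm"] = clicked["sigma_1e4_S_per_m"] * 1e4 / 100  # 1 S/cm = 100 S/m

# Equivalent in spirit to "Sort By --> Groups": one contiguous block per composition.
tidy = clicked.sort_values(["formula", "temp_K"]).reset_index(drop=True)
print(tidy[["formula", "temp_K", "sigma_S_per_cm"]])
```

Using an ordered categorical keeps the compositions in the order you defined them rather than in alphabetical order.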

Problem 2

One of the best resources for getting an intro to MPRester is the Materials Project workshop tutorial.

On YouTube, there is Taylor's prerecorded lecture and (what I'm pretty sure is) the corresponding video for the workshop tutorial mentioned above.

In addition to the customized examples given by Taylor in this repository, here are some more examples "in practice" at RoboCrab (archived repo) and mat_discover.

Task 5

  • See https://matsci.org/t/how-to-distinguish-experimental-or-theoretical-structure-entries/2036/4

  • To keep track of the MPIDs that map back to a single composition (i.e. repeat chemical formulas), I suggest using df.set_index() to replace the normal indices (0, 1, 2, ...) with your mpids. Then follow the advice at https://stackoverflow.com/a/49216427/13697228. See also groupby_formula for an example that is close to what is asked.

  • Bonus (but not actually extra credit): assuming you added a "count" column as in the example above, you can see the number of repeats for a given chemical formula via:

grp_df.hist("count", bins=100, log=True)

In this case, the large majority of chemical formulas have fewer than 20 polymorphs, but there is one chemical formula with 200 repeats!?
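As a rough sketch of the set_index and group-by-formula idea above (not necessarily what the linked answer or groupby_formula does exactly), something like the following works on a toy table. The column names "mpid" and "formula", the IDs, and the values are all placeholders, not real Materials Project entries.

```python
import pandas as pd

# Placeholder table (NOT real Materials Project data; names are assumptions).
df = pd.DataFrame(
    {
        "mpid": ["mp-x1", "mp-x2", "mp-x3", "mp-x4"],
        "formula": ["SiO2", "SiO2", "NaCl", "SiO2"],
        "e_above_hull": [0.00, 0.02, 0.00, 0.05],
    }
)

# Replace the default indices (0, 1, 2, ...) with the mpids so that
# individual entries can be looked up directly, e.g. df.loc["mp-x2"].
df = df.set_index("mpid")

# Collect the mpids that share a chemical formula and count the repeats.
# reset_index() turns the mpid index back into a column for aggregation.
grp_df = (
    df.reset_index()
    .groupby("formula")
    .agg(mpids=("mpid", list), count=("mpid", "size"))
)
print(grp_df)
```

Each row of grp_df is then one chemical formula with the list of mpids that map back to it, and the "count" column is what the histogram call above plots.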

sgbaird avatar Jan 23 '22 04:01 sgbaird