pymatgen
pymatgen copied to clipboard
Add f* diagram generator
Summary
Added f* diagram generator.
Additional dependencies introduced (if any)
- List all new dependencies needed and justify why. While adding dependencies that bring significantly useful functionality is perfectly fine, adding ones that add trivial functionality, e.g., to use one single easily implementable function, is frowned upon. Provide a justification why that dependency is needed. Especially frowned upon are circular dependencies, e.g., depending on derivative modules of pymatgen such as custodian or Fireworks.
TODO (if any)
If this is a work-in-progress, write something about what else needs to be done
Checklist
Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request title.
Before a pull request can be merged, the following items must be checked:
- [x] Code is in the standard Python style. Run pycodestyle and flake8 on your local machine.
- [x] Docstrings have been added in the Google docstring format. Run pydocstyle on your code.
- [x] Type annotations are highly encouraged. Run mypy to type check your code.
- [x] Tests have been added for any new functionality or bug fixes.
- [x] All existing tests pass.
Note that the CI system will run all the above checks. But it will be much more
efficient if you already fix most errors prior to submitting the PR. It is
highly recommended that you use the pre-commit hook provided in the pymatgen
repository. Simply cp pre-commit .git/hooks and a check will be run prior to
allowing commits.
Hi @jonathanjdenney, thanks for this.
A few comments before merging, mostly just to follow Python conventions:
- The file
pymatgen/analysis/Fstar/__pycache__/fstar.cpython-36.pycshould be removed - The folder
pymatgen/analysis/Fstar/should be renamed topymatgen/analysis/fstar/ - The files
neutron_factors.csvxray_factors_2.csvare duplicated in thetestsfolder, this is unnecessary. FStarDiagram_test->test_fstardiagram(method names should be lower case)x_ray_scatter_df,neutron_scatter_dfshould beX_RAY_SCATTER_DF,NEUTRON_SCATTER_DF(global variables upper case)- instead of
tryexcept AttributeErroryou can do anifstatement, e.g.if isinstance(sp, Speciesorif hasattr(sp, "element")etc.
I see changes were made via direct file upload, I highly recommend GitHub Desktop for making easier edits.
These are all my suggestions for now, but I'll take another look after the above changes.
Also, I would add a reference to the papers using f* (e.g. https://doi.org/10.1063/1.5044555 https://doi.org/10.1021/acs.chemmater.9b03646) to the docstring to point newcomers in the right direction.
Hi @mkhorton , I will get to work on your suggestions. Thanks for looking at it.
Hi @mkhorton , I believe I have addressed all of your comments. Please let me know if I missed anything.
Thanks @jonathanjdenney ! I'll take another look. Were some tests removed in the most recent iteration?
@mkhorton No tests were actually removed. I had previously accidentally added a file that was a non-functioning version of the test file which I have now removed.
@mkhorton Hello Matt, Sorry for not getting to your comments sooner, I've had other deadlines. The purpose of including a way to input cifs directly into the script was to make it more user friendly to our target audience, experimentalists. Experimentalists will typically be more familiar with cifs over structure objects. In addition, this method allows the user to easily input non-physical values for the occupancies which may have resulted from the refinement. As far as I know, the cif parser will round greater than 1 occupancies to 1 regardless of what tolerance you set. This script can take structure objects as its input without any trouble, so is it really unacceptable that I've given the option to use cifs? In regards to your last comment I agree it is a bit confusing. I will add to the code description an explanation that you use the labels generated in the site_labels list, which can be viewed by printing the list. It is input as a list of strings. Best, Jonathan
In addition, this method allows the user to easily input non-physical values for the occupancies which may have resulted from the refinement. As far as I know, the cif parser will round greater than 1 occupancies to 1 regardless of what tolerance you set.
If there's a case where unphysical occupancies are useful, then we should change CifParser accordingly.
The purpose of including a way to input cifs directly into the script was to make it more user friendly to our target audience, experimentalists. This script can take structure objects as its input without any trouble, so is it really unacceptable that I've given the option to use cifs?
Maybe to explain the issue a different way, pymatgen is intended as a library. For someone using it, regardless of their background or expertise (I started off as an experimentalist myself), you have to learn how to use Python, how to install the package, how to import the relevant functionality you want etc.
Therefore, with the recommendation to people using the script will be to say, "first you write one line to import your CIFs" and then "you run the FStarDiagram generator", in my own view the extra directive to write the additional line is not the limiting hurdle for using a library like pymatgen.
The objection to having the CIF input is that it creates confusion and makes it more difficult to reason about what the code is doing internally. The existence of the Structure object is not to create yet another file format (e.g. on Materials Project, we offer CIF downloads, not Structure downloads), it's to create a standardized interface to crystal structures internally within pymatgen.
More importantly however, this discussion has highlighted a limitation (the CIF unphysical occupancies mentioned), and we'd be papering over this limitation rather than addressing it properly. By adding the capability to CifParser, other people can also have the benefit of importing CIF with unphysical occupancies, and the resulting code within FStarDiagram will be a lot simpler too.