matminer
matminer copied to clipboard
Featurizers missing prechecking
While #384 introduced 3 prechecks for featurizers, there are many more featurizers that could benefit from a precheck. This issue can be used for tracking these featurizers
- [ ]
CationProperty
,OxidationStates
,ElectronAffinity
,ElectronegativityDiff
,IonProperty
: all-metal entries' oxidation states are ill-defined, so they could be considered invalid - [ ] NN-based featurizers (e.g.,
SiteStatsFingerprint
): Throw warnings (not neccessarily fail precheck) when large structures are prechecked, cuz featurizing them is gonna take forever (prechecks should be very fast regardless) - [ ] Some others? Please add more featurizers to this issue list if you think of them
Note that returning a NaN for precheck could be considered as neither failing or passing a precheck, e.g., if the precheck test was skipped for a large structure
@computron Should prechecks have the ability to be skipped though? AFAIK they can be all be super fast operations that (1) pass (2) fail or (3) throw an error. If (3) happens the user should look into it. For example:
def precheck(self, structure): # for some NN featurizer
if len(structure.sites) > 250:
warnings.warn(some_warning)
...
# returns True or False, regardless of whether the warning shows or not
Or is there some other precheck mechanism you had in mind?
At the very least precheck skipping should be discouraged. Let's not introduce this unless we feel it is really necessary. Based on your comment I got the sense that some large structures could not be prechecked quickly. If that's the case it might be better to throw NaN versus have the precheck take forever.
Yes - I think all featurizers can have a precheck that never skips. I don't see any operations where just checking if the featurizer will take a long time will itself take a long time.
Edit: I see in my comment where confusion could come from, changed it