pySIPFENN icon indicating copy to clipboard operation
pySIPFENN copied to clipboard

Notes from first time usage and APIs

Open bocklund opened this issue 9 months ago • 3 comments

calculate_KS2022_randomSolutions always prints all input in parallel mode

https://github.com/PhasesResearchLab/pySIPFENN/blob/b3bec0478d9faf0476c58072b617ee5710621dc9/pysipfenn/core/pysipfenn.py#L612

It may be kind of noisy to always print the input without being behind a verbose flag if the compList is very long (especially since it's a lot of pymatgen Structure objects). Feel free to close

bocklund avatar May 08 '24 00:05 bocklund

It's a leftover from some debugging. Already fixing it!

amkrajewski avatar May 08 '24 00:05 amkrajewski

I have some more general notes, and might just leave them here. Feel free to convert any to issues or I can elaborate if needed.

  1. It's surprising that SIPFENN mutates input. For example, calling Calculator.calculate_KS2022_randomSolutions with compList=composition_strs where composition_strs is some list of strings that I created, calculate_KS2022_randomSolutions will mutate the composition_strs list that I passed in and change it's type from List[str] to List[Composition].
  2. Calculator.loadModelCustom seems to require a separately specified model directory and network name here. Maybe it would make more sense to take in a single argument for the path to a network file? If you want to support local directory with paths, I think that should be possible too. To be extra, you could take in a list of paths to search for NNs by name (similar to the PATH variable used by operating systems)
  3. related, it might be a nice convenience if the modelNamein loadModelCustom was simply the networkName by default. At least for some of the methods I've been using, like get_resultDicts() it doesn't seem like the name I provided is used anyway.
  4. It seems inconsistent that Calculator.loadModelCustom hardcodes a onnx extension to the file, but other places like writeDescriptorsToCSV documents that "The file must have a '.csv' extension to be recognized correctly", but this actually isn't a requirement of the code (and isn't enforced anyway) and seems to be more of a suggestion if users want to use the CSV with something that depends on file extensions to recognize the type. I would err on the side of having users include full filenames with extensions, and only using extensions to the extent that they are hints about the type of file being read.
  5. (more of a bug, but I'll leave it here) Calculator.get_resultDictsWithNames fails without a nice warning or error if the inputNames member isn't set on Calculator instances

bocklund avatar May 08 '24 00:05 bocklund

@bocklund Thank you for taking your time and providing great feedback! As always 😃

  • 1 - The idea was that the user's input would be expanded internally, and the exact thing it was run on would be made available that way. The same thing happens for baseStructList. I can see, of course, how this could cause issues. I plan to trim down this functionality into string representation of each structure-composition pair and move that into the inputFiles to address 5 with one stone.
  • 2 - In the older (pre-2021) versions, that's exactly how it worked by scanning through a list of paths and harvesting applicable models. I liked that, but I found out most users generally preferred very different names from me (SIPFENN_Krajewski2020 Novel Materials Model vs SIPFENN_Krajewski2020_NN20_B2048_E520_FP16_SimpNorm) hence, automatic handling of the default models from a fixed location and manual loading of models for advanced users. It could certainly be expanded if time allows.
  • 3 the get_resultDicts was intended to print the user-friendly modelName. Thanks for catching that!
  • 4 - Thanks for pointing that out! loadModelCustom will be adjusted so that the path passes directly if it has onnx extension, error gently if it has another extension, and append it only on extensionless paths.

amkrajewski avatar May 08 '24 15:05 amkrajewski