
Update PCA_Embedder Saving and Loading

Open • acreegan opened this issue 1 year ago • 2 comments

The main goal of this update is to make it possible to save a PCA model to disk and load it again later without re-running the PCA analysis on the original data. I wanted to do this from within a Python program that imports ShapeWorks as a library. Of the two sets of PCA functionality in the ShapeWorks repository, the PCA_Embedder class in the pure-Python DataAugmentationUtils module came closest to having these features and was the easiest to extend in Python, so this update extends that class.
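To illustrate the idea of persisting a fitted PCA model and reloading it without the original data, here is a minimal numpy sketch. It is not the PCA_Embedder API: the function names (fit_pca, save_pca, load_pca) and the one-.npy-file-per-attribute layout are hypothetical, chosen only to show that the mean, eigenvalues and eigenvectors are enough to reconstruct the model, and that subject scores can optionally be left out.

```python
import os
import tempfile

import numpy as np


def fit_pca(data_matrix):
    """Fit PCA on an (n_samples, n_features) matrix via SVD of the centered data."""
    mean = data_matrix.mean(axis=0)
    centered = data_matrix - mean
    # Rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Squared singular values / (n - 1) are the variances along each direction.
    eigenvalues = s**2 / (data_matrix.shape[0] - 1)
    scores = centered @ vt.T
    return {"mean": mean, "eigenvalues": eigenvalues,
            "eigenvectors": vt.T, "scores": scores}


def save_pca(model, directory, save_scores=True):
    """Write each PCA attribute to its own .npy file (hypothetical layout)."""
    os.makedirs(directory, exist_ok=True)
    for name, array in model.items():
        if name == "scores" and not save_scores:
            continue  # subject scores may be privileged information
        np.save(os.path.join(directory, name + ".npy"), array)


def load_pca(directory):
    """Reload whichever attributes were saved, without re-running PCA."""
    model = {}
    for name in ("mean", "eigenvalues", "eigenvectors", "scores"):
        path = os.path.join(directory, name + ".npy")
        if os.path.exists(path):
            model[name] = np.load(path)
    return model


rng = np.random.default_rng(0)
data = rng.normal(size=(20, 6))
model = fit_pca(data)
with tempfile.TemporaryDirectory() as tmp:
    save_pca(model, tmp, save_scores=False)
    loaded = load_pca(tmp)
```

Storing plain arrays keeps the saved model independent of any pickled Python class, at the cost of the loader needing to know the file names.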

Changes made:

  • Allow the __init__ method to accept None as the data_matrix parameter, in which case the PCA analysis is not run.
  • Add load_PCA, which loads raw PCA attributes from arrays
  • Add a from_directory class method, a factory function that creates a PCA_Embedder instance from a model saved to disk
  • Tidy the write_PCA function for consistency, add an option not to save subject scores (in case these are privileged information), and ensure its output matches what from_directory expects
  • Change the project function to use the pre-calculated mean instead of the raw data, so that only the mean needs to be saved
  • Change the percent variability calculation to use greater-than-or-equal-to, so that a percent_variability of 1 is allowed
  • Remove the self.num_dim attribute; the number of dimensions is instead inferred from the length of the PCA scores array passed to the project function.
  • Add in-code documentation clarifying method parameters
  • Add tests to ensure that:
    • PCA results match those of ParticleShapeStatistics and sklearn
    • Loading and saving work as intended
    • Percent variability works as intended
    • (These tests now pass in GitHub Actions)
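The percent-variability and project changes can be sketched as follows. This is a hypothetical standalone illustration in numpy, not the code in the PR: num_dims_for_variability and project are made-up names, and clamping the last cumulative value to exactly 1.0 is an extra safeguard of this sketch (cumsum and sum can disagree in the last bit), not something the PR is known to do.

```python
import numpy as np


def num_dims_for_variability(eigenvalues, percent_variability):
    """Smallest number of modes whose cumulative variance fraction is >= the target.

    With >= (rather than >), a target of 1.0 selects all modes instead of
    never reaching the cutoff.
    """
    if not 0 < percent_variability <= 1:
        raise ValueError("percent_variability must be in (0, 1]")
    cumulative = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    # Assumption of this sketch: guard against round-off leaving the last entry below 1.0.
    cumulative[-1] = 1.0
    return int(np.argmax(cumulative >= percent_variability)) + 1


def project(mean, eigenvectors, pca_scores):
    """Map PCA scores back to data space using only the stored mean.

    The number of modes is inferred from len(pca_scores), so no separate
    num_dim attribute has to be stored.
    """
    num_dim = len(pca_scores)
    return mean + eigenvectors[:, :num_dim] @ pca_scores


eigenvalues = np.array([4.0, 2.0, 1.0, 1.0])  # cumulative fractions: 0.5, 0.75, 0.875, 1.0
print(num_dims_for_variability(eigenvalues, 0.75))  # 2
print(num_dims_for_variability(eigenvalues, 1.0))   # 4
print(project(np.zeros(3), np.eye(3), np.array([1.0, 2.0])))  # [1. 2. 0.]
```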

acreegan, Feb 16 '24

I'm getting this error with the deep_ssm use case (automated test):

  File "/__w/ShapeWorks/ShapeWorks/Examples/Python/RunUseCase.py", line 97, in <module>
    module.Run_Pipeline(args)
  File "/__w/ShapeWorks/ShapeWorks/Examples/Python/deep_ssm.py", line 257, in Run_Pipeline
    embedded_dim = DeepSSMUtils.run_data_augmentation(project, num_samples, num_dim, percent_variability, sampler,
  File "/__w/ShapeWorks/ShapeWorks/Python/DeepSSMUtilsPackage/DeepSSMUtils/run_utils.py", line 289, in run_data_augmentation
    embedded_dim = DataAugmentationUtils.runDataAugmentation(aug_dir, train_image_filenames,
  File "/__w/ShapeWorks/ShapeWorks/Python/DataAugmentationUtilsPackage/DataAugmentationUtils/__init__.py", line 22, in runDataAugmentation
    num_dim = DataAugmentation.point_based_aug(out_dir, img_list, world_point_list, num_samples, num_dim, percent_variability, sampler_type, mixture_num, processes)
  File "/__w/ShapeWorks/ShapeWorks/Python/DataAugmentationUtilsPackage/DataAugmentationUtils/DataAugmentation.py", line 37, in point_based_aug
    num_dim = PointEmbedder.num_dim
AttributeError: 'PCA_Embbeder' object has no attribute 'num_dim'

akenmorris, Mar 29 '24

@acreegan, I've fixed those errors, but the new PCA embedder test fails on Mac and Windows, presumably due to a floating-point precision/rounding difference. I'll take another look when I have a chance.

akenmorris, Mar 29 '24
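For cross-platform failures like the one in the last comment, one common approach is to compare PCA outputs with tolerances rather than exact equality, and to align eigenvector signs first, since each eigenvector is only defined up to a factor of -1 and different linear-algebra backends may return either sign. The sketch below is a hypothetical test helper (align_signs and eigenvectors_match are made-up names) and is not necessarily how this test was eventually fixed; the rtol/atol values are numpy's defaults and may need adjusting.

```python
import numpy as np


def align_signs(reference, candidate):
    """Flip each column of `candidate` so it points the same way as `reference`."""
    signs = np.sign(np.sum(reference * candidate, axis=0))
    signs[signs == 0] = 1.0  # leave orthogonal (degenerate) columns unchanged
    return candidate * signs


def eigenvectors_match(reference, candidate, rtol=1e-5, atol=1e-8):
    """Compare eigenvector matrices column-wise, ignoring sign and tiny round-off."""
    return np.allclose(reference, align_signs(reference, candidate), rtol=rtol, atol=atol)


rng = np.random.default_rng(1)
ref, _ = np.linalg.qr(rng.normal(size=(5, 3)))  # orthonormal columns
# Simulate another platform: second mode flipped, plus round-off noise.
other = ref * np.array([1.0, -1.0, 1.0]) + 1e-10
print(np.array_equal(ref, other))       # False
print(eigenvectors_match(ref, other))   # True
```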