ShapeWorks
Update PCA_Embedder Saving and Loading
The main goal of this update is to be able to save a PCA model to disk and load it again later without re-running the PCA analysis on the original data. I wanted to do this from a Python program that imports ShapeWorks as a library. Of the two sets of PCA functionality in the ShapeWorks repository, the PCA_Embedder class from the pure-Python DataAugmentationUtils module was the closest to having these features and the easiest to extend in Python, so this update extends that class.
Changes made:
- Allow __init__ method to accept None as a data_matrix parameter, in which case the PCA analysis is not run.
- Add load_PCA to allow loading raw PCA attributes from arrays
- Add from_directory class method, a factory function to create a PCA_Embedder instance from a model saved to disk
- Tidy the write_PCA function to make it more consistent, add an option not to save subject scores (in case these are privileged information), and ensure it is consistent with from_directory
- Change the project function to use pre-calculated mean data instead of raw data, so that only the mean data needs to be saved
- Change the percent variability calculation to use greater than or equal to (>=), so that a percent_variability of 1 is allowed
- Remove the self.num_dim attribute, instead inferring this from the length of the PCA scores array passed to the project function.
- Add documentation in code clarifying method parameters
- Add tests to ensure:
  - PCA functionality is the same as ParticleShapeStatistics and sklearn
  - Loading and saving works as intended
  - Percent variability works as intended
- (These tests are now passing in GitHub Actions)
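The save/load behaviour listed above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the PCA_Embedder code from this PR: the class name TinyPCA, its method names (load, save, num_modes_for) and the .npy file names are all hypothetical. It shows the four ideas from the list: skipping the analysis when no data is given, reconstructing from a stored mean instead of the raw data, inferring the number of modes from the length of the scores array, and using >= for the variability threshold.

```python
import tempfile
from pathlib import Path

import numpy as np


class TinyPCA:
    """Hypothetical stand-in for illustration; not the ShapeWorks PCA_Embedder API."""

    def __init__(self, data_matrix=None):
        # With data_matrix=None no analysis is run; attributes are filled by load().
        self.mean_data = self.eigen_vectors = self.eigen_values = self.scores = None
        if data_matrix is not None:
            data = np.asarray(data_matrix, dtype=float)
            self.mean_data = data.mean(axis=0)
            centered = data - self.mean_data
            # SVD of the centered data: rows of vt are the principal directions.
            _, s, vt = np.linalg.svd(centered, full_matrices=False)
            self.eigen_vectors = vt.T
            self.eigen_values = s**2 / (data.shape[0] - 1)
            self.scores = centered @ self.eigen_vectors

    def load(self, mean_data, eigen_vectors, eigen_values, scores=None):
        # Set the raw PCA attributes directly from arrays.
        self.mean_data, self.eigen_vectors = mean_data, eigen_vectors
        self.eigen_values, self.scores = eigen_values, scores
        return self

    def save(self, directory, save_scores=True):
        # File names here are assumptions made for this sketch.
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        np.save(directory / "mean_data.npy", self.mean_data)
        np.save(directory / "eigen_vectors.npy", self.eigen_vectors)
        np.save(directory / "eigen_values.npy", self.eigen_values)
        if save_scores and self.scores is not None:
            np.save(directory / "scores.npy", self.scores)

    @classmethod
    def from_directory(cls, directory):
        # Factory: build an instance from saved arrays without re-running PCA.
        directory = Path(directory)
        scores_file = directory / "scores.npy"
        scores = np.load(scores_file) if scores_file.exists() else None
        return cls().load(
            np.load(directory / "mean_data.npy"),
            np.load(directory / "eigen_vectors.npy"),
            np.load(directory / "eigen_values.npy"),
            scores,
        )

    def project(self, pca_scores):
        # The number of modes is inferred from the length of the scores array,
        # and only the stored mean is needed, not the original data.
        pca_scores = np.asarray(pca_scores, dtype=float)
        k = pca_scores.shape[0]
        return self.mean_data + self.eigen_vectors[:, :k] @ pca_scores

    def num_modes_for(self, percent_variability):
        cumulative = np.cumsum(self.eigen_values)
        cumulative = cumulative / cumulative[-1]  # last entry is exactly 1.0
        # >= (rather than >) means percent_variability == 1 can be reached.
        return int(np.argmax(cumulative >= percent_variability)) + 1


rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4))
model = TinyPCA(data)
with tempfile.TemporaryDirectory() as tmp:
    model.save(tmp, save_scores=False)  # subject scores are left out
    loaded = TinyPCA.from_directory(tmp)
reconstructed = loaded.project(model.scores[0])
```

In this sketch the loaded model reconstructs the first subject from its scores alone, even though neither the original data nor the subject scores were written to disk.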
I'm getting this error with the deep_ssm use case (automated test):
2024-03-29T00:18:36.3395229Z 8: File "/__w/ShapeWorks/ShapeWorks/Examples/Python/RunUseCase.py", line 97, in <module>
2024-03-29T00:18:36.3396061Z 8: module.Run_Pipeline(args)
2024-03-29T00:18:36.3397041Z 8: File "/__w/ShapeWorks/ShapeWorks/Examples/Python/deep_ssm.py", line 257, in Run_Pipeline
2024-03-29T00:18:36.3398394Z 8: embedded_dim = DeepSSMUtils.run_data_augmentation(project, num_samples, num_dim, percent_variability, sampler,
2024-03-29T00:18:36.3399786Z 8: File "/__w/ShapeWorks/ShapeWorks/Python/DeepSSMUtilsPackage/DeepSSMUtils/run_utils.py", line 289, in run_data_augmentation
2024-03-29T00:18:36.3401098Z 8: embedded_dim = DataAugmentationUtils.runDataAugmentation(aug_dir, train_image_filenames,
2024-03-29T00:18:36.3402499Z 8: File "/__w/ShapeWorks/ShapeWorks/Python/DataAugmentationUtilsPackage/DataAugmentationUtils/__init__.py", line 22, in runDataAugmentation
2024-03-29T00:18:36.3404177Z 8: num_dim = DataAugmentation.point_based_aug(out_dir, img_list, world_point_list, num_samples, num_dim, percent_variability, sampler_type, mixture_num, processes)
2024-03-29T00:18:36.3405906Z 8: File "/__w/ShapeWorks/ShapeWorks/Python/DataAugmentationUtilsPackage/DataAugmentationUtils/DataAugmentation.py", line 37, in point_based_aug
2024-03-29T00:18:36.3407071Z 8: num_dim = PointEmbedder.num_dim
2024-03-29T00:18:36.3407902Z 8: AttributeError: 'PCA_Embbeder' object has no attribute 'num_dim'
@acreegan, I've fixed those errors, but the new PCA embedder test fails on Mac and Windows. I assume this is due to a precision/rounding difference. I'll take a look at it again when I have a chance.
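If the cross-platform failure does come from floating-point differences (an assumption; the cause has not been confirmed here), one common approach is to compare PCA results with a tolerance rather than exactly, and to align eigenvector signs first, since each eigenvector is only defined up to sign. The sketch below is not the test from this PR: the helper align_signs and the tolerance values are hypothetical and chosen only for illustration. It compares two independent PCA routes in NumPy.

```python
import numpy as np


def align_signs(reference, candidate):
    """Flip columns of `candidate` so each points the same way as in `reference`."""
    signs = np.sign(np.sum(reference * candidate, axis=0))
    signs[signs == 0] = 1.0
    return candidate * signs


rng = np.random.default_rng(0)
data = rng.normal(size=(8, 5))
centered = data - data.mean(axis=0)

# Route 1: SVD of the centered data.
_, s, vt = np.linalg.svd(centered, full_matrices=False)
vecs_svd = vt.T
vals_svd = s**2 / (data.shape[0] - 1)

# Route 2: eigendecomposition of the covariance matrix, sorted largest first.
vals_eig, vecs_eig = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(vals_eig)[::-1]
vals_eig, vecs_eig = vals_eig[order], vecs_eig[:, order]

# Tolerant comparisons; the tolerances are illustrative, not tuned values.
np.testing.assert_allclose(vals_eig, vals_svd, rtol=1e-8, atol=1e-10)
np.testing.assert_allclose(
    align_signs(vecs_svd, vecs_eig), vecs_svd, rtol=1e-6, atol=1e-8
)
```

An exact comparison of the two routes can fail on one platform and pass on another because the underlying linear algebra libraries differ, while the tolerant, sign-aligned comparison does not depend on those details.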