pycytominer

FeatureRequest: Create ProfileData Class

Open kenibrewer opened this issue 10 months ago • 0 comments

Feature type

  • [X] Add new functionality

Story

As a pycytominer developer, I would like to have descriptive dataclasses to use when working on pycytominer's main functions. Instead of repeatedly passing necessary attributes such as "feature_cols" and "metadata_cols" from function to function, I could pass a single descriptive dataclass with all the information about the dataframe needed to operate on it. This would allow me to write more modular, more easily tested code.

General description of the proposed functionality

As a first step toward reducing the amount of redundant code in pycytominer, it would be good to create a ProfileData class. This class could contain methods that provide shared functionality used by all or most of the core pycytominer functions, such as how the data should be read from a file and how the feature and metadata columns are determined.

Example pseudo-code

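import pandas as pd


class ProfileData:
    def __init__(self, profiles, feature_cols, meta_cols):
        # `profiles` is a path to a CSV file of profiles
        self.profiles_df = pd.read_csv(profiles)
        # Keep the column roles so later methods do not need them passed in again
        self.feature_cols = feature_cols
        self.meta_cols = meta_cols
        self.features_df = self.profiles_df[feature_cols]

    def aggregate_data(self, aggregate_on):
        # Group the profiles by the given column(s); the aggregation operation is left open
        return self.profiles_df.groupby(aggregate_on)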
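
One way the "determine the feature/metadata columns" part of this idea could look is sketched below. This is only an illustration under stated assumptions: `ProfileColumns`, `from_columns`, and `describe` are hypothetical names, not existing pycytominer API, the column names are made up, and the rule that metadata columns start with `Metadata_` is an assumption exposed as a parameter. The sketch works on column names alone, so something like `df.columns` could be passed in.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass(frozen=True)
class ProfileColumns:
    """Hypothetical container describing the column roles of a profiles table."""

    feature_cols: List[str] = field(default_factory=list)
    metadata_cols: List[str] = field(default_factory=list)

    @classmethod
    def from_columns(
        cls, columns: Iterable[str], metadata_prefix: str = "Metadata_"
    ) -> "ProfileColumns":
        # Assumption: metadata columns share a prefix; everything else is a feature.
        columns = list(columns)
        metadata = [c for c in columns if c.startswith(metadata_prefix)]
        features = [c for c in columns if not c.startswith(metadata_prefix)]
        return cls(feature_cols=features, metadata_cols=metadata)


def describe(cols: ProfileColumns) -> str:
    # Downstream helpers take one object instead of separate column lists.
    return f"{len(cols.feature_cols)} features, {len(cols.metadata_cols)} metadata columns"


# Made-up column names, for illustration only.
columns = ["Metadata_Plate", "Metadata_Well", "Cells_AreaShape_Area", "Nuclei_Intensity_Mean"]
cols = ProfileColumns.from_columns(columns)
print(describe(cols))  # prints "2 features, 2 metadata columns"
```

Keeping the column roles in a small frozen dataclass means a function such as `describe` needs one argument rather than a pair of lists, which is the kind of repetition the story above wants to remove.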


Additional information

This class should initially be provided as separate functionality, but could gradually be integrated into the core functions (aggregate, normalize, annotate, etc.). Ideally, this could be accomplished without changing the behavior that users expect from those functions.
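
A gradual integration might look roughly like the sketch below, where a helper accepts either the existing explicit column list or the new object. Everything here is hypothetical: `resolve_features` and its parameters are invented for illustration and do not reflect the actual signatures of pycytominer's core functions, the `ProfileData` stand-in holds only column roles, and the `Metadata_` prefix fallback is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ProfileData:
    """Hypothetical stand-in for the proposed class; holds only column roles here."""

    feature_cols: List[str]
    metadata_cols: List[str]


def resolve_features(
    columns: Sequence[str],
    features: Optional[List[str]] = None,
    profile_data: Optional[ProfileData] = None,
) -> List[str]:
    """Hypothetical helper: take feature columns from the new object or the legacy argument."""
    if profile_data is not None:
        # New code path: the object already knows its feature columns.
        return list(profile_data.feature_cols)
    if features is not None:
        # Legacy code path: existing callers keep passing an explicit list.
        return list(features)
    # Fallback assumption: anything without the "Metadata_" prefix is a feature.
    return [c for c in columns if not c.startswith("Metadata_")]


# Made-up column names, for illustration only.
columns = ["Metadata_Well", "Cells_AreaShape_Area"]
legacy = resolve_features(columns, features=["Cells_AreaShape_Area"])
modern = resolve_features(
    columns, profile_data=ProfileData(["Cells_AreaShape_Area"], ["Metadata_Well"])
)
```

Because the legacy argument is still honored, existing calls would return the same result, while new code can opt in to passing the object.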

kenibrewer, Oct 07 '23 19:10