
Refactor UnstructuredDataLabelerProfile to properly use the option class and property calcs

Open JGSweets opened this issue 3 years ago • 1 comment

Currently, `UnstructuredDataLabelerProfile` doesn't follow the same calculation logic as the structured profiles. Refactor it so that it integrates easily with the options class, using the enable/disable feature for calculations.

filtering:

    @staticmethod
    def _filter_properties_w_options(calculations, options):
        """
        Cycles through the calculations and turns off the ones that are
        disabled.

        :param calculations: Contains all the column calculations.
        :type calculations: dict
        :param options: Contains all the options.
        :type options: BaseColumnOptions
        """
        for prop in list(calculations):
            if options and not options.is_prop_enabled(prop):
                del calculations[prop]
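
To illustrate how the filtering step behaves, here is a minimal, self-contained sketch. `ToyOptions` and the property names `entity_counts` and `word_level` are hypothetical stand-ins, not the library's actual option class or properties; the only thing assumed about the options object is an `is_prop_enabled(prop)` method, as used in the snippet above.

```python
class ToyOptions:
    """Hypothetical stand-in; only is_prop_enabled() is assumed."""

    def __init__(self, enabled):
        # Map of property name -> bool; unlisted properties default to enabled.
        self._enabled = dict(enabled)

    def is_prop_enabled(self, prop):
        return self._enabled.get(prop, True)


def filter_properties_w_options(calculations, options):
    """Remove disabled calculations in place (same logic as above)."""
    # Iterate over a copy of the keys so deleting during the loop is safe.
    for prop in list(calculations):
        if options and not options.is_prop_enabled(prop):
            del calculations[prop]


# Hypothetical calculation names mapped to placeholder callables.
calculations = {
    "entity_counts": lambda *args: None,
    "word_level": lambda *args: None,
}
filter_properties_w_options(calculations, ToyOptions({"word_level": False}))
remaining = sorted(calculations)
```

Passing `None` as `options` leaves every calculation in place, since the `if options and ...` guard short-circuits.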

calculations:

    def _perform_property_calcs(self, calculations, df_series,
                                prev_dependent_properties, subset_properties):
        """
        Cycles through the properties of the columns and calculates them.

        :param calculations: Contains all the column calculations.
        :type calculations: dict
        :param df_series: Data to be profiled
        :type df_series: pandas.DataFrame
        :param prev_dependent_properties: Contains all the previous properties
        that the calculations depend on.
        :type prev_dependent_properties: dict
        :param subset_properties: Contains the results of the properties of the
        subset before they are merged into the main data profile.
        :type subset_properties: dict
        :return: None
        """
        for prop in calculations:
            calculations[prop](self,
                               df_series,
                               prev_dependent_properties,
                               subset_properties)
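
As a rough sketch of how a profile could route its updates through this dispatch loop, the toy class below registers its calculations in a dict and runs them from `update`. `ToyProfile`, `sample_count`, and `char_count` are hypothetical names for illustration only, and a plain list of strings is used in place of a pandas object to keep the example self-contained.

```python
class ToyProfile:
    """Hypothetical profile showing the dict-of-calculations pattern."""

    def __init__(self):
        self.sample_count = 0
        self.char_count = 0
        # Property name -> unbound method; entries can be filtered by options.
        self._calculations = {
            "sample_count": ToyProfile._update_sample_count,
            "char_count": ToyProfile._update_char_count,
        }

    def _update_sample_count(self, data, prev_dependent_properties,
                             subset_properties):
        subset_properties["sample_count"] = len(data)
        self.sample_count += len(data)

    def _update_char_count(self, data, prev_dependent_properties,
                           subset_properties):
        total = sum(len(text) for text in data)
        subset_properties["char_count"] = total
        self.char_count += total

    def _perform_property_calcs(self, calculations, data,
                                prev_dependent_properties, subset_properties):
        # Each entry is called with the profile instance passed explicitly.
        for prop in calculations:
            calculations[prop](self, data,
                               prev_dependent_properties, subset_properties)

    def update(self, data):
        subset_properties = {}
        self._perform_property_calcs(self._calculations, data,
                                     {}, subset_properties)
        return subset_properties


profile = ToyProfile()
subset = profile.update(["hello", "hi"])
```

Because the calculations live in a dict, disabling one is just a matter of deleting its key before `update` runs.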

Also, add the merge feature to this profile.

merging:

    @staticmethod
    def _merge_calculations(merged_profile_calcs, profile1_calcs, profile2_calcs):
        """
        Merges the calculations of two profiles to the lowest common
        denominator.

        :param merged_profile_calcs: default calculations of the merged profile
        :type merged_profile_calcs: dict
        :param profile1_calcs: calculations of profile1
        :type profile1_calcs: dict
        :param profile2_calcs: calculations of profile2
        :type profile2_calcs: dict
        :return: None
        """
        calcs = list(merged_profile_calcs.keys())
        for calc in calcs:
            if calc not in profile1_calcs or calc not in profile2_calcs:
                del merged_profile_calcs[calc]
                if calc in profile1_calcs or calc in profile2_calcs:
                    warnings.warn("{} is disabled because it is not enabled in "
                                  "both profiles.".format(calc), RuntimeWarning)
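
For reference, a small standalone sketch of the merge behavior: a calculation survives only if both profiles have it, and a warning is raised only when exactly one profile had it enabled. The calculation names `a`, `b`, and `c` are placeholders, not real profile properties.

```python
import warnings


def merge_calculations(merged_profile_calcs, profile1_calcs, profile2_calcs):
    """Keep only calculations enabled in both profiles (same logic as above)."""
    for calc in list(merged_profile_calcs):
        if calc not in profile1_calcs or calc not in profile2_calcs:
            del merged_profile_calcs[calc]
            # Warn only if one of the two profiles had the calculation.
            if calc in profile1_calcs or calc in profile2_calcs:
                warnings.warn("{} is disabled because it is not enabled in "
                              "both profiles.".format(calc), RuntimeWarning)


# Placeholder calculation names; values would normally be callables.
merged = {"a": None, "b": None, "c": None}
profile1 = {"a": None, "b": None}
profile2 = {"a": None}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    merge_calculations(merged, profile1, profile2)

kept = sorted(merged)
warned = [str(w.message) for w in caught]
```

Here `b` triggers a warning because only `profile1` has it, while `c` is dropped silently since neither profile has it.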

JGSweets • Apr 23 '21 14:04

@JGSweets is this still necessary?

lettergram • Aug 31 '21 21:08