DataProfiler
Refactor UnstructuredDataLabelerProfile to properly use the option class and property calcs
Currently, UnstructuredDataLabelerProfile does not follow the same calculation logic as the structured profiles. Refactor it so that it integrates with the option classes and uses their enable / disable feature for calculations.
filtering:
@staticmethod
def _filter_properties_w_options(calculations, options):
    """
    Cycles through the calculations and turns off the ones that are
    disabled.

    :param calculations: Contains all the column calculations.
    :type calculations: Dict
    :param options: Contains all the options.
    :type options: BaseColumnOptions
    """
    for prop in list(calculations):
        if options and not options.is_prop_enabled(prop):
            del calculations[prop]
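To illustrate how the filtering step behaves, here is a minimal, self-contained sketch. `ToyOptions` is a hypothetical stand-in for `BaseColumnOptions` (only `is_prop_enabled` is taken from the snippet above), and the property names `entity_counts` and `word_counts` are made up for the example.

```python
class ToyOptions:
    """Hypothetical stand-in for BaseColumnOptions, not the library class."""

    def __init__(self, **flags):
        self._flags = flags

    def is_prop_enabled(self, prop):
        # Assumption: properties that are not mentioned count as enabled.
        return self._flags.get(prop, True)


def filter_properties_w_options(calculations, options):
    # Same loop as the snippet above: iterate over a copy of the keys so
    # entries can be deleted from the dict while looping.
    for prop in list(calculations):
        if options and not options.is_prop_enabled(prop):
            del calculations[prop]


# Placeholder callables; real profiles would map to update methods.
calculations = {"entity_counts": len, "word_counts": len}
filter_properties_w_options(calculations, ToyOptions(word_counts=False))
remaining = sorted(calculations)

# Passing no options leaves every calculation in place.
untouched = {"entity_counts": len, "word_counts": len}
filter_properties_w_options(untouched, None)
```

After the call, only `entity_counts` is left in `calculations`, while `untouched` still holds both entries.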
calculations:
def _perform_property_calcs(self, calculations, df_series,
                            prev_dependent_properties, subset_properties):
    """
    Cycles through the properties of the columns and calculate them.

    :param calculations: Contains all the column calculations.
    :type calculations: dict
    :param df_series: Data to be profiled
    :type df_series: pandas.Dataframe
    :param prev_dependent_properties: Contains all the previous properties
        that the calculations depend on.
    :type prev_dependent_properties: dict
    :param subset_properties: Contains the results of the properties of the
        subset before they are merged into the main data profile.
    :type subset_properties: dict
    :return: None
    """
    for prop in calculations:
        calculations[prop](self,
                           df_series,
                           prev_dependent_properties,
                           subset_properties)
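As a rough sketch of what the dispatch pattern could look like on an unstructured profile, the toy class below maps property names to unbound methods and calls each one with `self`. `ToyTextProfile`, its two properties, and the `update` method are all hypothetical; a plain list of strings is used in place of a pandas object to keep the example dependency-free.

```python
class ToyTextProfile:
    """Hypothetical profile; only the dispatch loop mirrors the snippet."""

    def __init__(self):
        self.char_count = 0
        self.line_count = 0
        # Property name -> unbound function, later called with self.
        self._calculations = {
            "char_count": ToyTextProfile._update_char_count,
            "line_count": ToyTextProfile._update_line_count,
        }

    def _update_char_count(self, data, prev_dependent_properties,
                           subset_properties):
        subset_properties["char_count"] = sum(len(s) for s in data)
        self.char_count += subset_properties["char_count"]

    def _update_line_count(self, data, prev_dependent_properties,
                           subset_properties):
        subset_properties["line_count"] = len(data)
        self.line_count += subset_properties["line_count"]

    def _perform_property_calcs(self, calculations, df_series,
                                prev_dependent_properties,
                                subset_properties):
        for prop in calculations:
            calculations[prop](self, df_series,
                               prev_dependent_properties,
                               subset_properties)

    def update(self, data):
        # Collect per-batch results, then return them to the caller.
        subset_properties = {}
        self._perform_property_calcs(self._calculations, data, {},
                                     subset_properties)
        return subset_properties


profile = ToyTextProfile()
subset = profile.update(["ab", "cde"])
# Disabling a property is just removing its key before the loop runs.
del profile._calculations["line_count"]
subset_after = profile.update(["xyz"])
```

Because each property is a dictionary entry, the option-based filtering only has to delete keys; the calculation loop itself does not change.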
Also, add the merge feature to this profile.
merging:
@staticmethod
def _merge_calculations(merged_profile_calcs, profile1_calcs, profile2_calcs):
    """
    Merges the calculations of two profiles to the lowest common
    denominator.

    :param merged_profile_calcs: default calculations of the merged profile
    :type merged_profile_calcs: dict
    :param profile1_calcs: calculations of profile1
    :type profile1_calcs: dict
    :param profile2_calcs: calculations of profile2
    :type profile2_calcs: dict
    :return: None
    """
    calcs = list(merged_profile_calcs.keys())
    for calc in calcs:
        if calc not in profile1_calcs or calc not in profile2_calcs:
            del merged_profile_calcs[calc]
            if calc in profile1_calcs or calc in profile2_calcs:
                warnings.warn("{} is disabled because it is not enabled in "
                              "both profiles.".format(calc), RuntimeWarning)
@JGSweets is this still necessary?
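For reference, the "lowest common denominator" merge from the merging snippet can be exercised on its own. The sketch below copies that loop into a free function and runs it on made-up property names (`entity_counts`, `word_counts`, `vocab`); the dictionaries and their `None` values are placeholders, not real profile state.

```python
import warnings


def merge_calculations(merged_profile_calcs, profile1_calcs, profile2_calcs):
    # Keep a calculation only if both profiles have it enabled.
    for calc in list(merged_profile_calcs.keys()):
        if calc not in profile1_calcs or calc not in profile2_calcs:
            del merged_profile_calcs[calc]
            # Warn only when exactly one side had it enabled; a calc
            # disabled in both profiles is dropped silently.
            if calc in profile1_calcs or calc in profile2_calcs:
                warnings.warn("{} is disabled because it is not enabled in "
                              "both profiles.".format(calc), RuntimeWarning)


merged = {"entity_counts": None, "word_counts": None, "vocab": None}
profile1 = {"entity_counts": None, "word_counts": None}
profile2 = {"entity_counts": None}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    merge_calculations(merged, profile1, profile2)

messages = [str(w.message) for w in caught]
```

Here `merged` ends up with only `entity_counts`, and a single `RuntimeWarning` is raised for `word_counts`, which was enabled in one profile but not the other.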