Fuse the functionality used in both `_merge_histogram` and the newly created `_assimilate_histogram`
Is your feature request related to a problem? Please describe. This issue supports the goal of a clear paradigm with one easy-to-understand path for each profiling task: updating, getting, and merging. It focuses on defining a single function path for merging a profile (or parts of a profile).
The problem is that `_merge_histogram`, the newly created `_assimilate_histogram`, and other merging processes within DataProfiler repeat functionality and have overlapping goals for their inputs and outputs.
An example of a fix for achieving this paradigm is as follows:
With the creation of `_assimilate_histogram`, we have a much better way to combine the information from two histograms. We should be able to use that function throughout the code while still providing the functionality previously covered by `_merge_histogram`. The old approach can be seen in `numerical_column_stats.py` at line 1286: it recreates the histogram data, which is more memory intensive than the approach taken in `_assimilate_histogram`.
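As an illustration of the idea only, here is a minimal sketch of merging two built histograms without recreating any data. The names `assimilate_counts` and `merge_histograms` are hypothetical and this is not the DataProfiler implementation of `_assimilate_histogram`. It assumes values are spread uniformly within each bin, so a source bin contributes to each destination bin in proportion to the width of their overlap, and it assumes the destination edges span the source range.

```python
from typing import List, Sequence, Tuple


def assimilate_counts(
    from_counts: Sequence[float],
    from_edges: Sequence[float],
    dest_counts: Sequence[float],
    dest_edges: Sequence[float],
) -> List[float]:
    """Redistribute one histogram's counts into another histogram's bins.

    Hypothetical helper. Assumes values are uniform within each source bin
    and that dest_edges cover the source range, so no counts are lost.
    Only counts and edges are read; no data points are recreated.
    """
    result = [float(c) for c in dest_counts]
    for i, count in enumerate(from_counts):
        lo, hi = from_edges[i], from_edges[i + 1]
        width = hi - lo
        for j in range(len(result)):
            d_lo, d_hi = dest_edges[j], dest_edges[j + 1]
            if width == 0:
                # Zero-width source bin: give its whole count to the first
                # destination bin that contains its position.
                if d_lo <= lo <= d_hi:
                    result[j] += count
                    break
                continue
            overlap = min(hi, d_hi) - max(lo, d_lo)
            if overlap > 0:
                # Fraction of the source bin's width that lands in bin j.
                result[j] += count * overlap / width
    return result


def merge_histograms(
    counts_a: Sequence[float],
    edges_a: Sequence[float],
    counts_b: Sequence[float],
    edges_b: Sequence[float],
    num_bins: int,
) -> Tuple[List[float], List[float]]:
    """Hypothetical merge of two built histograms onto evenly spaced bins."""
    lo = min(edges_a[0], edges_b[0])
    hi = max(edges_a[-1], edges_b[-1])
    if hi == lo:
        num_bins = 1  # both histograms sit on a single point
    step = (hi - lo) / num_bins
    # Use hi exactly as the last edge so the full range is covered.
    dest_edges = [lo + k * step for k in range(num_bins)] + [hi]
    merged = assimilate_counts(counts_a, edges_a, [0.0] * num_bins, dest_edges)
    merged = assimilate_counts(counts_b, edges_b, merged, dest_edges)
    return merged, dest_edges


merged, edges = merge_histograms([2, 2], [0.0, 1.0, 2.0], [4], [2.0, 4.0], num_bins=2)
print(merged, edges)  # [4.0, 4.0] [0.0, 2.0, 4.0]
```

Because only bin counts and edges are touched, the memory used scales with the number of bins rather than with the number of samples, which is the advantage over recreating the histogram data.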
Describe the outcome you'd like: I would like a single path for merging profiles and their information that supports every current usage of the existing functions.
Additional context:
For details behind `_assimilate_histogram`, see the PR that implements the more memory-optimized solution:
https://github.com/capitalone/DataProfiler/pull/815
Summary of the new paradigm for histograms:
merge: (built histogram) + (built histogram)
update: (new data -> get histogram on new data -> built histogram) + (existing built histogram)
All calculations should have a get, an update, and a merge, where:
get -> calculates from raw data
merge -> takes two existing calculations and merges them
update -> takes in new data to add; get + merge
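To make the get/merge/update contract concrete, here is a minimal sketch on a simple hypothetical statistic (count, total, min, max) rather than a histogram. `SummaryStats`, `get`, `merge`, and `update` are illustrative names, not DataProfiler APIs. The point it shows is that `update` is defined purely as `merge(existing, get(new_data))`, so there is only one merging path.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SummaryStats:
    """Hypothetical calculation used only to illustrate get/merge/update."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan


def get(data: Iterable[float]) -> SummaryStats:
    """get: calculate from raw data."""
    values = list(data)
    if not values:
        return SummaryStats()
    return SummaryStats(
        len(values), float(sum(values)), float(min(values)), float(max(values))
    )


def merge(a: SummaryStats, b: SummaryStats) -> SummaryStats:
    """merge: combine two existing calculations without touching raw data."""
    return SummaryStats(
        a.count + b.count,
        a.total + b.total,
        min(a.minimum, b.minimum),
        max(a.maximum, b.maximum),
    )


def update(existing: SummaryStats, new_data: Iterable[float]) -> SummaryStats:
    """update: get on the new data, then merge with the existing calculation."""
    return merge(existing, get(new_data))


stats = update(get([1.0, 2.0, 3.0]), [4.0, 5.0])
print(stats.count, stats.mean)  # 5 3.0
```

With this shape, any fix to `merge` automatically applies to `update`, which is the single-path property the issue asks for.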