Refactor `_fit` Method of `GaussianCopulaSynthesizer` for Modularity
Description
To improve code reuse and maintainability in the SDV library, the `_fit` method of the `GaussianCopulaSynthesizer` class should be split into multiple, well-defined functions. This will make the code easier to extend.
We need to break the `_fit` method down into smaller steps, each implemented as a separate function that handles one specific task in the fitting process.
Expected Steps
- Log Numerical Distributions: Keep the existing call to `log_numerical_distributions_error` as a standalone function. This step will remain unchanged.
- Learn Number of Rows: Move the logic for determining the number of rows (`self._num_rows = len(processed_data)`) into a new method, e.g., `self._learn_num_rows`.
- Extract Numerical Distributions for Modeling: Create a new method to extract numerical distributions for modeling. The logic inside the `for` loop that assigns distributions to each column should be refactored into a method, e.g., `self._get_numerical_distributions`.
- Initialize the Model: Move the logic for initializing the model (`self._model = GaussianMultivariate(...)`) to its own method, e.g., `self._initialize_model`.
- Fit the Model: Finally, create a new method to encapsulate the logic for fitting the model with scipy warnings handling, e.g., `self._fit_model`.
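
The steps above could be wired together roughly as in the following sketch. It is not the actual SDV implementation: `SketchSynthesizer`, `StubGaussianMultivariate`, and the stand-in `log_numerical_distributions_error` are hypothetical placeholders so the example runs without SDV installed, `processed_data` is modeled as a list of row dicts rather than a DataFrame, and the helper signatures and the `'beta'` default are assumptions. Whether each helper returns a value or assigns to `self` directly is also a choice made here for illustration only.

```python
import warnings
from copy import deepcopy


class StubGaussianMultivariate:
    """Hypothetical stand-in for the real copula model (assumption)."""

    def __init__(self, distribution):
        self.distribution = distribution
        self.fitted = False

    def fit(self, data):
        # The real model would learn the copula here; the stub only records the call.
        self.fitted = True


def log_numerical_distributions_error(numerical_distributions, columns):
    """Stand-in for the existing helper; the real signature may differ."""
    return [name for name in numerical_distributions if name not in columns]


class SketchSynthesizer:
    """Illustrative class showing one possible split of `_fit` into steps."""

    def __init__(self, numerical_distributions=None, default_distribution='beta'):
        self._numerical_distributions = numerical_distributions or {}
        self._default_distribution = default_distribution
        self._num_rows = None
        self._model = None

    def _learn_num_rows(self, processed_data):
        # Step 2: number of rows seen during fitting.
        return len(processed_data)

    def _get_numerical_distributions(self, columns):
        # Step 3: one distribution per column, falling back to the default.
        distributions = deepcopy(self._numerical_distributions)
        for column in columns:
            if column not in distributions:
                distributions[column] = self._default_distribution
        return distributions

    def _initialize_model(self, numerical_distributions):
        # Step 4: build the model from the per-column distributions.
        return StubGaussianMultivariate(distribution=numerical_distributions)

    def _fit_model(self, processed_data):
        # Step 5: fit while silencing warnings raised from scipy modules.
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', module='scipy')
            self._model.fit(processed_data)

    def _fit(self, processed_data):
        columns = list(processed_data[0]) if processed_data else []
        # Step 1: unchanged standalone call.
        log_numerical_distributions_error(self._numerical_distributions, columns)
        self._num_rows = self._learn_num_rows(processed_data)
        numerical_distributions = self._get_numerical_distributions(columns)
        self._model = self._initialize_model(numerical_distributions)
        self._fit_model(processed_data)
```

With this shape, a subclass could override a single step (for example `_initialize_model`) without copying the rest of `_fit`.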