Refactor `_fit` Method of `GaussianCopulaSynthesizer` for Modularity

Open pvk-developer opened this issue 1 year ago • 0 comments

Description

To improve code reuse and maintainability in the SDV library, the `_fit` method of the `GaussianCopulaSynthesizer` class should be split into several smaller, well-defined methods, each handling one specific task within the fitting process. This will make the code easier to extend.

Expected Steps

  • Log Numerical Distributions: Keep the existing call to the standalone function `log_numerical_distributions_error`. This step will remain unchanged.

  • Learn Number of Rows: Move the logic for determining the number of rows (`self._num_rows = len(processed_data)`) into a new method, e.g., `self._learn_num_rows`.

  • Extract Numerical Distributions for Modeling: Refactor the logic inside the for loop that assigns a distribution to each column into a new method, e.g., `self._get_numerical_distributions`.

  • Initialize the Model: Move the logic for initializing the model (`self._model = GaussianMultivariate(...)`) to its own method, e.g., `self._initialize_model`.

  • Fit the Model: Finally, create a new method that fits the model while suppressing scipy warnings, e.g., `self._fit_model`.
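
The steps above could be arranged roughly as in the sketch below. It is only an illustration of the proposed shape, not the actual SDV code: `StubModel` and `StubTable` are hypothetical stand-ins for `GaussianMultivariate` and the processed DataFrame, `SketchSynthesizer` stands in for `GaussianCopulaSynthesizer`, and the per-column default (`'beta'`) and the choice to have the helpers return values rather than set attributes are assumptions. The real for loop may contain extra column-name handling that is omitted here.

```python
import warnings
from copy import deepcopy


class StubModel:
    """Hypothetical stand-in for GaussianMultivariate; it only records its inputs."""

    def __init__(self, distribution):
        self.distribution = distribution
        self.fitted_on = None

    def fit(self, data):
        self.fitted_on = data


class StubTable:
    """Hypothetical stand-in for a DataFrame: exposes .columns and len()."""

    def __init__(self, columns, num_rows):
        self.columns = list(columns)
        self._num_rows = num_rows

    def __len__(self):
        return self._num_rows


class SketchSynthesizer:
    """Illustrative skeleton only; attribute names follow the issue text."""

    def __init__(self, numerical_distributions=None, default_distribution='beta'):
        self._numerical_distributions = numerical_distributions or {}
        self._default_distribution = default_distribution  # assumed default
        self._num_rows = None
        self._model = None

    def _learn_num_rows(self, processed_data):
        # Step 2: the number of rows is simply the length of the data.
        return len(processed_data)

    def _get_numerical_distributions(self, processed_data):
        # Step 3: copy user-specified distributions, fill the rest with the default.
        distributions = deepcopy(self._numerical_distributions)
        for column in processed_data.columns:
            distributions.setdefault(column, self._default_distribution)
        return distributions

    def _initialize_model(self, numerical_distributions):
        # Step 4: the real method would build GaussianMultivariate(...) here.
        return StubModel(distribution=numerical_distributions)

    def _fit_model(self, processed_data):
        # Step 5: fit while ignoring warnings raised from scipy modules.
        with warnings.catch_warnings():
            warnings.filterwarnings('ignore', module='scipy')
            self._model.fit(processed_data)

    def _fit(self, processed_data):
        # Step 1: the log_numerical_distributions_error call would stay here, unchanged.
        self._num_rows = self._learn_num_rows(processed_data)
        numerical_distributions = self._get_numerical_distributions(processed_data)
        self._model = self._initialize_model(numerical_distributions)
        self._fit_model(processed_data)
```

With this layout, `_fit` reads as a short sequence of named steps, and a subclass could override a single step (for example `_initialize_model`) without copying the rest of the method.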

pvk-developer · Oct 22 '24 19:10