
`DataProcessor` never gets assigned a `table_name`.

Open pvk-developer opened this issue 2 years ago • 0 comments

Error Description

The DataProcessor is designed to log the name of the table it is processing, which is particularly useful when working with multi-table synthesizers. However, its table_name attribute is currently never assigned, because the argument is not passed to the BaseSingleTableSynthesizer that owns the DataProcessor instance.

Steps to reproduce

from sdv.datasets.demo import download_demo
from sdv.multi_table import HMASynthesizer
import logging

# Configure logging to see INFO level messages
logging.basicConfig(level=logging.INFO)

# Download demo data
data, metadata = download_demo('multi_table', 'fake_hotels')

# Initialize HMASynthesizer with metadata
hmas = HMASynthesizer(metadata)

# Fit the synthesizer to the data
hmas.fit(data)

Expected Behavior

The DataProcessor should correctly log the name of the table it is processing during synthesis, aiding in the debugging process and providing clarity on the synthesis workflow.

Multiple approaches

  1. Enhance BaseSingleTableSynthesizer Interface:

    • Add an additional table_name argument to the BaseSingleTableSynthesizer constructor.
    • Propagate this argument to the DataProcessor instance during synthesizer initialization.
    • Note: This approach requires adjustments to methods like get_parameters.
    • Example implementation can be found in this PR.
  2. Manual Attribute Setting in MultiTable Context:

    • Manually set the table_name attribute while utilizing the synthesizer within a MultiTable context.
    • Access the _data_processor attribute of the synthesizer instance and set table_name manually.
    • This approach offers a workaround without modifying the synthesizer's core interface.
    • Example implementation:
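    from sdv.single_table import GaussianCopulaSynthesizer

    for table_name in metadata.tables:
        # Build the per-table synthesizer, then label its DataProcessor by hand.
        synthesizer_instance = GaussianCopulaSynthesizer(metadata.tables[table_name])
        synthesizer_instance._data_processor.table_name = table_name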
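
For comparison, the first approach could look roughly like the sketch below. It uses toy stand-in classes rather than the real SDV ones: `ToyDataProcessor`, `ToySingleTableSynthesizer`, the `enforce_rounding` parameter, the log message, and the table names are all hypothetical and only illustrate how the argument would be forwarded. Leaving `table_name` out of `get_parameters` is one possible adjustment, not necessarily what the linked PR does.

```python
import logging

logger = logging.getLogger("toy_sdv")


class ToyDataProcessor:
    """Hypothetical stand-in for DataProcessor (not the SDV class)."""

    def __init__(self, metadata, table_name=None):
        self.metadata = metadata
        self.table_name = table_name

    def fit(self, data):
        # Log which table is being processed, as the issue describes.
        logger.info("Fitting table %s", self.table_name)


class ToySingleTableSynthesizer:
    """Hypothetical stand-in for BaseSingleTableSynthesizer."""

    def __init__(self, metadata, enforce_rounding=True, table_name=None):
        self.metadata = metadata
        self.enforce_rounding = enforce_rounding
        self.table_name = table_name
        # Forward table_name so the processor can include it in its logs.
        self._data_processor = ToyDataProcessor(metadata, table_name=table_name)

    def get_parameters(self):
        # Assumption: table_name is bookkeeping, so it is not reported
        # alongside the modelling parameters.
        return {"enforce_rounding": self.enforce_rounding}

    def fit(self, data):
        self._data_processor.fit(data)


# A multi-table wrapper would pass each table's name when building its
# per-table synthesizers.
tables = {"hotels": {"columns": {}}, "guests": {"columns": {}}}
synthesizers = {
    name: ToySingleTableSynthesizer(table_metadata, table_name=name)
    for name, table_metadata in tables.items()
}
```

With this shape, every processor knows its table from construction time, at the cost of widening the single-table constructor signature. The manual approach avoids that change but relies on a private attribute.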
    

pvk-developer, Apr 26 '24 17:04