Error when running a custom model with benchmark_single_table
Environment Details
- SDGym version: 0.8.0
- Python version: 3.11.5
- Operating System: Windows 11
Error Description
When running the same code as in #321, the following error was encountered.
Steps to reproduce
import os
import shutil
import sdgym
from sdgym import create_single_table_synthesizer
from sdgym.synthesizers import (
    UniformSynthesizer,
    GaussianCopulaSynthesizer,
    TVAESynthesizer,
)
import warnings
warnings.filterwarnings('ignore')
synthesizers = [
    UniformSynthesizer,
    GaussianCopulaSynthesizer,
    TVAESynthesizer
]
# YData
# CTGAN
def ctgan_get_trained_synthesizer(data, metadata):
    from ydata_synthetic.synthesizers.regular import RegularSynthesizer
    from ydata_synthetic.synthesizers import ModelParameters, TrainParameters
    ctgan_args = ModelParameters(batch_size=500, lr=2e-4, betas=(0.5, 0.9))
    train_args = TrainParameters(epochs=2)
    synthesizer = RegularSynthesizer(modelname='ctgan', model_parameters=ctgan_args)
    # Split the columns by sdtype, as declared in the SDGym metadata dictionary
    num_cols = [col for col, sdtype in metadata['columns'].items() if sdtype['sdtype'] in ['numerical', 'datetime']]
    cat_cols = [col for col, sdtype in metadata['columns'].items() if sdtype['sdtype'] == 'categorical']
    synthesizer.fit(data=data,
                    train_arguments=train_args,
                    num_cols=num_cols,
                    cat_cols=cat_cols)
    return synthesizer
def sample_from_synthesizer(synthesizer, n_rows):
    synthetic_data = synthesizer.sample(n_rows)
    return synthetic_data
YData_CTGANSynthesizer = create_single_table_synthesizer(
    get_trained_synthesizer_fn=ctgan_get_trained_synthesizer,
    sample_from_synthesizer_fn=sample_from_synthesizer,
    display_name='YData-CTGAN'
)
custom_synthesizers = [YData_CTGANSynthesizer]
# Delete the detailed results folder if it already exists
detailed_results_folder = r"C:\Users\18840\Desktop\result"
if os.path.isdir(detailed_results_folder):
    print('The folder where the intermediate files are stored already exists and is processed for deletion.')
    shutil.rmtree(detailed_results_folder, ignore_errors=True)
print('-' * 50)
results = sdgym.benchmark_single_table(
    synthesizers=synthesizers,
    custom_synthesizers=custom_synthesizers,
    show_progress=True,
    multi_processing_config={
        'package_name': 'multiprocessing',
        'num_workers': 8
    },
    sdv_datasets=['adult'],
    detailed_results_folder=detailed_results_folder
)
Hi there @T0217 👋 Do you mind updating SDGym and related libraries in our ecosystem to see if you're still running into this issue? We released some changes, so I'm always curious to validate if it's still relevant!
Second -- this is a bit challenging for us to debug because we aren't authors of Custom:YData-CTGAN etc. I'm curious if you were able to figure out the source of your error since posting this issue?
Thanks for the feedback. I've updated SDGym and tested again. The TypeError with the YData CTGAN model persists. It appears to be caused by weak references: some attributes or components inside the model likely hold weak references, which pickle cannot serialize. Switching from pickle to dill for serialization, as suggested in #328, or using the CTGAN model from the SDV library, can resolve this problem. However, the issue mentioned in #321 remains unresolved, regardless of whether the SDV or the YData model is used.
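For anyone who wants to check the weak-reference explanation, here is a minimal sketch that uses only the standard library. It does not touch SDGym or ydata-synthetic: `Layer`, `FakeSynthesizer`, and `find_unpicklable_attributes` are hypothetical names made up for illustration, and the assumption is that the real model holds a `weakref.ref` somewhere in its attributes. The helper tries to pickle each attribute separately and reports the ones that fail.

```python
import pickle
import weakref


class Layer:
    """Hypothetical stand-in for an internal model component."""


class FakeSynthesizer:
    """Hypothetical model that keeps a weak reference to one of its parts."""

    def __init__(self):
        self.epochs = 2
        self.layer = Layer()
        self.layer_ref = weakref.ref(self.layer)  # pickle cannot serialize this


def find_unpicklable_attributes(obj):
    """Map each attribute that pickle rejects to the name of the exception raised."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle may raise TypeError, PicklingError, etc.
            failures[name] = type(exc).__name__
    return failures


failures = find_unpicklable_attributes(FakeSynthesizer())
print(failures)
```

In this sketch only `layer_ref` should be reported, with a `TypeError`. Running the same helper on a trained custom synthesizer might point at the attribute that blocks serialization, provided the object exposes a `__dict__`. It only inspects top-level attributes, so a weak reference nested deeper in the model would show up under its parent attribute.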