TypeError while ctgan.fit()

Open AT9991 opened this issue 1 year ago • 1 comments

Environment Details

Google Colab

Error Description

TypeError                                 Traceback (most recent call last)
in <cell line: 1>()
----> 1 ctgan.fit(trial)

6 frames
/usr/local/lib/python3.10/dist-packages/rdt/transformers/base.py in _set_seed(self, data)
    365         hash_value = self.columns[0]
    366         for value in data.head(5):
--> 367             hash_value += str(value)
    368
    369         hash_value = int(hashlib.sha256(hash_value.encode('utf-8')).hexdigest(), 16)

TypeError: unsupported operand type(s) for +=: 'int' and 'str'

Steps to reproduce

!pip install ctgan

import pandas as pd
from ctgan import CTGAN

data = pd.read_csv(...)
ctgan = CTGAN(epochs=100)
ctgan.fit(data)

AT9991 • Jan 05 '24 06:01
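
Editor's note: the traceback above suggests why an integer column name breaks the seed computation. The sketch below is a standalone imitation of the loop shown in the traceback, not the rdt source itself; the function name `seed_from_column` and its arguments are hypothetical and exist only for illustration.

```python
import hashlib


def seed_from_column(first_column_name, values):
    """Imitate the loop from the traceback: start from the column name,
    append the string form of each value, then hash the result."""
    hash_value = first_column_name
    for value in values:
        # Fails if hash_value is an int, because int += str is undefined.
        hash_value += str(value)
    return int(hashlib.sha256(hash_value.encode('utf-8')).hexdigest(), 16)


# A string column name works: the values are concatenated onto it.
seed = seed_from_column('1', [10, 20, 30])

# An integer column name raises the TypeError reported in this issue.
error_message = None
try:
    seed_from_column(1, [10, 20, 30])
except TypeError as exc:
    error_message = str(exc)
```

If the column name is already a string, the concatenation succeeds, which is consistent with the workaround discussed later in the thread.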

I was facing the same problem. The cause may be your column names: they should be strings.

aarishmaqsood • Jan 24 '24 14:01
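
Editor's note: one way to check whether a dataset is affected is to list the column labels that are not strings. This is a minimal sketch; `non_string_labels` is a hypothetical helper, not part of CTGAN or SDV, and it accepts any iterable of labels so that it can be fed `df.columns` from a pandas DataFrame.

```python
def non_string_labels(labels):
    """Return the labels that are not of type str, in their original order."""
    return [label for label in labels if not isinstance(label, str)]


# Example with mixed labels; in practice, pass df.columns instead.
bad = non_string_labels([1, 'age', 2.5, 'income'])
clean = non_string_labels(['age', 'income'])
```

An empty result means every column name is already a string, so this particular error should not be caused by the column names.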

Hi @AT9991 and @aarishmaqsood, would either of you be able to share some CSV data that we can use to replicate this?

BTW instead of using the CTGAN library directly, I would highly recommend using the SDV library. You can access the CTGAN Synthesizer via SDV. Doing so will allow you to make use of additional features -- such as better data pre-processing, customizations such as constraints, and conditional sampling.

I actually wonder whether you would still encounter this bug in SDV, since there is a lot more data validation and checking we do there. Here is a tutorial that uses CTGAN via the SDV library.

npatki • Apr 16 '24 23:04

@npatki Thank you for your response. I have fixed my problem. In the future I will use your suggested solution.

aarishmaqsood • Apr 17 '24 01:04

Great to hear, @aarishmaqsood. Could you describe what fixed your problem? In case others have the same issue, I can refer them here. Thanks.

npatki • Apr 17 '24 14:04

@npatki Here is the Colab link, where I have replicated the error and provided the solution as well. This problem occurs in SDV version 1.5.0. Below are the code snippets that illustrate both the problem and the solution.

Reproducing the Error

!pip install sdv==1.5.0

import numpy as np
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import CTGANSynthesizer

# Generate sample data whose column names are the integers 1..20
num_rows = 100
num_cols = 20
data = {i+1: np.random.randint(0, 100, size=num_rows) for i in range(num_cols)}
df = pd.DataFrame(data)

# create metadata from the DataFrame
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=df)

# Initialize the synthesizer (this is where the error occurs)
synthesizer = CTGANSynthesizer(metadata=metadata)

Solution

# Convert column names to strings
df.columns = ['col_' + str(i) for i in range(1, len(df.columns) + 1)]

# Re-create metadata for the table with corrected column names
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=df)

# Initialize the synthesizer with corrected metadata
synthesizer = CTGANSynthesizer(metadata=metadata)

aarishmaqsood • Apr 18 '24 06:04
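
Editor's note: the workaround above renames columns by position. A variant that keeps a mapping from the original labels to the new names makes it possible to restore the original names on the synthetic data afterwards. This is a sketch under assumptions: `string_column_mapping` and the `col_` prefix are hypothetical choices, not part of SDV.

```python
def string_column_mapping(labels, prefix='col_'):
    """Map each label to a string name; labels that are already
    strings are kept unchanged, others get the prefix prepended."""
    mapping = {}
    for label in labels:
        mapping[label] = label if isinstance(label, str) else f'{prefix}{label}'
    # Guard against two different labels ending up with the same name.
    if len(set(mapping.values())) != len(mapping):
        raise ValueError('renaming would produce duplicate column names')
    return mapping


# In practice, pass df.columns instead of a literal list.
mapping = string_column_mapping([1, 2, 'age'])

# Inverse mapping, for restoring the original labels later.
inverse = {new: old for old, new in mapping.items()}
```

With pandas, the mapping can be applied with `df = df.rename(columns=mapping)` before detecting the metadata, and `inverse` can be passed to `rename` in the same way on the sampled data.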

Hi @aarishmaqsood, very much appreciate the detailed code and notebook.

Note that I have replicated this issue on the latest SDV (1.12.0) also. Here are a few things I discovered:

  1. The metadata auto-detection no longer works on SDV 1.12.0. I have filed an issue for it at SDV #1933
  2. The fit problem isn't isolated to CTGAN. None of the SDV synthesizers work with this type of data and all produce the same error. I have filed a generic issue at SDV #1935

Since we now have the above two issues filed in our main SDV library, I will mark this one as a duplicate.

In the meantime, for anyone else running into the issue, I suggest using @aarishmaqsood's simple workaround that converts the column names from integers to strings.

Thanks all for helping uncover this. For any related discussion, please feel free to comment on either of the SDV issues linked above.

npatki • Apr 18 '24 22:04