
Conditional sampling with empty conditions

Status: Open. fealho opened this issue 4 years ago (2 comments).

Problem Description

When conditional sampling is called with an empty set of conditions (a conditions DataFrame with no rows), it raises an error whose message does not explain what went wrong.

Expected behavior

Either it should simply return samples, or it should raise an error saying that no conditions were passed.

Additional context

The following code reproduces the problem:

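import pandas as pd
from sdv.tabular import CTGAN

# Small toy dataset with one numeric and one categorical column
data = pd.DataFrame({
    "column1": [1.0, 0.5, 2.5] * 10,
    "column2": ["a", "b", "c"] * 10
})

model = CTGAN(epochs=1)
model.fit(data)

# A conditions DataFrame with the right column but no rows
conditions = pd.DataFrame({
    "column2": []
})
sampled = model.sample(conditions=conditions)  # fails with an unhelpful error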

fealho, Mar 11 '21 23:03

I lean towards just returning a DataFrame with no rows in it rather than throwing an error.

csala, Mar 16 '21 17:03
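The "return an empty DataFrame" option can be sketched as a small guard in front of the sampler. This is a minimal sketch, not SDV code: `sample_or_empty`, `sample_fn` and `output_columns` are hypothetical names, and the lambda in the usage lines stands in for a fitted model's conditional sampling method.

```python
import pandas as pd


def sample_or_empty(sample_fn, conditions, output_columns):
    """Return an empty DataFrame when `conditions` has no rows.

    Hypothetical wrapper: `sample_fn` stands in for a model's conditional
    sampling method and `output_columns` for the fitted table's columns.
    Neither name is part of SDV's API.
    """
    if len(conditions.index) == 0:
        # Nothing to condition on, so nothing to generate; keep the schema
        # so downstream code can still rely on the expected columns.
        return pd.DataFrame(columns=list(output_columns))
    return sample_fn(conditions)


conditions = pd.DataFrame({"column2": []})
result = sample_or_empty(lambda c: c, conditions, ["column1", "column2"])
print(result.shape)  # (0, 2)
```

Keeping the column names on the empty result means code that later selects or concatenates by column keeps working without a special case for "no rows".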

The new API exposes a sample_remaining_columns method for this use case. If you pass in an empty DataFrame, a ValueError is thrown:

from sdv.demo import load_tabular_demo
from sdv.tabular import GaussianCopula
import pandas as pd

data = load_tabular_demo('student_placements')
model = GaussianCopula()
model.fit(data)

conditions = pd.DataFrame({
    'gender': [],
})
model.sample_remaining_columns(conditions)

Output:

ValueError: No objects to concatenate
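This message comes from pandas: `pd.concat` raises exactly this ValueError when it is given an empty list. One plausible (unverified) explanation is that the sampler groups the conditions by value, samples each group, and concatenates the per-group results; with zero rows there are zero groups. The sketch below reproduces that pattern under that assumption; `sample_by_group` is a hypothetical name, not SDV's implementation.

```python
import pandas as pd


def sample_by_group(conditions, column):
    """Hypothetical stand-in for a sampler that handles each distinct
    condition value separately and then concatenates the per-group results.
    """
    results = [group for _, group in conditions.groupby(column)]
    # With zero rows there are zero groups, so pd.concat receives [].
    return pd.concat(results)


try:
    sample_by_group(pd.DataFrame({"gender": []}), "gender")
except ValueError as err:
    print(err)  # No objects to concatenate
```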

An error seems appropriate, since this is not the intended usage, but the message could be more descriptive. We could change it to:

Error: Data is empty. Please input a DataFrame with 1 or more rows.

npatki, Jun 10 '22 19:06
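The proposed message could be raised by a check that runs before any sampling starts. A minimal sketch, assuming a hypothetical `validate_conditions` helper: it leaves the "Error:" prefix to the exception type, and it checks the row count rather than `DataFrame.empty`, because `empty` is also True for a frame that has rows but no columns, a case the proposed message does not describe.

```python
import pandas as pd

EMPTY_CONDITIONS_MESSAGE = (
    "Data is empty. Please input a DataFrame with 1 or more rows."
)


def validate_conditions(conditions):
    """Hypothetical pre-sampling check for the conditions DataFrame."""
    if not isinstance(conditions, pd.DataFrame):
        raise TypeError("conditions must be a pandas DataFrame")
    # Check the row count explicitly: DataFrame.empty is also True when
    # there are rows but no columns, which the message does not describe.
    if len(conditions.index) == 0:
        raise ValueError(EMPTY_CONDITIONS_MESSAGE)


try:
    validate_conditions(pd.DataFrame({"gender": []}))
except ValueError as err:
    print(err)  # Data is empty. Please input a DataFrame with 1 or more rows.
```

Running the check up front means the user sees this message instead of the pandas concatenation error raised deeper in the call stack.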