SDV
Expose parameters from DeepEcho `PARSynthesizer` in SDV (e.g. `data_types`)
Environment details
- SDV version: 0.17.2
- Python version: 3.9.13
- Operating System: Windows 10
Question description
I have a dataset whose real features seem to follow a negative binomial distribution, so, following the paper, I want the training loss for some features to use the `NegativeBinomial` distribution.
In `deepecho.py`, each column's data type is inferred only from its pandas dtype:
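```python
for field in self._output_columns:
    dtype = timeseries_data[field].dtype
    kind = dtype.kind
    if kind in ('i', 'f'):
        # integers and floats both become 'continuous' (never 'count')
        data_type = 'continuous'
    elif kind in ('O', 'b'):
        # objects and booleans become 'categorical'
        data_type = 'categorical'
    else:
        raise ValueError(f'Unsupported dtype {dtype}')
```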
So every numeric feature is typed as `continuous`, and during training the PAR model computes the loss like this:
```python
for key, props in self._data_map.items():
    if props['type'] in ['continuous', 'timestamp']:
        # Gaussian likelihood: used for every numeric column coming from SDV
        mu_idx, sigma_idx, missing_idx = props['indices']
        mu = Y_padded[:, :, mu_idx]
        sigma = torch.nn.functional.softplus(Y_padded[:, :, sigma_idx])
        missing = torch.nn.LogSigmoid()(Y_padded[:, :, missing_idx])
        for i in range(batch_size):
            dist = torch.distributions.normal.Normal(
                mu[:seq_len[i], i], sigma[:seq_len[i], i])
            log_likelihood += torch.sum(dist.log_prob(X_padded[-seq_len[i]:, i, mu_idx]))
            # Bernoulli log-likelihood of the missing-value indicator
            p_true = X_padded[:seq_len[i], i, missing_idx]
            p_pred = missing[:seq_len[i], i]
            log_likelihood += torch.sum(p_true * p_pred)
            log_likelihood += torch.sum((1.0 - p_true) * torch.log(
                1.0 - torch.exp(p_pred)))
    elif props['type'] in ['count']:
        # negative binomial likelihood: only reached for 'count' columns
        r_idx, p_idx, missing_idx = props['indices']
        r = torch.nn.functional.softplus(Y_padded[:, :, r_idx]) * props['range']
        p = torch.sigmoid(Y_padded[:, :, p_idx])
        x = X_padded[:, :, r_idx] * props['range']
        missing = torch.nn.LogSigmoid()(Y_padded[:, :, missing_idx])
        for i in range(batch_size):
            dist = torch.distributions.negative_binomial.NegativeBinomial(
                r[:seq_len[i], i], p[:seq_len[i], i], validate_args=False)
            log_likelihood += torch.sum(dist.log_prob(x[:seq_len[i], i]))
            p_true = X_padded[:seq_len[i], i, missing_idx]
            p_pred = missing[:seq_len[i], i]
            log_likelihood += torch.sum(p_true * p_pred)
            log_likelihood += torch.sum((1.0 - p_true) * torch.log(
                1.0 - torch.exp(p_pred)))
```
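Written out, the terms that the loop above accumulates per sequence are roughly the following. The notation is mine, not DeepEcho's: $z^{\mu}_t, z^{\sigma}_t, z^{r}_t, z^{p}_t, z^{m}_t$ are the raw network outputs at step $t$, $x_t$ the observed value, $\tilde{x}_t$ the scaled stored count, $R$ the column's `props['range']`, $m_t$ the missingness flag, and $T$ the sequence length.

$$
\ell_{\text{miss}} = \sum_{t=1}^{T} \Big[ m_t \log \operatorname{sigmoid}(z^{m}_t) + (1 - m_t) \log\big(1 - \operatorname{sigmoid}(z^{m}_t)\big) \Big]
$$

$$
\ell_{\text{cont}} = \sum_{t=1}^{T} \log \mathcal{N}\big(x_t \mid z^{\mu}_t,\ \operatorname{softplus}(z^{\sigma}_t)\big) + \ell_{\text{miss}}
$$

$$
\ell_{\text{count}} = \sum_{t=1}^{T} \log \operatorname{NB}\big(R\,\tilde{x}_t \mid R\,\operatorname{softplus}(z^{r}_t),\ \operatorname{sigmoid}(z^{p}_t)\big) + \ell_{\text{miss}}
$$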
As a result, all of my features are modeled as Gaussian, which is not correct for my data.
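A small illustration of why this matters for count data, using made-up overdispersed counts (variance well above the mean). Both distributions are fitted by matching the sample mean and variance (method of moments, chosen here only for simplicity; PAR itself learns the parameters with its network). The negative binomial follows the `torch.distributions.NegativeBinomial` parametrization, where `probs` is the success probability and the mean is `total_count * probs / (1 - probs)`.

```python
import torch

# Made-up overdispersed counts (many zeros, a few large values).
counts = torch.tensor([0., 0., 0., 1., 0., 2., 0., 7., 0., 1., 0., 12., 3., 0., 0., 4.])
mean = counts.mean()
var = counts.var()  # unbiased sample variance; must exceed the mean for NB moments

# Gaussian with the same mean and variance: what a 'continuous' column is trained with.
normal = torch.distributions.Normal(mean, var.sqrt())

# Negative binomial with the same mean and variance (method of moments).
# PyTorch parametrization: mean = r * p / (1 - p), variance = mean / (1 - p).
p = 1 - mean / var
r = mean ** 2 / (var - mean)
negbin = torch.distributions.NegativeBinomial(total_count=r, probs=p)

# Gaussian mass below -0.5, i.e. mass that would round to a negative count.
mass_below_zero = normal.cdf(torch.tensor(-0.5))

# Average log-likelihood of the observed counts (density vs. mass: a rough comparison).
normal_ll = normal.log_prob(counts).mean()
negbin_ll = negbin.log_prob(counts).mean()

print(f"Gaussian mass below zero: {mass_below_zero.item():.3f}")
print(f"mean log-likelihood  Normal: {normal_ll.item():.3f}  NegBin: {negbin_ll.item():.3f}")
```

The Gaussian spends a noticeable share of its probability on impossible negative values, while the negative binomial's support is the non-negative integers by construction.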
I seem to have found a workaround: using the PAR model from DeepEcho as a standalone library instead of through SDV:
https://github.com/sdv-dev/DeepEcho#standalone-usage
So my question now is: are there any plans to support passing a `data_types` dict to PAR models from SDV?
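To make the request concrete, here is a minimal sketch of how such a dict could plug in: keep the dtype-based rule quoted above as the default and let user-supplied entries override it per column. `infer_data_types` and its `overrides` argument are hypothetical names for illustration only, not part of SDV's API.

```python
import pandas as pd


def infer_data_types(data, overrides=None):
    """Hypothetical helper: infer PAR data types from pandas dtypes, with user overrides.

    The fallback mirrors the dtype-based rule quoted from deepecho.py; entries in
    ``overrides`` (e.g. {'nb_customers': 'count'}) take precedence.
    """
    overrides = overrides or {}
    data_types = {}
    for column in data.columns:
        if column in overrides:
            # A user-declared type wins, e.g. 'count' for negative binomial columns.
            data_types[column] = overrides[column]
            continue
        dtype = data[column].dtype
        if dtype.kind in ('i', 'f'):
            data_types[column] = 'continuous'
        elif dtype.kind in ('O', 'b'):
            data_types[column] = 'categorical'
        else:
            raise ValueError(f'Unsupported dtype {dtype}')
    return data_types


if __name__ == '__main__':
    df = pd.DataFrame({
        'region': ['north', 'south', 'north'],
        'total_sales': [10.5, 3.2, 7.9],
        'nb_customers': [0, 4, 12],
    })
    print(infer_data_types(df))
    print(infer_data_types(df, overrides={'nb_customers': 'count'}))
```

Without overrides, the integer `nb_customers` column falls back to `continuous`, which is exactly the behavior described above; with the override it becomes `count`.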
Hi @Mohamed209, glad you found that the DeepEcho library had the settings you needed.
How about we turn this issue into a feature request for supporting all the PAR `data_types` through the SDV library? While this is not currently on our roadmap, this type of feedback will help us prioritize it in the future.
(Following the previous comment, I've re-titled this and marked it as a feature request)