
Expose parameters from DeepEcho `PARSynthesizer` in SDV (e.g. `data_types`)

Open • Mohamed209 opened this issue 2 years ago • 3 comments

Environment details

  • SDV version: 0.17.2
  • Python version: 3.9.13
  • Operating System: Windows 10

Question description

I have a dataset where the real features seem to follow a negative binomial distribution, so, per the paper:

[screenshot from the paper]

I want to force the training loss for some features to use the negative binomial distribution.

From the DeepEcho source (data type detection):

    for field in self._output_columns:
        dtype = timeseries_data[field].dtype
        kind = dtype.kind
        if kind in ('i', 'f'):
            data_type = 'continuous'
        elif kind in ('O', 'b'):
            data_type = 'categorical'
        else:
            raise ValueError(f'Unsupported dtype {dtype}')
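
To illustrate why this affects me: an integer count column has dtype kind 'i', so the detection above labels it 'continuous'. A minimal sketch, with a hypothetical column:

    import pandas as pd

    # Hypothetical count feature stored as integers.
    df = pd.DataFrame({'visits': [0, 3, 1, 7, 2]})

    # dtype.kind is 'i', so the detection above would map 'visits' to
    # 'continuous', never to 'count'.
    print(df['visits'].dtype.kind)  # prints: i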

every integer or float feature will be detected as continuous, so during training:

    for key, props in self._data_map.items():
        # Continuous (and timestamp) features: Gaussian log-likelihood.
        if props['type'] in ['continuous', 'timestamp']:
            mu_idx, sigma_idx, missing_idx = props['indices']
            mu = Y_padded[:, :, mu_idx]
            sigma = torch.nn.functional.softplus(Y_padded[:, :, sigma_idx])
            missing = torch.nn.LogSigmoid()(Y_padded[:, :, missing_idx])

            for i in range(batch_size):
                dist = torch.distributions.normal.Normal(
                    mu[:seq_len[i], i], sigma[:seq_len[i], i])
                log_likelihood += torch.sum(dist.log_prob(X_padded[-seq_len[i]:, i, mu_idx]))

                p_true = X_padded[:seq_len[i], i, missing_idx]
                p_pred = missing[:seq_len[i], i]
                log_likelihood += torch.sum(p_true * p_pred)
                log_likelihood += torch.sum((1.0 - p_true) * torch.log(
                    1.0 - torch.exp(p_pred)))

        # Count features: negative binomial log-likelihood.
        elif props['type'] in ['count']:
            r_idx, p_idx, missing_idx = props['indices']
            r = torch.nn.functional.softplus(Y_padded[:, :, r_idx]) * props['range']
            p = torch.sigmoid(Y_padded[:, :, p_idx])
            x = X_padded[:, :, r_idx] * props['range']
            missing = torch.nn.LogSigmoid()(Y_padded[:, :, missing_idx])

            for i in range(batch_size):
                dist = torch.distributions.negative_binomial.NegativeBinomial(
                    r[:seq_len[i], i], p[:seq_len[i], i], validate_args=False)
                log_likelihood += torch.sum(dist.log_prob(x[:seq_len[i], i]))

                p_true = X_padded[:seq_len[i], i, missing_idx]
                p_pred = missing[:seq_len[i], i]
                log_likelihood += torch.sum(p_true * p_pred)
                log_likelihood += torch.sum((1.0 - p_true) * torch.log(
                    1.0 - torch.exp(p_pred)))
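
For comparison, a minimal sketch of what the count branch computes, with made-up tensor values standing in for the network outputs:

    import torch

    # Hypothetical per-step outputs: r = total_count (softplus output scaled
    # by the column range), p = success probability (sigmoid output).
    r = torch.tensor([2.5, 4.0])
    p = torch.tensor([0.3, 0.6])
    x = torch.tensor([1.0, 5.0])  # observed counts

    dist = torch.distributions.negative_binomial.NegativeBinomial(
        total_count=r, probs=p, validate_args=False)
    log_likelihood = torch.sum(dist.log_prob(x))
    print(log_likelihood)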

all my features will be modeled as Gaussian, which is not correct for my case.

Mohamed209 • Jan 8, 2023

It seems I found a workaround: using the PAR model from DeepEcho as a standalone library rather than through SDV (https://github.com/sdv-dev/DeepEcho#standalone-usage). So the question now is: are there any intentions to support a data_types dict that can be passed to PAR models from SDV?
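
For reference, a sketch of the standalone workaround based on the linked README; the data and column names here are hypothetical, and I am assuming data_types may be partial, with the remaining columns auto-detected:

    import pandas as pd
    from deepecho import PARModel

    # Hypothetical sequential data: one sequence per store.
    data = pd.DataFrame({
        'store_id': [1, 1, 1, 2, 2, 2],
        'day':      [0, 1, 2, 0, 1, 2],
        'visits':   [5, 3, 8, 2, 0, 4],
    })

    model = PARModel(epochs=128, cuda=False)
    model.fit(
        data=data,
        entity_columns=['store_id'],
        data_types={'visits': 'count'},  # force the negative binomial branch
        sequence_index='day',
    )

    synthetic = model.sample(num_entities=2)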

Mohamed209 • Jan 8, 2023

Hi @Mohamed209, glad you found that the DeepEcho library had the settings you needed.

How about we turn this issue into a feature request for supporting all the PAR data_types through the SDV library? While this is not currently on our roadmap, this type of feedback will help us prioritize it in the future.

npatki • Jan 11, 2023

(Following the previous comment, I've re-titled this and marked it as a feature request)

npatki • Jan 23, 2023