Wilco van Vorstenbosch comments

Results: 10 comments of Wilco van Vorstenbosch

Add a "mixed data" transformer to RDT, for use in CTGAN.

Hey @npatki, that is indeed what I meant. It's not a different dtype, but a similar idea to 'from_column', albeit with a different solution. I expect the same, performance-wise! Notice...

NaN values for numerical variables DISAPPEAR when using CTGANSynthesizer

Dear Srini [@srinify], first of all: I'm delighted that you are willing to help out. It is beyond my expectations, and I greatly appreciate it. This is my metadata:...

NaN values for numerical variables DISAPPEAR when using CTGANSynthesizer

Dear Srini, I am indeed using a subset, for now. I'm randomly sampling 10,000 rows from the original dataset to test this package. How does GaussianCopulaSynthesizer compare to CTGAN?...

NaN values for numerical variables DISAPPEAR when using CTGANSynthesizer

Just to clarify, @srinify: the values are not missing at random. Often, the variable was not relevant for a specific row because of a certain value for another variable....

NaN values for numerical variables DISAPPEAR when using CTGANSynthesizer

In my synthetic data, there are no NaN values for the numeric columns. In the original data, whether a column has a NaN is not random. By the way, I...

NaN values for numerical variables DISAPPEAR when using CTGANSynthesizer

Sorry for the late reply. I was busy with other work, but will be working on this topic for most of this week so I'll try to further clarify the...

NaN values for numerical variables DISAPPEAR when using CTGANSynthesizer

Update @npatki @srinify: for my dataset, the same problem does not occur with the GaussianCopula method. This method has some other issues that I'd like to tweak, but see...

NaN values for numerical variables DISAPPEAR when using CTGANSynthesizer

> From looking at your visualization, my hunch is that your SDV synthesizer is producing a lot of data points at the mean value that are actually supposed to be...

NaN values for numerical variables DISAPPEAR when using CTGANSynthesizer

Dear @npatki, in fact I am loading the dataset directly from an SQL database. I did not think this would matter much. Like I said earlier, the issue is...

NaN values for numerical variables DISAPPEAR when using CTGANSynthesizer

Dear @npatki, I am pretty sure it has nothing to do with the dtypes. I just ran CTGAN with a tiny sample, and this time the NaNs did...