Wilco van Vorstenbosch
Hey @npatki, that is indeed what I meant. It's not a different dtype, but an idea similar to 'from_column', albeit with a different solution. I expect the same, performance-wise! Notice...
Dear Srini [@srinify], first of all, I'm delighted that you are willing to help out. It is beyond my expectations, and I greatly appreciate it. This is my metadata:...
Dear Srini, I am indeed using a subset for now. I'm randomly sampling 10,000 rows from the original dataset to test this package. How does GaussianCopulaSynthesizer compare to CTGAN?...
Just to clarify, @srinify: the values are not missing at random. Often, the variable was not relevant for a specific row because of the value of another variable....
In my synthetic data, there are no NaN values in the numeric columns. In the original data, whether a value in a column is NaN is not random. By the way, I...
Sorry for the late reply. I was busy with other work, but will be working on this topic for most of this week so I'll try to further clarify the...
Update @npatki @srinify: for my dataset, the same problem does not occur with the GaussianCopula method. This method has some other issues that I'd like to tweak, but see...
> From looking at your visualization, my hunch is that your SDV synthesizer is producing a lot of data points at the mean value that are actually supposed to be...
Dear @npatki, in fact I am loading the dataset directly from an SQL database. I did not think this would matter much. As I said earlier, the issue is...
Dear @npatki, I am pretty sure it has nothing to do with the dtypes. I just ran CTGAN with a tiny sample, and this time the NaNs did...