[FEA] generate data respecting the cardinalities in a given schema
🚀 Feature request
Currently, generate_data() does not generate a dataset based on the cardinalities in a given schema. Instead, it draws samples from a log-normal distribution using np.random.lognormal(). Users should have the flexibility to generate high-cardinality datasets, which the current distribution function does not really allow. We should be able to generate a dataset with exactly the cardinalities we ask for.
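To make the request concrete, here is a minimal sketch of one way a column with a requested cardinality could be sampled. The helper name sample_with_cardinality and its parameters are hypothetical and not part of the existing generate_data() API. Uniform sampling is only a placeholder for whatever distribution is preferred.

```python
import numpy as np


def sample_with_cardinality(num_rows, cardinality, seed=None):
    """Hypothetical helper: return num_rows integer IDs in [1, cardinality]
    containing exactly min(num_rows, cardinality) distinct values."""
    rng = np.random.default_rng(seed)
    if num_rows <= cardinality:
        # Too few rows to cover every ID: draw distinct IDs without replacement.
        return rng.choice(np.arange(1, cardinality + 1), size=num_rows, replace=False)
    # Include every ID once so the requested cardinality is always reached.
    base = np.arange(1, cardinality + 1)
    # Fill the remaining rows with uniform draws (assumption: any distribution
    # over the same ID range could be substituted here).
    extra = rng.integers(1, cardinality + 1, size=num_rows - cardinality)
    values = np.concatenate([base, extra])
    rng.shuffle(values)
    return values


col = sample_with_cardinality(num_rows=10_000, cardinality=5_000, seed=0)
print(np.unique(col).size)  # 5000
```

Seeding every ID once before filling the rest guarantees the distinct count regardless of which distribution is used for the remaining rows, which is the property the current log-normal sampling does not provide.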
Motivation
Users might want to generate datasets for performance benchmarks, so being able to generate a dataset with the required cardinalities is important.