[FEA] generate data respecting the cardinalities in a given schema

Open rnyak opened this issue 3 years ago • 0 comments

🚀 Feature request

Currently, generate_data() does not generate a dataset based on the cardinalities in a given schema. Instead, it draws samples from a log-normal distribution using np.random.lognormal(). Because of that distribution, users cannot really generate large-cardinality datasets. We should give users the flexibility to generate a dataset with the cardinalities they want.
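To illustrate the requested behavior, here is a minimal sketch of one way a single categorical column could hit a target cardinality. It is not the existing generate_data() implementation: the helper name generate_categorical_column and its num_rows, cardinality, and seed parameters are hypothetical, and uniform sampling is swapped in for the log-normal draw as an assumption.

```python
import numpy as np


def generate_categorical_column(num_rows, cardinality, seed=None):
    """Hypothetical helper: return `num_rows` integer ids drawn from 1..cardinality."""
    if cardinality < 1:
        raise ValueError("cardinality must be at least 1")
    rng = np.random.default_rng(seed)

    if num_rows < cardinality:
        # Too few rows to cover every id, so sample distinct ids instead.
        return rng.choice(cardinality, size=num_rows, replace=False) + 1

    # Include every id once so the requested cardinality is reached exactly.
    all_ids = np.arange(1, cardinality + 1)
    # Fill the remaining rows with uniform draws (assumption: any other
    # distribution over the same id range could be used here).
    extra = rng.integers(1, cardinality + 1, size=num_rows - cardinality)

    column = np.concatenate([all_ids, extra])
    rng.shuffle(column)  # avoid the ids appearing in sorted order at the top
    return column


column = generate_categorical_column(num_rows=10_000, cardinality=1_000, seed=0)
print(len(np.unique(column)))  # number of distinct ids in the column
```

In a real implementation the cardinality would presumably be read from the schema for each categorical column rather than passed in by hand.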

Motivation

Users might want to generate datasets for performance benchmarks, so being able to generate a dataset with the required cardinalities is important.

Jun 01 '22 19:06 rnyak