REaLTabFormer

Could the order of columns affect synthetic data quality?

Open efstathios-chatzikyriakidis opened this issue 11 months ago • 2 comments

Hi @avsolatorio!

Could the order of columns (categorical first, then numerical/datetime, or the opposite) affect the quality of the synthetic data? Furthermore, the categorical columns could themselves be ordered by cardinality. Correlations exist across all columns, and I am wondering whether putting the categoricals first or last, or sorting them by ascending or descending cardinality, would allow better learning.

Thanks!

I have done some tests and it seems that it doesn't matter. I observed similar results in every case: categorical columns first or last, and sorted by increasing or decreasing cardinality.
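A minimal sketch of how the column orderings discussed here might be generated for such a comparison. The helper names (`is_categorical`, `column_order`) and the toy table are hypothetical illustrations, not part of REaLTabFormer and not the code used in the tests above. It assumes a column counts as categorical unless all of its values are numbers or dates.

```python
from datetime import date
from numbers import Number


def is_categorical(values):
    """Assumption: a column is categorical unless every value is a number or a date."""
    return not all(
        isinstance(v, (Number, date)) and not isinstance(v, bool)
        for v in values
    )


def column_order(table, categorical_first=True, ascending_cardinality=True):
    """Return column names grouped by kind, with categoricals sorted by cardinality.

    `table` maps column name -> list of values (a hypothetical stand-in
    for a DataFrame).
    """
    categorical = [c for c, v in table.items() if is_categorical(v)]
    other = [c for c in table if c not in categorical]
    # Cardinality here is the number of distinct values in the column.
    categorical.sort(
        key=lambda c: len(set(table[c])),
        reverse=not ascending_cardinality,
    )
    return categorical + other if categorical_first else other + categorical


# Toy data, invented for illustration only.
table = {
    "age": [34, 51, 29, 42],
    "city": ["Athens", "Patras", "Athens", "Volos"],
    "signup": [date(2023, 1, 5), date(2023, 2, 1), date(2023, 2, 9), date(2023, 3, 3)],
    "plan": ["free", "pro", "free", "free"],
}

# The four cases: categoricals first or last, cardinality ascending or descending.
orderings = {
    (first, asc): column_order(table, first, asc)
    for first in (True, False)
    for asc in (True, False)
}

for case, order in orderings.items():
    print(case, order)
```

With a pandas DataFrame `df`, indexing with one of these lists (`df[order]`) returns the columns in that order, so each variant can be used to train a separate model and the resulting synthetic data compared.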

echatzikyriakidis, Mar 10 '24 00:03

This can be closed.

echatzikyriakidis, Mar 10 '24 00:03