Correlated data from multiple different distributions
Thank you for this excellent package.
I have a dataset which consists of 5 continuous variables and 5 categorical variables. I can generate a correlation matrix for this dataset*, along with means/SDs for the continuous variables and counts for the categorical variables.
At the moment it looks like I can build two separate simstudy datasets: one from the means, SDs, and correlations of the continuous variables, and another built the same way from the categorical variables. However, I don't see how I can make use of the correlations between the continuous and categorical variables to generate a single complete dataset.
It may be that I am not using simstudy correctly, in which case I would appreciate any advice on how I can do what I have described above.
*forgive my stats naivety if this is not a valid thing to do
Thanks for your note - it would be helpful if you shared the code that you are currently using.
Yes you are right, sorry for not following the guidance! Here is something that might help illustrate:
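library("simstudy")
# Keep only the continuous columns of mtcars
cont_data <- mtcars[, !(names(mtcars) %in% c("cyl", "vs", "am", "gear", "carb"))]
cols <- colnames(cont_data)
# Summary statistics used to parameterise the simulation
corrs <- cor(x = cont_data)
means <- colMeans(x = cont_data)
sds <- apply(cont_data, 2, sd)
# Simulate 40 rows of correlated normal data with matching means, SDs and correlations
dd <- genCorData(n = 40, mu = means, sigma = sds, corMatrix = corrs, cnames = cols)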
I can use simstudy to build a dataset that captures the properties and relationships between the continuous variables. But now I am stuck as to how I would apply this to the categorical and binary columns. It feels like I need to specify everything in one go to capture the relationships between all the variables, but I don't see how I can do this with mixed distribution types.
Any thoughts would be greatly appreciated.
simstudy can accommodate generating correlated data from different distributions using the function genCorFlex (see here). However, the distributions are currently limited to "binary", "poisson", "gamma", "normal", and "uniform" distributions. There is currently also functionality to generate correlated ordinal (categorical) data using genOrdCat, but this has not been integrated with other types of distributions.
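As a rough illustration of what that could look like for the mtcars example, here is a minimal sketch that assumes mpg and wt can be treated as normal and am and vs as binary. The choice of variables, the seed, and the common correlation of 0.4 are placeholders picked for illustration, not estimates from the data. Note that defData takes a variance for normal variables, whereas genCorData takes SDs.

```r
library(simstudy)

# Marginal distributions, with parameters estimated from mtcars.
# For "normal", defData expects a variance (not an SD).
def <- defData(varname = "mpg", formula = mean(mtcars$mpg),
               variance = var(mtcars$mpg), dist = "normal")
def <- defData(def, varname = "wt", formula = mean(mtcars$wt),
               variance = var(mtcars$wt), dist = "normal")
# For "binary", the formula is the probability of a 1.
def <- defData(def, varname = "am", formula = mean(mtcars$am), dist = "binary")
def <- defData(def, varname = "vs", formula = mean(mtcars$vs), dist = "binary")

set.seed(2718)  # arbitrary seed, only for reproducibility

# Compound-symmetry structure with an assumed common correlation of 0.4
dd <- genCorFlex(1000, def, rho = 0.4, corstr = "cs")

# Inspect the correlations that were actually realised
round(cor(dd[, c("mpg", "wt", "am", "vs")]), 2)
```

genCorFlex also has a corMatrix argument for supplying a full correlation matrix instead of a single rho. The correlations realised for the binary columns will generally not equal the values requested, so it is worth comparing the output against the target before relying on it.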
OK that is great, thank you - I missed genCorFlex in the vignettes. Are there plans to add ordinal data to genCorFlex? I guess in the meantime one could convert the ordinal variables to binaries?
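To make the binary workaround concrete, here is a small sketch in base R that collapses each ordinal mtcars column into a single 0/1 indicator. The cut points and the new column names (cyl_gt4, gear_gt3, carb_gt2) are hypothetical and chosen only for illustration; collapsing levels discards information, so this is an approximation rather than a replacement for proper ordinal support.

```r
# Collapse each ordinal column to a 0/1 indicator.
# The cut points below are arbitrary illustrations.
cat_bin <- data.frame(
  cyl_gt4  = as.integer(mtcars$cyl > 4),
  gear_gt3 = as.integer(mtcars$gear > 3),
  carb_gt2 = as.integer(mtcars$carb > 2),
  vs       = mtcars$vs,
  am       = mtcars$am
)

# Proportions of 1s, usable as the probabilities of "binary" variables
probs <- colMeans(cat_bin)

# Correlation matrix across the continuous and the binarised columns
cont_cols <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
mixed_corrs <- cor(cbind(mtcars[, cont_cols], cat_bin))
```

The probs vector and mixed_corrs matrix could then be used to set the binary probabilities and the target correlation matrix for a genCorFlex call, keeping the variables in the same order in both.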