datadigest
RStudio session runs out of memory when reading a big data frame
Hi there,
Congrats on the package, I find it really useful!
Unfortunately, my RStudio session runs out of memory when I pass a big data frame (>800,000 observations and 26 variables) to datadigest. Even with smaller data frames, performance is not great when I use the filters.
datadigest::codebook(data = df1)
I am working on an RStudio instance on a company server, and I should have enough memory both locally (16 GB) and in my quota on the server. Is the data frame loaded into RAM? Could this have anything to do with the tibble format, or with an older version of JavaScript installed on the server?
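In case it helps, this is the kind of check I had in mind for the tibble question (hypothetical, just to rule out the tibble class):

df1_plain <- as.data.frame(df1)   # drop the tibble class, keep the same data
datadigest::codebook(data = df1_plain)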
Thank you in advance for any help you can provide
I was hoping this was an issue with too many marks being drawn on the page, but the fixes in progress for web-codebook v1.7 (also available in R via install_github('rhoinc/datadigest', ref = "v1.1.0-dev")) don't seem to help much.
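For reference, pulling in the dev build and retrying looks something like this (assuming the remotes package; devtools::install_github works the same way):

remotes::install_github('rhoinc/datadigest', ref = "v1.1.0-dev")
library(datadigest)
codebook(data = df1)  # retry on the same data frame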
My computer (16 GB RAM) managed to load a codebook for a 200,000 x 150 labs data set (~200 MB), but it was definitely slow. RStudio crashed when I tried to run the codebook on a data frame that was 450,000 x 150 (~400 MB).
Honestly, this should be doable. If the data set loads into memory (which took a second, but worked fine), we should be able to summarize it. My current best guess is that some inefficient data handling in JavaScript is causing the problem. I'll see if I can do some profiling to figure out exactly where the problem occurs, and I'll keep you posted on possible fixes.
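In the meantime, a possible workaround (just a sketch, not a fix) is to build the codebook on a random subset of rows so the browser has less data to handle:

set.seed(1)                                   # reproducible sample
df1_sample <- df1[sample(nrow(df1), 5e4), ]   # e.g. 50,000 of the 800,000+ rows
datadigest::codebook(data = df1_sample)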
If the JavaScript debugging fails, a more thorough refactor where data summaries are calculated in R and then passed to JavaScript for visualization (in a Shiny app?) could be considered, but that would be a big chunk of work and may not happen soon.
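Very roughly, the R side of that refactor could look something like this (a hypothetical sketch only, using jsonlite for the hand-off to JavaScript):

# summarize each column in R instead of shipping raw rows to the browser
summarize_column <- function(x) {
  if (is.numeric(x)) {
    list(type = "numeric",
         n_missing = sum(is.na(x)),
         quantiles = as.list(quantile(x, na.rm = TRUE)))
  } else {
    list(type = "categorical",
         n_missing = sum(is.na(x)),
         counts = as.list(table(x)))
  }
}
summaries <- lapply(df1, summarize_column)
jsonlite::toJSON(summaries, auto_unbox = TRUE)  # pass this to the JavaScript layer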