visdat
Blank plot with grid lines shown
I've encountered a couple of moderately sized data frames (40,000 x 130) where vis_miss and vis_dat fail. A plain white background with grey grid lines is all that is plotted:
Dataset for this plot: https://drive.google.com/file/d/0B7688WPR38x2N2tiSk9HYWhVRTg/view?usp=sharing
Ooo. Weird. I'll check that out soon.
Hmm, looks like it is working now? Does this reproduce on your machine?
# system.time(loan <- read.csv("~/Downloads/CUSTOMER_LOAN.csv"))
loan <- readr::read_csv("~/Downloads/CUSTOMER_LOAN.csv")
#> Parsed with column specification:
#> cols(
#> .default = col_character(),
#> id = col_integer(),
#> member_id = col_integer(),
#> loan_amnt = col_integer(),
#> funded_amnt = col_integer(),
#> funded_amnt_inv = col_double(),
#> int_rate = col_double(),
#> installment = col_double(),
#> annual_inc = col_double(),
#> dti = col_double()
#> )
#> See spec(...) for full column specifications.
dim(loan)
#> [1] 42456 25
visdat::vis_miss(loan)
visdat::vis_dat(loan)
I'm assuming this is fine now, let me know if there are any problems @MilesMcBain !
I also have a blank plot with this dataframe: https://drive.google.com/open?id=1IfMHz2ElCklgXjBEZVoeUUmOo6106YMr
Just tried with another file: https://drive.google.com/open?id=1L11znzqmmXhkB7OsERI3lgDhMBXuw0vV and it works fine!
I updated the visdat package from 0.1.0 to 0.2.2.9200 a few hours ago and selected only 2 variables (marc_153_a_ss, marc_084_a_ss) from the file linked above. visdat manages to plot up to ~32765 rows, but it fails with a blank plot when I try tibbles with more rows. @njtierney can you help?
Hi there @Phu2
For the moment it seems that this bug is related to processor speed and memory on a given computer, so it is hard to pin down what the problem is and fix it.
Future approaches with plotting for visdat (see #65 and #59) will hopefully help with this, but this probably won't be in a release for at least the next 6 weeks.
In the interim, I would recommend downsampling your data using something like
library(visdat)
library(dplyr)
data %>%
  sample_n(size = 1000) %>%
  vis_dat()
to take a random sample of 1000 rows and plot it, or look at the first 1000 rows like so:
data %>%
  slice(1:1000) %>%
  vis_dat()
Thanks! I have a lot of datasets with more than 50000 rows for which I would like to plot the missing values, so downsampling is not the right approach for me. I'll give visdat a try on another machine.
Sorry I can't be more help!
If you are interested in exploring missing data, you can also look at naniar, which has more dedicated functions for exploring missing data.
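As a rough illustration of summarising missingness without sampling, here is a minimal sketch that counts missing values per column across every row. The `data` object is a made-up stand-in, not one of the datasets linked in this thread, and the base R summary is only one possible approach. The optional last step assumes naniar is installed and uses its `miss_var_summary()` function.

```r
# Hypothetical stand-in data: 50,000 rows, two columns with missing values.
set.seed(1)
n <- 50000
data <- data.frame(
  a = sample(c(1, NA), n, replace = TRUE, prob = c(0.9, 0.1)),
  b = sample(c("x", NA), n, replace = TRUE, prob = c(0.7, 0.3)),
  c = seq_len(n)
)

# Count and percentage of missing values per column, using all rows.
n_miss <- colSums(is.na(data))
miss_summary <- data.frame(
  variable = names(n_miss),
  n_miss = as.integer(n_miss),
  pct_miss = 100 * as.numeric(n_miss) / nrow(data)
)
print(miss_summary)

# Optional: if naniar is installed, it provides a similar per-variable table.
if (requireNamespace("naniar", quietly = TRUE)) {
  print(naniar::miss_var_summary(data))
}
```

A table like this does not show where in the rows the missing values fall, as a vis_miss plot would, but it does not depend on drawing one tile per cell.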
No problem, I will have a look at naniar. Thank you for your work!
I'm not sure of a solution for this, so I am going to move it to another milestone, and then close it after that milestone is achieved (around August).