visdat Blank plot with grid lines shown

Open MilesMcBain opened this issue 7 years ago • 10 comments

I've encountered a couple of moderately sized dataframes (40 000 x 130) where vis_miss and vis_dat fail. A plain white background with grey grid lines is all that is plotted:

[Image: visdat_bug]

Dataset for this plot: https://drive.google.com/file/d/0B7688WPR38x2N2tiSk9HYWhVRTg/view?usp=sharing

Oct 03 '16 23:10 MilesMcBain

Ooo. Weird. I'll check that out soon.

Oct 04 '16 03:10 njtierney

Hmm, looks like it is working now? Does this reproduce on your machine?


# system.time(loan <- read.csv("~/Downloads/CUSTOMER_LOAN.csv"))

loan <- readr::read_csv("~/Downloads/CUSTOMER_LOAN.csv")
#> Parsed with column specification:
#> cols(
#>   .default = col_character(),
#>   id = col_integer(),
#>   member_id = col_integer(),
#>   loan_amnt = col_integer(),
#>   funded_amnt = col_integer(),
#>   funded_amnt_inv = col_double(),
#>   int_rate = col_double(),
#>   installment = col_double(),
#>   annual_inc = col_double(),
#>   dti = col_double()
#> )
#> See spec(...) for full column specifications.

dim(loan)
#> [1] 42456    25

visdat::vis_miss(loan)

visdat::vis_dat(loan)

Dec 20 '16 04:12 njtierney

I'm assuming this is fine now, let me know if there are any problems @MilesMcBain !

Jan 08 '17 06:01 njtierney

I also have a blank plot with this dataframe: https://drive.google.com/open?id=1IfMHz2ElCklgXjBEZVoeUUmOo6106YMr

[Screenshot from 2018-03-20 00-36-46]

Just tried with another file: https://drive.google.com/open?id=1L11znzqmmXhkB7OsERI3lgDhMBXuw0vV and it works fine!

Mar 19 '18 23:03 Phu2

I just updated the visdat package from 0.1.0 to 0.2.2.9200 a few hours ago. I selected only 2 variables (marc_153_a_ss, marc_084_a_ss) from the file linked above. visdat manages to plot up to ~32765 rows; it fails with a blank plot when I try tibbles with more rows. @njtierney can you help?
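
One way to check the ~32765-row threshold on another machine is to build a synthetic data frame of a chosen size and plot it at sizes either side of that figure. This is only a minimal sketch: `make_test_data()` is a hypothetical helper, the column names merely echo the two variables mentioned above, and the 20% missingness rate is an arbitrary assumption.

```r
# Hypothetical helper: build a data frame with `n` rows and two character
# columns, each with roughly 20% of its values set to NA (arbitrary rate).
make_test_data <- function(n, seed = 1) {
  set.seed(seed)
  make_col <- function() {
    x <- sample(letters, n, replace = TRUE)
    x[runif(n) < 0.2] <- NA
    x
  }
  data.frame(
    marc_153_a_ss = make_col(),
    marc_084_a_ss = make_col(),
    stringsAsFactors = FALSE
  )
}

small <- make_test_data(32000)  # below the reported threshold
large <- make_test_data(33000)  # above the reported threshold

# Plot both sizes only if visdat is installed. Whether the second plot
# comes out blank may depend on the machine and graphics device.
if (requireNamespace("visdat", quietly = TRUE)) {
  print(visdat::vis_miss(small))
  print(visdat::vis_miss(large))
}
```

If the first plot renders and the second is blank, the problem is tied to row count rather than to anything specific in the original file.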

Mar 20 '18 22:03 Phu2

Hi there @Phu2

For the moment it seems that this bug is related to the processor speed and memory of the computer, so it is hard to generalise what the problem is and fix it.

Future approaches with plotting for visdat (see #65 and #59) will hopefully help with this, but this probably won't be in a release for at least the next 6 weeks.

In the interim, I would recommend downsampling your data using something like

library(visdat)
library(dplyr)
# `data` is a placeholder for your own data frame
data %>%
  sample_n(size = 1000) %>%
  vis_dat()

to take a random sample of 1000 rows of the data and plot it,

or look at the first 1000 rows like so:

data %>%
  slice(1:1000) %>%
  vis_dat()
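
If every row needs to contribute to the picture, another possibility is to aggregate missingness over blocks of rows before plotting, so that the plotted grid stays small however long the data is. The sketch below is an assumption about how that could be done in base R, not something visdat provides: `bin_missingness()` is a hypothetical helper, and the synthetic data and the choice of 500 blocks are arbitrary.

```r
# Hypothetical helper: proportion of missing values per column within
# consecutive blocks of rows. Returns a matrix with one row per block.
bin_missingness <- function(df, n_bins = 500) {
  n_bins <- min(n_bins, nrow(df))
  # Assign each row to one of `n_bins` consecutive, roughly equal blocks.
  bin <- ceiling(seq_len(nrow(df)) * n_bins / nrow(df))
  is_missing <- is.na(df)                      # logical matrix, same shape as df
  sums <- rowsum(is_missing * 1, group = bin)  # missing count per block
  sums / as.vector(table(bin))                 # divide by block sizes
}

# Synthetic example: 50,000 rows, with the first 10,000 values of `x` missing.
set.seed(42)
n <- 50000
df <- data.frame(x = rnorm(n), y = rnorm(n))
df$x[1:10000] <- NA

binned <- bin_missingness(df, n_bins = 500)
dim(binned)  # one row per block, one column per variable

# Draw the block-level proportions as a heat map (columns along the x axis).
image(t(binned), xlab = "column", ylab = "row block", axes = FALSE)
```

Each cell then shows the share of missing values in a block rather than a single value, which trades cell-level detail for full coverage of the rows.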

Mar 20 '18 22:03 njtierney

Thanks! I have a lot of datasets with more than 50000 rows for which I would like to plot the missing values, so downsampling is not the right approach for me. I'll give visdat a try on another machine.

Mar 20 '18 23:03 Phu2

Sorry I can't be more help!

If you are interested in exploring missing data, you can also look at naniar, which has more dedicated functions for exploring missing data.
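
As a rough illustration of the kind of summary naniar offers, the sketch below uses `naniar::miss_var_summary()` and `naniar::gg_miss_var()`, which summarise missingness per variable instead of drawing one cell per value. The built-in `airquality` dataset is an assumption standing in for your own data, and the base R line is included only for comparison.

```r
# `airquality` ships with R and has missing values in Ozone and Solar.R;
# it stands in here for your own data frame.

# Base R comparison: number of missing values in each column.
n_missing <- colSums(is.na(airquality))
print(n_missing)

if (requireNamespace("naniar", quietly = TRUE)) {
  # One row per variable: number and percentage of missing values.
  print(naniar::miss_var_summary(airquality))
  # Plot of the number of missing values in each variable.
  print(naniar::gg_miss_var(airquality))
}
```

Because these summaries produce one row or one plot element per variable, their size does not grow with the number of rows in the data.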

Mar 20 '18 23:03 njtierney

No problem, I will have a look at naniar. Thank you for your work!

Mar 20 '18 23:03 Phu2

I'm not sure of a solution for this, so I am going to move it to another milestone, and then close it after that milestone is achieved (around August).

Jul 02 '18 04:07 njtierney
