Warnings when skimming

Open elinw opened this issue 7 years ago • 4 comments

Warnings raised while skimming are displayed. This is informative for many purposes, particularly for understanding what is in an unknown data set. For other purposes, however, it is undesirable.

One option would be to make a quiet mode that defaults to FALSE so that users could suppress the printing of warnings.
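A minimal sketch of what such a switch could look like, using base R's `suppressWarnings()`. The helper `summarise_columns()` and the example statistic are hypothetical stand-ins for illustration, not skimr's actual API:

```r
# Hypothetical helper, not part of skimr: apply one statistic to every column,
# optionally silencing any warnings it raises.
summarise_columns <- function(data, fun, quiet = FALSE) {
  run <- function() vapply(data, fun, numeric(1))
  if (quiet) suppressWarnings(run()) else run()
}

# Coercing non-numeric text to numeric raises a coercion warning.
mean_as_number <- function(x) mean(as.numeric(x), na.rm = TRUE)

df <- data.frame(
  a = c("1", "2", "x"),
  b = c("4", "5", "6"),
  stringsAsFactors = FALSE
)

# With quiet = TRUE the coercion warning for column `a` is not printed.
quiet_result <- summarise_columns(df, mean_as_number, quiet = TRUE)
```

With the default `quiet = FALSE`, the same call would let the warning through as usual.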

There may be other ideas also, so I'm marking this as for discussion.

elinw avatar Jan 04 '18 02:01 elinw

Independently of whether the warning messages get displayed, I think it would be worth adding a warnings count (or logical) to skim_df for each variable in the input frame.

I am often given data sets with hundreds to tens of thousands of variables, about which I know nothing. So I really need a process that I can tip the data into and get some characterisation of what it is, without being forced to eye-check the output manually. (I will do that anyway, but I can't guarantee to notice everything that I should.)
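As a rough illustration of the idea, a per-variable warning count could be collected with base R's `withCallingHandlers()`. The helper `count_warnings()` and the `warning_summary` layout are assumptions made for this sketch, not anything skimr provides:

```r
# Hypothetical helper, not part of skimr: run one statistic on one column and
# count the warnings it raises instead of printing them.
count_warnings <- function(x, fun) {
  n <- 0L
  withCallingHandlers(
    fun(x),
    warning = function(w) {
      n <<- n + 1L                     # record the warning
      invokeRestart("muffleWarning")   # and stop it from being printed
    }
  )
  n
}

df <- data.frame(
  clean = c("1", "2", "3"),
  messy = c("1", "oops", "3"),
  stringsAsFactors = FALSE
)
to_number_mean <- function(x) mean(as.numeric(x), na.rm = TRUE)

# One row per variable, with a count and a logical flag.
n_warnings <- vapply(
  df,
  function(col) count_warnings(col, to_number_mean),
  integer(1)
)
warning_summary <- data.frame(
  variable = names(df),
  n_warnings = unname(n_warnings),
  any_warning = unname(n_warnings) > 0L,
  stringsAsFactors = FALSE
)
```

A table like `warning_summary` can then be filtered programmatically, so variables with problems are flagged without anyone having to read through the printed output.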

rgayler avatar Jan 04 '18 05:01 rgayler

That's an interesting concept, though I think it would be complex to implement because of how the processing works. It could potentially be part of the summary. For now, the warnings are printed automatically right below the type outputs, and (at least for the ones we have seen) they mention only the statistic, not the variable involved, which is really not that helpful.

So implementation proposals are welcome if you want to dig into the code; I'll also be looking into it and will post anything I come up with in this issue report.

elinw avatar Jan 05 '18 12:01 elinw

Putting this here for reference https://stackoverflow.com/questions/3903157/how-can-i-check-whether-a-function-call-results-in-a-warning/4947528#4947528 (second answer).
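For reference, one common base R pattern for checking whether a call raised a warning is to wrap it in `withCallingHandlers()` and return the value together with the captured messages. This is a generic sketch; `with_warnings()` is a hypothetical name and the code is not copied from the linked answer:

```r
# Hypothetical helper: evaluate an expression, returning its value together
# with the messages of any warnings raised along the way.
with_warnings <- function(expr) {
  messages <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      messages <<- c(messages, conditionMessage(w))
      invokeRestart("muffleWarning")  # carry on instead of printing
    }
  )
  list(value = value, warnings = messages)
}

res <- with_warnings(as.integer(c("1", "two", "3")))
had_warning <- length(res$warnings) > 0
```

Unlike `tryCatch()`, this approach lets the computation finish, so the result is still available alongside the warning text.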

elinw avatar Jan 05 '18 20:01 elinw

In Hadley Wickham's readr package, all the problem reports (per row of input) are put in a data frame that is returned as an attribute of the output data frame and accessed via the problems() function. However, I suspect that the problem reports are designed into the column parsers rather than captured from standard warning messages.

Ideally, whatever is developed for skimr ought to be applied automatically to user-supplied statistic functions - so no special programming is needed by the user to capture warnings and enhance them with the variable and statistic identifiers.
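A rough sketch of how the two ideas could combine: user-supplied statistic functions are called unchanged, each warning is tagged with the variable and statistic it came from, and the collected reports are attached as an attribute in the style described for readr. Everything here (`summarise_with_problems()`, the `"problems"` attribute name, the column layout) is a hypothetical illustration, not skimr's design, and it assumes each statistic returns a single number:

```r
# Hypothetical sketch, not skimr's API: apply named statistic functions to
# every column, recording each warning with its variable and statistic.
summarise_with_problems <- function(data, stats) {
  problems <- list()
  rows <- list()
  for (variable in names(data)) {
    for (statistic in names(stats)) {
      value <- withCallingHandlers(
        stats[[statistic]](data[[variable]]),
        warning = function(w) {
          problems[[length(problems) + 1L]] <<- data.frame(
            variable = variable,
            statistic = statistic,
            message = conditionMessage(w),
            stringsAsFactors = FALSE
          )
          invokeRestart("muffleWarning")
        }
      )
      rows[[length(rows) + 1L]] <- data.frame(
        variable = variable,
        statistic = statistic,
        value = value,
        stringsAsFactors = FALSE
      )
    }
  }
  out <- do.call(rbind, rows)
  # Attach the reports as an attribute; use an empty frame if there were none.
  attr(out, "problems") <- if (length(problems) > 0) {
    do.call(rbind, problems)
  } else {
    data.frame(
      variable = character(0),
      statistic = character(0),
      message = character(0),
      stringsAsFactors = FALSE
    )
  }
  out
}

df <- data.frame(x = c(1, 4, 9), y = c(-1, 4, 9))
stats <- list(mean = mean, mean_sqrt = function(v) mean(sqrt(v)))

result <- summarise_with_problems(df, stats)
problem_report <- attr(result, "problems")
```

The user-supplied functions in `stats` need no special code: the wrapper intercepts whatever warnings they raise and adds the identifiers itself.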

rgayler avatar Jan 05 '18 23:01 rgayler