countrycode icon indicating copy to clipboard operation
countrycode copied to clipboard

deal with tibbles and 1 column data frames gracefully

Open cjyetman opened this issue 7 years ago • 7 comments

We could add here something like...

if (mode(sourcevar) == "list" && length(sourcevar) == 1) {
  sourcevar <- sourcevar[[1]]
}

to essentially cast a single column tibble or data frame (or single element list) to a vector and avoid the following warning, i.e. sourcevar must be a character or numeric vector. This error often arises when users pass a tibble (e.g., from dplyr) instead of a column vector from a data.frame (i.e., my_tbl[, 2] vs. my_df[, 2] vs. my_tbl[[2]]).

Not 100% sure that's the best thing, so leaving this here simply as an idea.

cjyetman avatar Jan 06 '18 12:01 cjyetman

This seems like a good thing overall. Two questions:

  1. How can we be as explicit as possible w.r.t. tibbles?
  2. Is it desirable to output data in the same format that we got it in, or would it be better for countrycode to settle on a single output format that is well documented? If the former, do we also insert some code at then end to reformat?

vincentarelbundock avatar Jan 06 '18 13:01 vincentarelbundock

  1. "tbl_df" %in% class(sourcevar) maybe? not 100% explicit, but pretty close (I suppose if someone created a class on top of tbl_df and added other charcateristics this would still assume it's a tbl_df)

  2. I think it would be best to always return a single vector, which can easily be extended by the user for whatever structure they need. This proposal would simply work around an unintended passing of a sourcevar that doesn't quite conform to what's necessary, but can be obviously cast into something appropriate... rather than trying to open up a whole new realm of converting various different data structures.

cjyetman avatar Jan 06 '18 15:01 cjyetman

It could also throw a mandatory warning that tells the user that it's converting sourcevar so eventually they might learn that they should probably do it themselves first.

cjyetman avatar Jan 06 '18 15:01 cjyetman

That all sounds very good.

I've now dog-fooded the package a fair amount, and I'm thinking about preparing a release for CRAN using the major improvements we've implemented.

Do you think we should sneak this change in before I hit "GO"? Any other easy changes to make before release?

vincentarelbundock avatar Jan 06 '18 16:01 vincentarelbundock

There are a few regex improvements that I could probably get around to in the next week or so.

cjyetman avatar Jan 06 '18 16:01 cjyetman

Cool. No rush.

vincentarelbundock avatar Jan 06 '18 16:01 vincentarelbundock

My thinking on this has evolved.

I now believe that we should not convert inputs automagically, but that we should rather build in input checks and exit gracefully with a warning and instructions on common fixes.

vincentarelbundock avatar May 14 '20 13:05 vincentarelbundock