RCall.jl icon indicating copy to clipboard operation
RCall.jl copied to clipboard

Both DataValues and RCall export isna

Open davidanthoff opened this issue 5 years ago • 4 comments

I just realized that on julia 1.0, both DataValues.jl and RCall.jl are exporting a isna function. That is not ideal, because I would think that both packages will end up being loaded simultaneously quite often and then folks can't properly use the function...

Here is some background info: DataValues.jl uses NA as its value for missing data. On julia 0.6 and earlier, it used isnull for checking for missing data. I never liked that and always thought it should be isna, but then DataArrays.jl exported isna for a long time and I didn't want to take a dependency on it. Plus, after the transition to Missing, DataArrays.jl of course also no longer exported isna. At that point I still didn't switch over to isna because I couldn't be sure what version of DataArrays.jl folks would actually have, and I didn't want any conflicts to arise. But the julia 1.0 version of DataValues.jl now makes isna the main and only way to check for missing values. I had thought that no other package was still using it, so I thought it would be unproblematic to "claim" that name for DataValue. I wasn't aware that RCall.jl was also using it.

Not sure what the best solution is... Here are some options: 1) RCall.jl uses something else (maybe ismissing?) instead, 2) DataValues.jl uses something else, 3) RCall.jl takes a dependency on DataValues.jl and we all use the same isna function or 4) we create a common root package that only has a isna definition in it, both RCall.jl and DataValues.jl take a dep on that and we again share a common function...

I don't know how problematic 1) would be, obviously from my point of view it would solve the problem, but I could entirely understand if that is not an option. Same for 3). I think I'm 99% certain that 2) is not really an option from the Queryverse point of view (a fair number of reasons, happy to explain, if anyone is interested). 4) seems a bit silly, but on the other hand it would solve this issue without anyone having to really make any difficult changes. Maybe there are other options out there as well?

CC @nalimilan

davidanthoff avatar Oct 08 '18 02:10 davidanthoff

isna corresponds directly to the R is.na function, so I don't think renaming is really an option.

To be honest, I don't think the status quo (forcing users to qualify their usage) is that big of a problem.

simonbyrne avatar Oct 08 '18 03:10 simonbyrne

You know my position already: move DataValues to ismissing. :-)

nalimilan avatar Oct 08 '18 15:10 nalimilan

I probably put the isna function in RCall initially because I was duplicating the R API in Julia code and it was before Missing was part of Julia. It would be quite sensible for RCall to define methods for ismissing instead of isna. Of course, @randy3k is the most knowledgeable person to judge that.

Among the proposed solutions my least favorite is having RCall depend on DataValues. I hope that the Julia Data ecosystem will move towards using the standard Missing type and Union{Missing,BitsType} for missing data representation exclusively and not propagate multiple standards. We have enough work to do as it is; let's not compound our problems.

dmbates avatar Oct 08 '18 15:10 dmbates

I am with @simonbyrne. isna is the most natural function name for checking R's NA value. And we may want to distinguish R's NA ~Verdi~ versus (damn auto correction) Julia “NA” type Missing. I am also curious why DataValues needs a new missing data representation at all?

Unless there are enough reasons to support DataValues as the default coninical Julia missing type for R objects, I don’t see depedency of DataValues.jl is a viable option.

randy3k avatar Oct 09 '18 04:10 randy3k