oce
oce copied to clipboard
idea for enhanced [[ actions
At https://youtu.be/h4V2aOLpAlw I have put a little video explaining the gist of the idea; below is a brief overview. I wonder whether co-developers (@richardsc and @clayton33) or friends (e.g. @j-harbin) think this is a good idea.
Test object (must run from the oce
directory, in a clone of the "ad2cp" branch).
library(oce)
setwd("~/git/oce")
file <- "tests/testthat/local_data/ad2cp/S102791A002_Barrow_v2.ad2cp"
d <- read.oce(file)
I have two ideas for extending this.
Idea 1.
Make e.g.
d[["data/?"]]
return the names of the elements stored in the data slot. One of them happens to be (for this particular object) "burst", and it is a list containing other elements. So, a further idea would be that
d[["data/burst/?"]]
would name those elements. One of them is named "altimeter" and so, to extend this further,
d[["data/burst/altimeter/?"]]
would return the names of the elements in "altimeter".
In a nutshell, this means extending the "?"
mechanism of [[
to look deeper
within objects.
Idea 2.
Extend [[
to access within objects. For example, make it so that
d[["data/burst/altimeter/distance"]]
would be an alternative to
d[["data"]]$burst$altimeter$distance
I like idea 1 a lot. I also think idea 2 might be handy, e.g. in programming, where the string can be built up, instead of doing e.g.
d[["data"]]["burst"]["altimeter"]["distance"]
which is, of course, the equivalent.
I learned a few new things from that video, particularly the d[["?"]]
. Clearly I don't have enough @dankelley in my life as I'm falling behind on these cool R tricks.
I've been using a lot of historical CTD data recently, and sometimes there isn't any salinity data. Some code will complain and throw an error if there isn't, say, any salinity data in the data slot. I do a basic check, usually something like "salinity" %in% names(d@data)
, but something along the lines of "salinity" %in% d[["data/?"]]
looks nicer (if I does indeed return just the names... I'd need to rewatch the video. Its a "monday" after the long weekend and a few additional days off, so I'm a bit slow this morning, but even so, "salinity" %in% names(d[['data/?']])
would look nice as well). I also like it because I think it will help people who are learning R, and use these types of objects. I know I would be confused as to why you obtain, say, the names from some sort of oce
data object using @
instead of [[
.
I like both ideas. Idea 2 will be very handy. That's an easy +1 vote from me.
Thanks, @clayton33. I'll think some more about how to extend it.
The present [["?"]]
returns a list, as you can see below. So your test could be "salinity" %in% ctd[["?"]]$data
.
> library(oce)
Loading required package: gsw
> data(ctd)
> l<-ctd[["?"]]
> l$data
[1] "depth" "flag" "pressure" "salinity" "scan"
[6] "temperature" "timeS"
But [["?"]]
does more than just report what is in the file. It also reports other things that [[
can compute for you. Try the above example, but look at l$dataDerived
. Amongst other things, it will list Absolute Salinity
. This requires location for the computation. So, try doing
ctd@metadata$longitude <- NULL
to create an object without proper location information. Then, Absolute Salinity
will no longer be listed as accessible. This demonstrates that it tries pretty hard to figure things out for you.
But this figuring out is based on the assumption that both metadata
and data
are simple lists. With ad2cp data objects, data
is a list with items that are themselves lists (and some of which contain lists, for 3 levels of lists). that will make it a bit tricky to figure things out, but luckily there are no derivedData for velocity data. What I mean is that we have things like N^2 for CTD data, but no equivalent for velocity data ... users are left to compute things like depth-mean currents by themselves.
At https://youtu.be/WLHxy9fVYuU I have a video showing demonstrating how [["data/?"]]
etc works. I will now make it work without the ?
part, i.e. for data extraction, not just name extraction.
The video in https://github.com/dankelley/oce/issues/1992#issuecomment-1202999927 is a nice way of explaining this.
I'm conflicted on this because how I understood the benefit of [[
was that it extracted data regardless of where it is stored. I worry that with something like ctd[["data/animal/dog/bark"]]
it's moving backward in that you need to remember that bark
is stored in dog
, which is stored in animal
, which is stored in data
. In my opinion ctd[["data/animal/dog/bark"]]
is basically the same as ctd@data$animal$dog$bark
with the exception of one @
sign at the beginning and therefore we already have this functionality. For this reason, I'd be less inclined to add this.
On the other hand, this similar idea has come up with units, for example. Sometimes I do ctd[["units"]][['temperature']]
, which instead could be ctd[["units/temperature"]]
according to this method. I'll admit I've never liked needing to do ctd[["units"]][["temperature"]]
, but I later found out you could do ctd[["temperatureUnits"]]
instead. I'm wondering if we could instead do something like this (ie. ctd[["dogBark"]]
) or is there too many levels?
All of this to say... I'm not sure what my vote is.
At https://youtu.be/h4eseiDIZlc you can see a video I made to explain how to extract values from (not just the names of) things in oce objects, using the new [[
syntax. The test code is as below.
In the interests of time, and since I've already received some positive reviews of the idea, my proposal is to merge "issue1992" into "develop" on Thursday at noon. So, if @richardsc, @clayton33 or @j-harbin likes (or dislikes) the idea, please post a comment when you get the chance.
I am doing the build-test suite right now to see if I've broken anything. Our test suite really is quite extensive, so in the unlikely event that I broke existing [[
behaviour, I'll know in a few minutes.
# See branch 'issue1992'
library(oce)
data(ctd)
ctd@data$animal <- list(
dog=list(breed="mutt", age=3, sound="bark"),
cat=list(breed="who cares", age=1, sound="meow"))
ctd[["data/?"]]
str(ctd[["data"]])
ctd[["data/animal/?"]]
ctd[["data/animal"]]
ctd[["data/animal/dog/?"]]
str(ctd[["data/animal/dog/sound"]])
Sorry, @j-harbin I didn't see your comment before I posted the above. You're right, an advantage of [[
is that it can look in either metadata
or in data
.
None of the existing functionality should change if I add as I've done in the issue1992 branch. The idea is just to let folks write e.g.
d[["data/animal/dog/bark"]] # method 1
instead of
d@data$animal$dog$bark # method 2
or
d@data[["animal"]][["dog"]][["bark"]] # method 3
I agree that method 2 is as easy as method 1, but it cannot apply if the names of the things are computed from user interactions ... for that, only method 3 is available, and it is error-prone (for me, anyway).
As for your final comment about doing e.g. ctd[["dogBark"]]
, that cannot work, because there might be a variable named dogBark
.
Ah, makes sense! My reasoning for not adding it are as follows:
- To me, if a user wanted to extract
bark
fromdog
if they are going to take the time to dod[["data/animal/dog/bark"]]
, they will take the time to dod@data$animal$dog$bark
(ie.[[
is not necessary here, because we still need to know where things are stored and arguably the latter method has less typing). - Let's say the user wanted to extract the names of the animals (ie. dog/cat) but they know the names of the animals. The
d[["data/animal/?"]]
could be helpful to know what's stored in there, although this is the same functionality asd@data$animal
followed by clicking the tab button as an autofill. - Could the same argument for
d[["dogBark"]]
be applied tod[['temperatureUnit']]
as well? In theory, there could be a variable namedtemperatureUnit
, but we made it such that the user could extract it anyways
Reasons for adding it:
- I really like not needing to do
d@data[["animal"]][["dog"]][["bark"]]
to extract info that was added by users - DK has already written the code, and it does allow users more options to access information (without changing existing code)
- DK has expressed it's useful in ADCP work.
- DK is the boss :)
I'm neutral on this one!
Good points, @j-harbin. The main thing is not for interactive work. For interactive work, I would likely stick with typing d[["data"]]
and then pressing TAB to see what's there.
The thing with programming work is that you may not know what you want, so you cannot use the $
notation.
For example, imagine a shiny app to let you analyze stuff in an oce object: say, compute the mean value. Let's say there is a pulldown menu like
selectInput("itemToPlot", "Item to plot", choices=c("foo", "bar"))
When we respond, input$itemToPlot
will be either "foo"
or "bar"
so we might have code like
mean(object@data[[item$itemToPlot]])
That is, we need the [[
because we are referring it to name, and that name depends on user action ... the name is not hard-wired into the code. So, maybe we have submenus and submenus. The code would look like
mean(object@data[[input$A]][[input$B]][[input$C]]
which I think is uglier than
mean(object@data[[with(input, paste(A, B, C, sep="/"))]])
PS. I'm just typing code off the top of my head -- likely there are typos but the main point is simple: we cannot use the $
notation unless we know the name of the item at coding time. Whenever there is user action (a shiny thing, reading a file, checking a webpage), we have to use the [[
notation.
I plan to think about this early in the new year, either to (a) implement this for the next release (note the new tag) or (b) reject this.
I am going to reject this issue, and drop the idea. Why?
- I haven't worked on it over the several months since I opened it. That tells me that I didn't find it necessary in any of my work.
- The idea is a bit flawed. Essentially, it says that
x[["foo/bar"]]
would be equivalent tox[["foo"]]$bar
. But what if a user wanted a variable namedfoo/bar
? Although that seems odd as a variable name accessed with$
, it is perfectly fine for access with[
. I usually prefer the$
method, but others might prefer the[
method. Besides, the[
method is the only choice when the variable of interest is not known until runtime (e.g. in a shiny app where the user has chosen an item to analyze from a pulldown menu). Now, thisfoo/bar
example might seem silly, but what if somebody wanted to say care-of (which isc/o
in normal English) to designate a contact person?
PS. I just deleted the issue1992
branch, locally and on github.