oce icon indicating copy to clipboard operation
oce copied to clipboard

idea for enhanced [[ actions

Open dankelley opened this issue 1 year ago • 8 comments

At https://youtu.be/h4V2aOLpAlw I have put a little video explaining the gist of the idea; below is a brief overview. I wonder whether co-developers (@richardsc and @clayton33) or friends (e.g. @j-harbin) think this is a good idea.

Test object (must run from the oce directory, in a clone of the "ad2cp" branch).

library(oce)
setwd("~/git/oce")
file <- "tests/testthat/local_data/ad2cp/S102791A002_Barrow_v2.ad2cp"
d <- read.oce(file)

I have two ideas for extending this.

Idea 1.

Make e.g.

d[["data/?"]]

return the names of the elements stored in the data slot. One of them happens to be (for this particular object) "burst", and it is a list containing other elements. So, a further idea would be that

d[["data/burst/?"]]

would name those elements. One of them is named "altimeter" and so, to extend this further,

d[["data/burst/altimeter/?"]]

would return the names of the elements in "altimeter".

In a nutshell, this means extending the "?" mechanism of [[ to look deeper within objects.

Idea 2.

Extend [[ to access within objects. For example, make it so that

d[["data/burst/altimeter/distance"]]

would be an alternative to

d[["data"]]$burst$altimeter$distance

I like idea 1 a lot. I also think idea 2 might be handy, e.g. in programming, where the string can be built up, instead of doing e.g.

d[["data"]]["burst"]["altimeter"]["distance"]

which is, of course, the equivalent.

dankelley avatar Aug 02 '22 10:08 dankelley

I learned a few new things from that video, particularly the d[["?"]]. Clearly I don't have enough @dankelley in my life as I'm falling behind on these cool R tricks.

I've been using a lot of historical CTD data recently, and sometimes there isn't any salinity data. Some code will complain and throw an error if there isn't, say, any salinity data in the data slot. I do a basic check, usually something like "salinity" %in% names(d@data) , but something along the lines of "salinity" %in% d[["data/?"]] looks nicer (if I does indeed return just the names... I'd need to rewatch the video. Its a "monday" after the long weekend and a few additional days off, so I'm a bit slow this morning, but even so, "salinity" %in% names(d[['data/?']]) would look nice as well). I also like it because I think it will help people who are learning R, and use these types of objects. I know I would be confused as to why you obtain, say, the names from some sort of oce data object using @ instead of [[.

I like both ideas. Idea 2 will be very handy. That's an easy +1 vote from me.

clayton33 avatar Aug 02 '22 13:08 clayton33

Thanks, @clayton33. I'll think some more about how to extend it.

The present [["?"]] returns a list, as you can see below. So your test could be "salinity" %in% ctd[["?"]]$data.

> library(oce)
Loading required package: gsw
> data(ctd)
> l<-ctd[["?"]]
> l$data
[1] "depth"       "flag"        "pressure"    "salinity"    "scan"
[6] "temperature" "timeS"

But [["?"]] does more than just report what is in the file. It also reports other things that [[ can compute for you. Try the above example, but look at l$dataDerived. Amongst other things, it will list Absolute Salinity. This requires location for the computation. So, try doing

ctd@metadata$longitude <- NULL

to create an object without proper location information. Then, Absolute Salinity will no longer be listed as accessible. This demonstrates that it tries pretty hard to figure things out for you.

But this figuring out is based on the assumption that both metadata and data are simple lists. With ad2cp data objects, data is a list with items that are themselves lists (and some of which contain lists, for 3 levels of lists). that will make it a bit tricky to figure things out, but luckily there are no derivedData for velocity data. What I mean is that we have things like N^2 for CTD data, but no equivalent for velocity data ... users are left to compute things like depth-mean currents by themselves.

dankelley avatar Aug 02 '22 15:08 dankelley

At https://youtu.be/WLHxy9fVYuU I have a video showing demonstrating how [["data/?"]] etc works. I will now make it work without the ? part, i.e. for data extraction, not just name extraction.

dankelley avatar Aug 02 '22 17:08 dankelley

The video in https://github.com/dankelley/oce/issues/1992#issuecomment-1202999927 is a nice way of explaining this.

I'm conflicted on this because how I understood the benefit of [[ was that it extracted data regardless of where it is stored. I worry that with something like ctd[["data/animal/dog/bark"]] it's moving backward in that you need to remember that bark is stored in dog, which is stored in animal, which is stored in data. In my opinion ctd[["data/animal/dog/bark"]] is basically the same as ctd@data$animal$dog$bark with the exception of one @ sign at the beginning and therefore we already have this functionality. For this reason, I'd be less inclined to add this.

On the other hand, this similar idea has come up with units, for example. Sometimes I do ctd[["units"]][['temperature']], which instead could be ctd[["units/temperature"]] according to this method. I'll admit I've never liked needing to do ctd[["units"]][["temperature"]], but I later found out you could do ctd[["temperatureUnits"]] instead. I'm wondering if we could instead do something like this (ie. ctd[["dogBark"]]) or is there too many levels?

All of this to say... I'm not sure what my vote is.

j-harbin avatar Aug 02 '22 17:08 j-harbin

At https://youtu.be/h4eseiDIZlc you can see a video I made to explain how to extract values from (not just the names of) things in oce objects, using the new [[ syntax. The test code is as below.

In the interests of time, and since I've already received some positive reviews of the idea, my proposal is to merge "issue1992" into "develop" on Thursday at noon. So, if @richardsc, @clayton33 or @j-harbin likes (or dislikes) the idea, please post a comment when you get the chance.

I am doing the build-test suite right now to see if I've broken anything. Our test suite really is quite extensive, so in the unlikely event that I broke existing [[ behaviour, I'll know in a few minutes.

# See branch 'issue1992'
library(oce)
data(ctd)
ctd@data$animal <- list(
    dog=list(breed="mutt", age=3, sound="bark"),
    cat=list(breed="who cares", age=1, sound="meow"))

ctd[["data/?"]]
str(ctd[["data"]])

ctd[["data/animal/?"]]
ctd[["data/animal"]]


ctd[["data/animal/dog/?"]]
str(ctd[["data/animal/dog/sound"]])

dankelley avatar Aug 02 '22 17:08 dankelley

Sorry, @j-harbin I didn't see your comment before I posted the above. You're right, an advantage of [[ is that it can look in either metadata or in data.

None of the existing functionality should change if I add as I've done in the issue1992 branch. The idea is just to let folks write e.g.

d[["data/animal/dog/bark"]] # method 1

instead of

d@data$animal$dog$bark # method 2

or

d@data[["animal"]][["dog"]][["bark"]] # method 3

I agree that method 2 is as easy as method 1, but it cannot apply if the names of the things are computed from user interactions ... for that, only method 3 is available, and it is error-prone (for me, anyway).

As for your final comment about doing e.g. ctd[["dogBark"]], that cannot work, because there might be a variable named dogBark.

dankelley avatar Aug 02 '22 17:08 dankelley

Ah, makes sense! My reasoning for not adding it are as follows:

  1. To me, if a user wanted to extract bark from dog if they are going to take the time to do d[["data/animal/dog/bark"]], they will take the time to do d@data$animal$dog$bark (ie. [[ is not necessary here, because we still need to know where things are stored and arguably the latter method has less typing).
  2. Let's say the user wanted to extract the names of the animals (ie. dog/cat) but they know the names of the animals. The d[["data/animal/?"]] could be helpful to know what's stored in there, although this is the same functionality as d@data$animal followed by clicking the tab button as an autofill.
  3. Could the same argument for d[["dogBark"]] be applied to d[['temperatureUnit']] as well? In theory, there could be a variable named temperatureUnit, but we made it such that the user could extract it anyways

Reasons for adding it:

  1. I really like not needing to do d@data[["animal"]][["dog"]][["bark"]] to extract info that was added by users
  2. DK has already written the code, and it does allow users more options to access information (without changing existing code)
  3. DK has expressed it's useful in ADCP work.
  4. DK is the boss :)

I'm neutral on this one!

j-harbin avatar Aug 02 '22 18:08 j-harbin

Good points, @j-harbin. The main thing is not for interactive work. For interactive work, I would likely stick with typing d[["data"]] and then pressing TAB to see what's there.

The thing with programming work is that you may not know what you want, so you cannot use the $ notation.

For example, imagine a shiny app to let you analyze stuff in an oce object: say, compute the mean value. Let's say there is a pulldown menu like

selectInput("itemToPlot", "Item to plot", choices=c("foo", "bar"))

When we respond, input$itemToPlot will be either "foo" or "bar" so we might have code like

mean(object@data[[item$itemToPlot]])

That is, we need the [[ because we are referring it to name, and that name depends on user action ... the name is not hard-wired into the code. So, maybe we have submenus and submenus. The code would look like

mean(object@data[[input$A]][[input$B]][[input$C]]

which I think is uglier than

mean(object@data[[with(input, paste(A, B, C, sep="/"))]])

PS. I'm just typing code off the top of my head -- likely there are typos but the main point is simple: we cannot use the $ notation unless we know the name of the item at coding time. Whenever there is user action (a shiny thing, reading a file, checking a webpage), we have to use the [[ notation.

dankelley avatar Aug 02 '22 19:08 dankelley

I plan to think about this early in the new year, either to (a) implement this for the next release (note the new tag) or (b) reject this.

dankelley avatar Dec 30 '22 18:12 dankelley

I am going to reject this issue, and drop the idea. Why?

  1. I haven't worked on it over the several months since I opened it. That tells me that I didn't find it necessary in any of my work.
  2. The idea is a bit flawed. Essentially, it says that x[["foo/bar"]] would be equivalent to x[["foo"]]$bar. But what if a user wanted a variable named foo/bar? Although that seems odd as a variable name accessed with $, it is perfectly fine for access with [. I usually prefer the $ method, but others might prefer the [ method. Besides, the [ method is the only choice when the variable of interest is not known until runtime (e.g. in a shiny app where the user has chosen an item to analyze from a pulldown menu). Now, this foo/bar example might seem silly, but what if somebody wanted to say care-of (which is c/o in normal English) to designate a contact person?

dankelley avatar Jan 01 '23 17:01 dankelley

PS. I just deleted the issue1992 branch, locally and on github.

dankelley avatar Jan 01 '23 17:01 dankelley