taxa icon indicating copy to clipboard operation
taxa copied to clipboard

Add setters/accessors to basic classes

Open zachary-foster opened this issue 8 years ago • 11 comments

For

  • taxon name
  • taxon id
  • taxon rank

In

  • [x] Taxon
  • [x] Taxa
  • [x] Hierarchy
  • [ ] Hierarchies

This would have prevented #138 and will make some other things easier.

zachary-foster avatar Mar 19 '18 06:03 zachary-foster

This seems like a good use of active bindings

zachary-foster avatar Mar 21 '18 16:03 zachary-foster

Hey @sckott, what do you think the accessors should return? Currently, I am using active bindings. I can think of 3 options (using the Taxon class for these examples):

  1. A character vector. for example my_taxon_obj$name returns "My tax name". This is what I think most users will expect and use most often, but it does not make it obvious how to manipulate the TaxonName object (e.g. the database field).
  2. Another object. for example my_taxon_obj$name returns a TaxonName object. This makes the most sense in the context of object-oriented programming, but the user would have to wrap the result in as.character to get the vector that will be used most often.
  3. Whatever is stored, either a character vector or an object. This is the easiest to implement, but will make other functions that use the accessors have to check which was returned, decreasing the value of the accessors.

zachary-foster avatar Mar 21 '18 19:03 zachary-foster

i think it makes sense to store in a consistent way.

as what's returned, I lean towards No. 2, an object, but I haven't played with either to see what it feels like. Do you have a preference?

sckott avatar Mar 22 '18 04:03 sckott

i think it makes sense to store in a consistent way.

Yea, that would simplify things. The original reason for making vectors and objects interchangeable was to reduce memory usage in large data sets. For taxon objects:

> object.size(taxon("my tsdfsdffssdf"))
320 bytes
> object.size("my tsdfsdffssdf")
104 bytes

Even so, I am starting to lean to think the simplicity might be worth it.

Do you have a preference?

I think 2 would be good as long as there is a good way of getting the vector information for taxa objects, because I think thats what will be used most often.

I am experimenting with storing the objects with the my_ prefix and making active bindings to access/set vectors:

> (x <- taxon(
+     name = taxon_name("Poa annua"),
+     rank = taxon_rank("species"),
+     id = taxon_id(93036)
+ ))
<Taxon>
  name: Poa annua
  rank: species
  id: 93036
  authority: none
> x$name
[1] "Poa annua"
> x$rank
[1] "species"
> x$id
[1] "93036"
> x$my_name
<TaxonName> Poa annua
  database: none
> x$my_rank
<TaxonRank> species
  database: none
> x$my_id
<TaxonId> 93036
  database: none
> x$id <- "99999"
> x$my_id
<TaxonId> 99999
  database: none

Heres taxa:

> catus <- taxon(
+     name = taxon_name("catus"),
+     rank = taxon_rank("species"),
+     id = taxon_id(9685)
+ )
> tigris <- taxon(
+     name = taxon_name("tigris"),
+     rank = taxon_rank("species"),
+     id = taxon_id(9696)
+ )
> sapiens <- taxon(
+     name = taxon_name("sapiens"),
+     rank = taxon_rank("species"),
+     id = taxon_id(9606)
+ )
> (x <- taxa(catus, tigris, sapiens))
<taxa> 
  no. taxa:  3 
  catus / species / 9685 
  tigris / species / 9696 
  sapiens / species / 9606 
> x$names
[1] "catus"   "tigris"  "sapiens"
> x$names <- c("a", "b", "c")
> x
<taxa> 
  no. taxa:  3 
  a / species / 9685 
  b / species / 9696 
  c / species / 9606 
> x$values
[[1]]
<Taxon>
  name: a
  rank: species
  id: 9685
  authority: none

[[2]]
<Taxon>
  name: b
  rank: species
  id: 9696
  authority: none

[[3]]
<Taxon>
  name: c
  rank: species
  id: 9606
  authority: none

One drawback of using R6 instead of S3 is the object is not directly iterable (e.g. lapply(x, func) vs lapply(x$values, func)).

I am rather indecisive at the moment. I want the following to be possible with whatever system we end up going with:

  • Easily set/get vector values, with checks for validity for setters
  • Easily set/get objects values, with checks for validity for setters

What do you think about:

  • The source values are private and prefixed with my_
  • The active bindings return objects, but check the object type when setting and can accept vectors when setting. They basically pretend to be the source values.
  • Remove the ability to store vectors instead of objects. Maybe we can add it back later if RAM becomes an issue. Vectors given to active bindings are converted to objects automatically.
  • Use taxon_names, taxon_ranks, and taxon_ids S3 functions for all classes as vector-based getter/setters. Make them usable as setters( e.g. taxon_names<-). It might be a bit odd to use taxon_names to get/set of taxon objects, but having only one function name for everything will make it easy to use. However, I am concerned that the taxon_names function will be confused with the taxon_name class.
  • Keep taxa and hierarchies as S3 to keep them iterable.

zachary-foster avatar Mar 22 '18 09:03 zachary-foster

That plan sounds good @zachary-foster - i'll have a look at the changes as used in taxize dev branch to see if it creates any issues

sckott avatar Mar 26 '18 23:03 sckott

@zachary-foster any idea of when this will get some attention?

sckott avatar Feb 20 '19 17:02 sckott

Hi Scott, yea, I will work on it in the next few days.

zachary-foster avatar Feb 20 '19 18:02 zachary-foster

awesome, thanks

sckott avatar Feb 20 '19 20:02 sckott

Hi @sckott, I pushed some updates to the "eval" branch that have the new getter/setter/constructor code for TaxonName, TaxonRank, TaxonId, TaxonDatabase, Taxon, and Taxa. They all have S3 and R6 getter/setters. The S3 are wrappers for the R6. The S3 setters/constructors pass by value, like most R code, and the R6 setters/constructors pass by reference (if what they are passing is an R6 object).

Taxa also has all the standard indexing funcs like [<- and such.

The documentation has not been updated yet. Look at the new tests to see what can be done.

I am thinking about making Taxonomy inherit Taxa, so Taxonomy and Taxmap would get all of Taxa getters/setters.

Does all this seems reasonable? Any changes you would like to see?

zachary-foster avatar Mar 09 '19 00:03 zachary-foster

Awesome, thanks. Will have a look soon, especially with an eye to see if taxize will require anything else

sckott avatar Mar 11 '19 18:03 sckott

@zachary-foster will do next monday, leaving on vacation

sckott avatar Mar 13 '19 15:03 sckott