Add setters/accessors to basic classes
For
- taxon name
- taxon id
- taxon rank
In
- [x] Taxon
- [x] Taxa
- [x] Hierarchy
- [ ] Hierarchies
This would have prevented #138 and will make some other things easier.
This seems like a good use of active bindings
Hey @sckott, what do you think the accessors should return? Currently, I am using active bindings. I can think of 3 options (using the Taxon class for these examples):
- A character vector. for example my_taxon_obj$name returns "My tax name". This is what I think most users will expect and use most often, but it does not make it obvious how to manipulate the TaxonName object (e.g. the database field).
- Another object. for example my_taxon_obj$name returns a TaxonName object. This makes the most sense in the context of object-oriented programming, but the user would have to wrap the result in
as.characterto get the vector that will be used most often. - Whatever is stored, either a character vector or an object. This is the easiest to implement, but will make other functions that use the accessors have to check which was returned, decreasing the value of the accessors.
i think it makes sense to store in a consistent way.
as what's returned, I lean towards No. 2, an object, but I haven't played with either to see what it feels like. Do you have a preference?
i think it makes sense to store in a consistent way.
Yea, that would simplify things. The original reason for making vectors and objects interchangeable was to reduce memory usage in large data sets. For taxon objects:
> object.size(taxon("my tsdfsdffssdf"))
320 bytes
> object.size("my tsdfsdffssdf")
104 bytes
Even so, I am starting to lean to think the simplicity might be worth it.
Do you have a preference?
I think 2 would be good as long as there is a good way of getting the vector information for taxa objects, because I think thats what will be used most often.
I am experimenting with storing the objects with the my_ prefix and making active bindings to access/set vectors:
> (x <- taxon(
+ name = taxon_name("Poa annua"),
+ rank = taxon_rank("species"),
+ id = taxon_id(93036)
+ ))
<Taxon>
name: Poa annua
rank: species
id: 93036
authority: none
> x$name
[1] "Poa annua"
> x$rank
[1] "species"
> x$id
[1] "93036"
> x$my_name
<TaxonName> Poa annua
database: none
> x$my_rank
<TaxonRank> species
database: none
> x$my_id
<TaxonId> 93036
database: none
> x$id <- "99999"
> x$my_id
<TaxonId> 99999
database: none
Heres taxa:
> catus <- taxon(
+ name = taxon_name("catus"),
+ rank = taxon_rank("species"),
+ id = taxon_id(9685)
+ )
> tigris <- taxon(
+ name = taxon_name("tigris"),
+ rank = taxon_rank("species"),
+ id = taxon_id(9696)
+ )
> sapiens <- taxon(
+ name = taxon_name("sapiens"),
+ rank = taxon_rank("species"),
+ id = taxon_id(9606)
+ )
> (x <- taxa(catus, tigris, sapiens))
<taxa>
no. taxa: 3
catus / species / 9685
tigris / species / 9696
sapiens / species / 9606
> x$names
[1] "catus" "tigris" "sapiens"
> x$names <- c("a", "b", "c")
> x
<taxa>
no. taxa: 3
a / species / 9685
b / species / 9696
c / species / 9606
> x$values
[[1]]
<Taxon>
name: a
rank: species
id: 9685
authority: none
[[2]]
<Taxon>
name: b
rank: species
id: 9696
authority: none
[[3]]
<Taxon>
name: c
rank: species
id: 9606
authority: none
One drawback of using R6 instead of S3 is the object is not directly iterable (e.g. lapply(x, func) vs lapply(x$values, func)).
I am rather indecisive at the moment. I want the following to be possible with whatever system we end up going with:
- Easily set/get vector values, with checks for validity for setters
- Easily set/get objects values, with checks for validity for setters
What do you think about:
- The source values are private and prefixed with
my_ - The active bindings return objects, but check the object type when setting and can accept vectors when setting. They basically pretend to be the source values.
- Remove the ability to store vectors instead of objects. Maybe we can add it back later if RAM becomes an issue. Vectors given to active bindings are converted to objects automatically.
- Use
taxon_names,taxon_ranks, andtaxon_idsS3 functions for all classes as vector-based getter/setters. Make them usable as setters( e.g.taxon_names<-). It might be a bit odd to usetaxon_namesto get/set oftaxonobjects, but having only one function name for everything will make it easy to use. However, I am concerned that thetaxon_namesfunction will be confused with thetaxon_nameclass. - Keep
taxaandhierarchiesas S3 to keep them iterable.
That plan sounds good @zachary-foster - i'll have a look at the changes as used in taxize dev branch to see if it creates any issues
@zachary-foster any idea of when this will get some attention?
Hi Scott, yea, I will work on it in the next few days.
awesome, thanks
Hi @sckott, I pushed some updates to the "eval" branch that have the new getter/setter/constructor code for TaxonName, TaxonRank, TaxonId, TaxonDatabase, Taxon, and Taxa. They all have S3 and R6 getter/setters. The S3 are wrappers for the R6. The S3 setters/constructors pass by value, like most R code, and the R6 setters/constructors pass by reference (if what they are passing is an R6 object).
Taxa also has all the standard indexing funcs like [<- and such.
The documentation has not been updated yet. Look at the new tests to see what can be done.
I am thinking about making Taxonomy inherit Taxa, so Taxonomy and Taxmap would get all of Taxa getters/setters.
Does all this seems reasonable? Any changes you would like to see?
Awesome, thanks. Will have a look soon, especially with an eye to see if taxize will require anything else
@zachary-foster will do next monday, leaving on vacation