Relationship with `taxlist`
Just curious, what is the relationship between taxa and kamapu/taxlist? The focus of the two packages seems somewhat different. Perhaps we could say taxa is focused on beta taxonomy and taxlist on alpha taxonomy. Is there any potential for connecting the two packages (e.g., using taxlist inside taxa or adding functions to convert between the two systems)?
I'm currently not involved in either project (apart from a role as a public reviewer), but I may need a library for handling taxonomy in the near future and I don't like making choices.
Good question @arendsee
I still have not thought much about how the two compare. A few thoughts after having a quick look:
- not sure about the beta vs. alpha thing, @zachary-foster ?
- The S3/R6 approach here is quite different from the S4 approach in taxlist (there are strong feelings on both sides, I imagine). If we wanted to provide fxns here for converting taxa format to taxlist, we might have to do S4 stuff, which I'd rather avoid
- The data manipulation approach is pretty different, taking a dplyr-esque approach here and a more base R approach there
- taxa will be very battle tested soon, because Zach's using it in metacoder and we'll be using taxa in taxize to handle packaging data as it moves around in a taxize workflow
@zachary-foster thoughts?
> what is the relationship between taxa and kamapu/taxlist?
None at the moment. I have never heard of it before, but there is a lot I have never heard of so that does not mean much.
> Perhaps we could say taxa is focused on beta taxonomy and taxlist on alpha taxonomy.

I might see what you mean. The taxmap and taxonomy classes taxa implements are focused on a whole "taxonomy" with or without explicit rank information. The hierarchy and taxon classes of taxa are more like the taxlist classes: they are independent taxa. The taxlist object seems to be able to store hierarchical data, but associated with one taxon (all the varieties of a fern species and its genus are associated with "fern"), which is a different way of thinking about it than I am used to. It's like the unit is "organism" rather than "taxon", so an organism (e.g. fern) can have taxonomic attributes that may be hierarchical. Whereas in taxa, specific taxa might have associations with organism data (e.g. the genus Asplenium could be associated with the common name "fern", and so is the species obliquum by implication, since it is a subtaxon of Asplenium).
> Is there any potential for connecting the two packages (e.g., using taxlist inside taxa or adding functions to convert between the two systems)?

Hmm, not sure. From reading the vignette, the class concepts don't really mesh well (organism with taxonomic data vs taxa with data). The low level classes in taxa are more low level than the taxlist class and taxa is more object-oriented than taxlist from what I can tell, so there is not much of a reason to use taxlist in taxa or vice versa (considering taxlist is S4).
I could write a converter between the two if there is demand for it.
What do you think @kamapu?
Well, it is not a long time ago that I realized the existence of taxa, and I was a bit disappointed, since I was planning to submit taxlist to rOpenSci. My impression is that both taxlist and taxa were aiming at the same task but using different approaches. This makes taxlist not eligible for rOpenSci, as I understand it.
To the general discussion I add some bullets in defense of taxlist:
- It is not true that `taxlist` is handling organisms with taxonomic attributes, though it is providing a bridge between diversity records (which may or may not be based on organisms) and taxonomic entities, assuming those records are linked to names (slot `taxonNames`). Thus you can include a name "fern 1", which may be a collection working name, and link it to a taxon with no rank or indicated as "unresolved".
- In the same sense, `taxlist` objects can be included as a slot in further S4 objects handling diversity records, as used in `vegtable`.
- I thought S4 was more object-oriented programming than S3. So why do I read the opposite opinion here? In any case, S4 provides the capacity to define some consistency rules or constraints for the content of the list, which are checked by the function `validObject`.
- In the definition of methods (functions), we were focusing on handling processes that are common in the work with vegetation-plot databases including taxonomic information, mainly for building up, complementing and tuning the taxonomy used as reference. It is also important to document the source for circumscription of taxa (taxon views in slot `taxonViews`).
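As a rough illustration of the consistency rules mentioned in the third bullet, here is a minimal sketch of an S4 class with a validity function. The class `ToyTaxlist` and its rule are hypothetical and much simpler than the real `taxlist` definition; only `setClass`, `new` and `validObject` from the `methods` package are real.

```r
library(methods)

# Toy class for illustration only, NOT the real taxlist definition.
setClass(
  "ToyTaxlist",
  slots = c(taxonNames = "data.frame", taxonRelations = "data.frame"),
  prototype = list(
    taxonNames = data.frame(TaxonUsageID = integer(), TaxonConceptID = integer(),
                            TaxonName = character(), stringsAsFactors = FALSE),
    taxonRelations = data.frame(TaxonConceptID = integer(), Parent = integer())
  ),
  validity = function(object) {
    # Assumed rule: every name must point to a concept listed in taxonRelations.
    orphan <- !object@taxonNames$TaxonConceptID %in%
      object@taxonRelations$TaxonConceptID
    if (any(orphan)) {
      return("some names refer to a TaxonConceptID missing from taxonRelations")
    }
    TRUE
  }
)

empty <- new("ToyTaxlist")  # both slots exist, as data frames with no rows
validObject(empty)          # passes silently

# Slot assignment does not re-run the validity check, so we call it explicitly.
bad <- empty
bad@taxonNames <- data.frame(TaxonUsageID = 1L, TaxonConceptID = 99L,
                             TaxonName = "fern 1", stringsAsFactors = FALSE)
result <- tryCatch(validObject(bad), error = function(e) "invalid")
```

Here `result` ends up as `"invalid"` because concept 99 is not present in the (empty) relations table.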
@arendsee I'm sorry, I'm not really answering your questions, but I have not had the opportunity to compare both packages in detail.
@zachary-foster Thank you for including me in the discussion.
@sckott I wouldn't mind having a function that converts taxa objects into taxlist and vice versa, though I have not completely recovered from the shock.

> Well, it is not a long time ago that I realized the existence of taxa, and I was a bit disappointed, since I was planning to submit taxlist to rOpenSci. My impression is that both taxlist and taxa were aiming at the same task but using different approaches. This makes taxlist not eligible for rOpenSci, as I understand it.

I am sorry to hear that. That kind of thing is always frustrating. It's a shame we did not talk before our packages were pretty much mature, otherwise we might have merged our efforts and not done redundant work. Perhaps @sckott has more to add regarding rOpenSci policy.
> It is not true that taxlist is handling organisms with taxonomic attributes...
Sorry for misunderstanding. I looked into vegtable to get an idea of how taxlist is used and found the following taxlist in it:
> head(Kenya_veg@species@taxonNames)
   TaxonUsageID LETTERCODE SHORTNAME            TaxonName            NATIVENAME AuthorName           SYNONYM TaxonConceptID
4  3            ABUTMAU    Abutilon mauritianum Abutilon mauritianum <NA>       (Jacq.) Medik.       FALSE   3
5  50361        ABUTMAU    Pavonia patens       Pavonia patens       <NA>       (Andrews) Chiov.     TRUE    3
6  4            ACACDRE    Acacia drepanolobium Acacia drepanolobium <NA>       Harms ex Y. Sjöstedt FALSE   4
7  5            ACACELA    Acacia elatior       Acacia elatior       <NA>       Brenan               FALSE   5
10 8            ACACMEL    Acacia mellifera     Acacia mellifera     <NA>       (Vahl) Benth.        FALSE   8
11 9            ACACPOL    Acacia polyacantha   Acacia polyacantha   <NA>       Willd.               FALSE   9
> head(Kenya_veg@species@taxonRelations)
   TaxonConceptID AcceptedName Basionym Parent Level ViewID
4  3              3            NA       NA     NA    1
6  4              4            NA       NA     NA    1
7  5              5            NA       NA     NA    1
10 8              8            NA       NA     NA    1
11 9              9            NA       NA     NA    1
12 10             10           NA       NA     NA    1
> head(Kenya_veg@species@taxonViews)
        ViewID Author   Year Title Published
sp_list 1      Easplist NA   NA    NA
> head(Kenya_veg@species@taxonTraits)
   TaxonConceptID GENUS    FAMILY
3  3              Abutilon Malvaceae
4  4              Acacia   Leguminosae
5  5              Acacia   Leguminosae
8  8              Acacia   Leguminosae
9  9              Acacia   Leguminosae
10 10             Acacia   Leguminosae
- Are the slots "taxonNames" "taxonRelations" "taxonViews" "taxonTraits" always present, if empty? Are others possible?
- What is the difference between "TaxonUsageID" and "TaxonConceptID"?
- It looks like "taxonRelations" can define a tree structure using the "TaxonConceptID" and "Parent" (which I assume stores TaxonConceptIDs)? If so, that is similar to how we do it with `ex_taxmap$edge_list`.
- Does "taxonTraits" always store rank info, or are the columns arbitrary?
> I thought S4 was more object-oriented programming than S3. So why do I read the opposite opinion here?

We are actually using R6 and the S3 is just a thin surface layer to make things familiar to more people. For example, our `filter_taxa` function can be called like `filter_taxa(obj, ...)` or like `obj$filter_taxa(...)` (the R6 way). Perhaps I should have said "modular" rather than "object-oriented" since both are object-oriented. That might not be true either; it is just based on my understanding so far, which is limited.
> It is also important to document the source for circumscription of taxa (taxon views in slot taxonViews).

Interesting. So this slot documents who said that a grouping of taxa belong together? Are contradicting views possible, i.e. can the same dataset be classified by multiple trees in one object? Is this different from assigning an "authority" on a coarse taxonomic rank like family?
It looks like the taxlist class is most similar to our taxmap class (assuming Kenya_veg@species is a good example of taxlist) except that:
- The information like that in the `taxonNames` slot is stored in a list of `taxon` objects (e.g. `ex_taxmap$taxa`).
- The hierarchical structure defined in `taxonRelations` is stored in a table called `edge_list` (e.g. `ex_taxmap$edge_list`).
- The `taxonViews` concept is not implemented, unless you count authorities in the `taxon` objects in e.g. `ex_taxmap$taxa`.
- Data in `taxonTraits` would be stored in user-defined tables/lists/vectors in the `data` list (e.g. `ex_taxmap$data`).
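To make the second point of the comparison above concrete, here is a small base R sketch of how a `Parent` column could be turned into an edge-list-style table. The rows are invented for illustration, and the `from`/`to` column names are an assumption about what such a table might look like, not a statement about either package's internals.

```r
# Hypothetical relations table: column names follow the taxonRelations output
# shown earlier, but the rows are made up for this example.
relations <- data.frame(
  TaxonConceptID = c(1L, 2L, 3L),
  Parent = c(NA, 1L, 2L),
  Level = c("family", "genus", "species"),
  stringsAsFactors = FALSE
)

# One row per taxon: the parent goes in "from", the taxon itself in "to".
# A taxon without a parent (NA in "from") is a root of the tree.
edge_list <- data.frame(from = relations$Parent, to = relations$TaxonConceptID)

roots <- edge_list$to[is.na(edge_list$from)]
```

A real converter would also have to carry names, levels and views across, which this sketch leaves out.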
The data set Kenya_veg is a bit outdated. To be honest, it is a bad example of a consolidated database; on the other hand, it is the common case for databases imported from Turboveg.
It would be better to look at Easplist in taxlist.
> Are the slots "taxonNames" "taxonRelations" "taxonViews" "taxonTraits" always present, if empty?

Yes, by definition of the class, though "taxonViews" and "taxonTraits" may be empty (data frames with no rows). Check the prototype using `new("taxlist")`.
> Are others possible?

There is inheritance, meaning that you can define a new class inheriting taxlist properties but adding new slots. I have not yet tested this option.
> What is the difference between "TaxonUsageID" and "TaxonConceptID"?

The first is the ID of the taxon usage name and the second is the ID of the taxon. So the accepted name and the respective synonyms for a taxon will each have their own "TaxonUsageID" but share the same "TaxonConceptID".
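The distinction can be seen with the two rows for concept 3 from the Kenya_veg output earlier in the thread. This is only a base R sketch on a plain data frame with a subset of the columns; it does not use the taxlist package.

```r
# Two rows copied from the taxonNames output shown earlier in the thread.
taxon_names <- data.frame(
  TaxonUsageID = c(3L, 50361L),
  TaxonName = c("Abutilon mauritianum", "Pavonia patens"),
  SYNONYM = c(FALSE, TRUE),
  TaxonConceptID = c(3L, 3L),
  stringsAsFactors = FALSE
)

# Usage names grouped by the concept (taxon) they belong to.
names_by_concept <- split(taxon_names$TaxonName, taxon_names$TaxonConceptID)

# The accepted name of concept 3 is the usage not flagged as a synonym.
accepted <- taxon_names$TaxonName[
  taxon_names$TaxonConceptID == 3L & !taxon_names$SYNONYM
]
```

Both names carry different usage IDs (3 and 50361) but resolve to the same concept.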
> It looks like "taxonRelations" can define a tree structure using the "TaxonConceptID" and "Parent" (which I assume stores TaxonConceptIDs)?

Yes, this column points to TaxonConceptID. BUT there is also the column "Level", which may be a factor variable (classes ordered bottom-up). The levels are custom-defined.
> Does "taxonTraits" always store rank info, or are the columns arbitrary?

If the taxonomic information is already contained in "taxonRelations", there is no need to include it in "taxonTraits". BUT if you would like to produce some statistics regarding taxonomy, especially once working in vegtable (e.g. number of species for different families within a plot observation), you may need to transfer this information to traits by using the function `tax2traits` (see the help for this function).
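The idea of moving rank information from the relations table into a traits table can be sketched in base R as follows. This is not the `tax2traits` implementation; the helper `ancestor_at` and the example rows are hypothetical and only illustrate following `Parent` links up to a given level.

```r
# Invented example: one family, one genus, two species.
relations <- data.frame(
  TaxonConceptID = c(1L, 2L, 3L, 4L),
  Parent = c(NA, 1L, 2L, 2L),
  Level = c("family", "genus", "species", "species"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow Parent links upward until a taxon at the
# requested level is found; return NA if the root is passed without a match.
ancestor_at <- function(id, level, rel) {
  while (!is.na(id)) {
    row <- rel[rel$TaxonConceptID == id, ]
    if (row$Level == level) return(id)
    id <- row$Parent
  }
  NA_integer_
}

# Traits table with the family concept ID of every taxon.
traits <- data.frame(TaxonConceptID = relations$TaxonConceptID)
traits$family <- vapply(traits$TaxonConceptID, ancestor_at, integer(1),
                        level = "family", rel = relations)

# Example statistic: number of species per family.
species_per_family <- table(traits$family[relations$Level == "species"])
```

With the family stored as a trait, counts per family become a plain `table()` call instead of a tree traversal each time.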
> Are contradicting views possible?

Not yet, but it is on the TODO list.
Just a last comment regarding taxon views. We took this idea from the work of Jansen and Dengler (2010) and the publications cited there. There is an example about why the taxon view for a combination matters.
One manuscript about taxlist is under review. I can share it with you once we get some news from the journal.
Thanks for all the clarifications! That helps me understand taxlist much better.
> One manuscript about taxlist is under review. I can share it with you once we get some news from the journal.
Cool! I would like to see it. We are actually submitting a paper on taxa in a few days to F1000, so it should be available there soon.
Aside: Whenever I tell people I am working on a standard for taxonomic data in R, I think of this:
[Image: xkcd comic]

Love that xkcd
wrt taxlist submission: We do have some pkgs in ropensci that are somewhat overlapping, but usually not as close as taxa and taxlist are. Our editorial board would have to discuss.
I'll be out of office for around 2 months, but after that we should think about compatibility between taxa and taxlist, especially regarding functions to convert data from one object class to the other.
Sounds good @kamapu! From what I have seen, it should not be too difficult.
Well, I haven't really worked on export functions up to now, but I am still considering it: perhaps I should play a bit with taxa.
In the meantime the article is online and some new features are available at kamapu/taxlist.
Would any of you be willing to contribute?
Hello @kamapu, sorry for the delay.
I have not worked on it either, but it's on my mental todo list. Which package do you think these conversion functions should be in, taxa or taxlist, or should one conversion be in each one (e.g. taxa has as.taxmap and taxlist has as.taxlist, or vice versa)?
Either way is fine with me and I can help with writing the conversion functions.
Thanks for the link to the article. I will read it. Our article is also now online in case you are interested: https://f1000research.com/articles/7-272/v1
Hello @zachary-foster: Great to see your message and the new publication. Regarding the function, I would prefer the last option: an import function in each package. Since I only have experience working alone on GitHub, it will be interesting to see how collaborative projects run.
Dear @arendsee
We are discussing in #233 (submission to rOpenSci) writing some information for users about the differences between taxa and taxlist. Thus I would like to kindly ask you what your own decision was in this respect (to use one package or the other) and why.