
obs: Enable the ability to exclude taxa from search results

Open michaelpirrello opened this issue 4 years ago • 3 comments

Per discussion: e.g. lepidoptera without butterflies; fungi without lichens

michaelpirrello avatar Jun 23 '20 03:06 michaelpirrello

This is something that will be more important as we build out support for matching multiple observations instead of returning a single result.

synrg avatar Jul 01 '20 08:07 synrg

This is partially supported via opt without_taxon_id=# (which also accepts a comma-delimited list of id numbers). I say partially because it is unwieldy for users to look up and type the id numbers. For the commonly used ones, therefore, Dronefly has some predefined taxonomic group macros (see ,groups for a list).
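To illustrate the comma-delimited form, here is a minimal sketch of how an exclusion list might be turned into query parameters. The function name `observation_query` and the numeric ids are placeholders of my own, not Dronefly code or real taxon ids; only the `without_taxon_id` parameter name comes from the comment above.

```python
from urllib.parse import urlencode


def observation_query(taxon_id, without_taxon_ids=()):
    """Build a query string that includes one taxon and excludes others.

    Hypothetical helper for illustration only.
    """
    params = {"taxon_id": taxon_id}
    if without_taxon_ids:
        # Several exclusions are passed as one comma-delimited value.
        params["without_taxon_id"] = ",".join(str(i) for i in without_taxon_ids)
    # safe="," keeps the commas readable instead of percent-encoding them.
    return urlencode(params, safe=",")


# Placeholder ids, not real taxon ids.
print(observation_query(1000, [1001, 1002]))
```

Having to know the numbers in advance is exactly the awkward part for users, which is why the named group macros exist.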

synrg avatar Feb 02 '22 09:02 synrg

Adding this to the query language as not rather than without clears up 2 separate issues:

  1. We already use with for annotations. Using without for taxa would just cause confusion.
  2. Negation is not quite the same thing as exclusion. Group macros may contain exclusions, and working out what it means to exclude an exclusion is not at all easy!

Example:

  • waspsonly expands to "Apocrita without Formicidae, Anthophila"

Exclusion:

  • insects without waspsonly would naively expand with exclusion to "Insecta without (Apocrita without Formicidae, Anthophila)"
  • how do we understand this? just flipping the inclusions to exclusions and vice versa is not the way:
    • Insecta, Formicidae, Anthophila
    • without Apocrita
      • but that would exclude everything that's at this rank, which is not what we want!

Negation:

  • we would need to map the exclusions into inclusions first, then negate those inclusions to exclude them, e.g. for insects not waspsonly, for which I attempt an algorithm below:
    • excluding excluding Formicidae (monotypic in Formicoidea) is:
      • include Formicoidea
      • exclude Apoidea, Chrysidoidea, Pompiloidea, Scolioidea, Thynnoidea, Tiphioidea, Vespoidea
    • excluding excluding Apoidea is:
      • include Apoidea
      • exclude Formicoidea, Chrysidoidea, Pompiloidea, Scolioidea, Thynnoidea, Tiphioidea, Vespoidea
    • the union of the exclusions is:
      • exclude Apoidea, Formicoidea, Chrysidoidea, Pompiloidea, Scolioidea, Thynnoidea, Tiphioidea, Vespoidea
    • the final exclusion list needs to re-include our "excluded excluded" inclusions (Formicoidea, Apoidea) by subtracting them:
      • exclude Chrysidoidea, Pompiloidea, Scolioidea, Thynnoidea, Tiphioidea, Vespoidea
    • then finally combine them:
      • "Insecta without Chrysidoidea, Pompiloidea, Scolioidea, Thynnoidea, Tiphioidea, Vespoidea", i.e. including the only two remaining members of Apocrita: Formicoidea, Apoidea
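The steps above can be sketched with plain set operations. This is only an illustration under stated assumptions: the sibling list is the set of superfamilies named in this comment (spelled here as Scolioidea and Tiphioidea), not a complete taxonomy, `negate_group` is a hypothetical helper, and the macro's exclusions are assumed to be already mapped up to the sibling rank (Formicidae to Formicoidea, Anthophila to Apoidea).

```python
# Superfamilies of Apocrita as listed in this comment (not a complete taxonomy).
APOCRITA_CHILDREN = {
    "Apoidea", "Formicoidea", "Chrysidoidea", "Pompiloidea",
    "Scolioidea", "Thynnoidea", "Tiphioidea", "Vespoidea",
}


def negate_group(root, group_children, group_exclusions):
    """Return (root, excluded) for '<root> not <group>'.

    group_exclusions must already be at the same rank as group_children.
    """
    if not group_exclusions:
        raise ValueError("this sketch only handles groups with exclusions")
    reincluded = set()
    excluded = set()
    for taxon in group_exclusions:
        # "Excluding an exclusion": include the taxon, exclude its siblings.
        reincluded.add(taxon)
        excluded |= group_children - {taxon}
    # Subtract the re-included taxa from the union of the exclusions.
    excluded -= reincluded
    return root, sorted(excluded)


root, without = negate_group(
    "Insecta", APOCRITA_CHILDREN, {"Formicoidea", "Apoidea"}
)
print(f"{root} without {', '.join(without)}")
# prints: Insecta without Chrysidoidea, Pompiloidea, Scolioidea, Thynnoidea, Tiphioidea, Vespoidea
```

Even in this small form the sketch depends on knowing every sibling at the right rank, which hints at how fragile the general case would be.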

This is complicated enough, and confusing enough for the user, that I believe we should simply forbid using the not keyword on groups that contain exclusions. If we ever need such a thing, we'd be better off making a new macro for it, like nonwasps.

So my proposal is to support a not keyword in the query language that can only precede a selector for a single taxon (i.e. not a "group macro").
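A rough sketch of how that rule might be enforced while parsing. Everything here is hypothetical: `GROUP_MACROS` and `parse_query` are names invented for illustration, taxa are assumed to be single words, and this is not Dronefly's actual query parser. Only the waspsonly expansion comes from this thread.

```python
# Hypothetical macro table; only the waspsonly expansion comes from this thread.
GROUP_MACROS = {
    "waspsonly": ("Apocrita", ["Formicidae", "Anthophila"]),
}


def parse_query(text):
    """Split '<taxon> not <taxon>' into (included, excluded) name lists.

    Assumes single-word taxon names; rejects 'not' before a group macro.
    """
    tokens = text.split()
    included, excluded = [], []
    i = 0
    while i < len(tokens):
        if tokens[i] == "not":
            if i + 1 >= len(tokens):
                raise ValueError("'not' must be followed by a taxon")
            target = tokens[i + 1]
            if target in GROUP_MACROS:
                # The proposal: 'not' may only precede a single taxon.
                raise ValueError(f"'not' cannot precede group macro '{target}'")
            excluded.append(target)
            i += 2
        else:
            included.append(tokens[i])
            i += 1
    return included, excluded


print(parse_query("lepidoptera not butterflies"))
# prints: (['lepidoptera'], ['butterflies'])
```

With this check, `parse_query("insects not waspsonly")` raises a `ValueError` instead of attempting the double-exclusion expansion.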

synrg avatar Feb 14 '23 14:02 synrg