
split up taxonomy information by semicolons, if a space doesn't occur after every semicolon?

Open fedarko opened this issue 4 years ago • 0 comments

since SILVA taxonomies don't have spaces... this makes it hard to view this info in tooltips ._.

  • [ ] Add a function to _metadata_utils (or df_utils, doesn't really matter) that takes in a DataFrame of feature metadata, and:
    • finds columns labeled "Taxon" or "Taxonomy" (if both present, then either pick one or don't do anything)
    • goes through some or all of the entries in this column
    • if enough semicolons are not followed by whitespace, and no semicolons are followed by whitespace, then actually modify the taxonomy strings to add a space after each semicolon (but not after a semicolon that is the last character in the taxonomy string, I guess)
  • [ ] Integrate that function into the main Qurro workflow (i.e. after we load the feature metadata into a DF but before we merge it into the ranking info)
  • [ ] Test the "splitting" function developed above.
  • [ ] Update the CHANGELOG to explain the situation
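
A rough sketch of what the checks in the first task could look like, just to pin down the logic. Everything here is hypothetical and not existing Qurro code: the names `pick_taxonomy_column`, `space_out_semicolons`, and `min_unspaced` are made up, and the "enough semicolons" threshold defaults to 1 only as a placeholder. It works on plain column names and strings so the rule is easy to test on its own.

```python
import re

# Hypothetical helper names; none of this is existing Qurro code.
TAXONOMY_COLUMNS = ("Taxon", "Taxonomy")

# A semicolon directly followed by a non-whitespace character.
# The lookahead means a trailing semicolon never matches.
_UNSPACED = re.compile(r";(?=\S)")
# A semicolon that is already followed by whitespace.
_SPACED = re.compile(r";\s")


def pick_taxonomy_column(columns):
    """Return the single taxonomy column name, or None.

    None is returned if neither "Taxon" nor "Taxonomy" is present,
    or if both are (in which case we don't do anything).
    """
    matches = [c for c in columns if c in TAXONOMY_COLUMNS]
    return matches[0] if len(matches) == 1 else None


def space_out_semicolons(taxonomy_strings, min_unspaced=1):
    """Add a space after semicolons, but only for "unspaced" taxonomies.

    min_unspaced is an assumed threshold for "enough semicolons".
    Non-string entries (e.g. missing values) are passed through as-is.
    """
    strings = list(taxonomy_strings)
    text = [s for s in strings if isinstance(s, str)]
    spaced = sum(len(_SPACED.findall(s)) for s in text)
    unspaced = sum(len(_UNSPACED.findall(s)) for s in text)
    # Leave the data alone if any semicolon already has whitespace after
    # it, or if there are too few unspaced semicolons to be confident.
    if spaced > 0 or unspaced < min_unspaced:
        return strings
    return [
        _UNSPACED.sub("; ", s) if isinstance(s, str) else s
        for s in strings
    ]


if __name__ == "__main__":
    print(space_out_semicolons(["D_0__Bacteria;D_1__Firmicutes;"]))
```

With a feature metadata DataFrame `df`, this would presumably be wired up as `col = pick_taxonomy_column(df.columns)` followed by `df[col] = space_out_semicolons(df[col])` when `col` is not `None`.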

From talking with Franck, this sounds like a kosher way to handle these kinds of taxonomy strings.

It might also be worth doing this for other fields besides Taxon/Taxonomy (e.g. Feature ID in the Byrd example dataset), but that'd probably require that we add a parameter to let the user specify a "taxonomy field" up front when creating the visualization.

fedarko · Oct 22 '19 19:10