
rethink data filtering

Open adf-ncgr opened this issue 10 months ago • 0 comments

Currently, we have some brute-force filtering implemented client-side, with regex matching applied to track names (both micro and macro). While this has given us a lot of bang for the implementation buck when coupled with the naming conventions we use in LIS, it suffers from a few problems that seem worth addressing for GCV3:

  • As a client-side operation, it requires the servers to do a lot of extra computational work to deliver results that may be only a nuisance for a specific use case (e.g., a user interested only in comparisons within Glycine still gets results for every species in LIS before the gly.* naming filter is applied).
  • Relying on naming conventions isn't generally a good idea and sometimes fails (e.g., suppose the "Glycine" user's gly.* filter, which depends on our "gensp" naming, also returned genomes from the genus Glycyrrhiza).
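
To make both problems concrete, here is a minimal Python sketch contrasting name-based regex filtering with filtering on an explicit genus field that an upgraded data model might carry. The `Track` type, the track names, and the `server_search` helper are hypothetical illustrations, not GCV's actual data model or API; the sketch simply shows that filtering on metadata before the comparison step avoids both the false match and the wasted server work.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Track:
    # Hypothetical track record: a "gensp"-style name plus explicit genus
    # metadata that an upgraded data model might carry (an assumption).
    name: str
    genus: str


# Hypothetical example data; names only mimic the "gensp" prefix convention.
TRACKS = [
    Track("glyma.chr01", "Glycine"),
    Track("glyso.chr01", "Glycine"),
    Track("glyur.chr01", "Glycyrrhiza"),
    Track("phavu.chr01", "Phaseolus"),
]


def filter_by_name(tracks: list[Track], pattern: str) -> list[Track]:
    """Current style: regex on track names, applied after results arrive."""
    regex = re.compile(pattern)
    # re.match anchors at the start of the name, like a prefix filter.
    return [t for t in tracks if regex.match(t.name)]


def filter_by_genus(tracks: list[Track], genus: str) -> list[Track]:
    """Alternative: match on an explicit metadata field instead of a name."""
    return [t for t in tracks if t.genus == genus]


def server_search(
    tracks: list[Track], genus: Optional[str] = None
) -> tuple[list[str], int]:
    """Hypothetical server-side search: narrow candidates *before* the
    expensive comparison step and report how many comparisons ran."""
    candidates = tracks if genus is None else filter_by_genus(tracks, genus)
    # Stand-in for the real (costly) synteny comparison.
    results = [t.name for t in candidates]
    return results, len(candidates)


if __name__ == "__main__":
    print([t.name for t in filter_by_name(TRACKS, r"gly.*")])
    # prints ['glyma.chr01', 'glyso.chr01', 'glyur.chr01']
    print(server_search(TRACKS, genus="Glycine"))
    # prints (['glyma.chr01', 'glyso.chr01'], 2)
```

The regex keeps the Glycyrrhiza track because it shares the `gly` prefix, while the genus filter does not; and because the filter runs on the server, the Glycyrrhiza and Phaseolus tracks never reach the comparison step at all.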

Since we may be upgrading the data model as part of GCV3, this might be a good time to consider filtering as well (though the arguments above would still carry weight even if we left the data model as is).

adf-ncgr · Aug 21 '23 21:08