qiita

Add LSU to list of data_type

Open • antgonza opened this issue 8 years ago • 5 comments

A user requested to add LSU as a possible data type.

We will need to:

  • [ ] Add an entry to the database table qiita.data_type
  • [ ] Add to target gene processing (not sure about this one?)
  • [ ] If the previous step is needed, also update the documentation and the code (TARGET_GENE_DATA_TYPES in qiita_db.metadata_template.constants)

antgonza • Oct 27 '15 13:10

LSU needs to be added to target gene. This is why I was proposing to leave it as target gene in general and provide a different way of subsetting it. The main issue is that there are many different primers targeting different genes, and with the current behavior we need to list all of them, whereas the only thing we care about is that it is target gene.
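To make the coupling concrete, here is a minimal Python sketch of the current behavior, assuming TARGET_GENE_DATA_TYPES is a plain list of data-type names and that membership in it is what routes a data type to target gene processing. The list contents and the needs_target_gene_processing helper are hypothetical, not Qiita's actual code.

```python
# Hypothetical stand-in for TARGET_GENE_DATA_TYPES in
# qiita_db.metadata_template.constants; the real contents may differ.
TARGET_GENE_DATA_TYPES = ["16S", "18S", "ITS"]


def needs_target_gene_processing(data_type: str) -> bool:
    """Return True if data_type is routed to the target gene pipeline."""
    return data_type in TARGET_GENE_DATA_TYPES


# Adding "LSU" to the qiita.data_type table alone is not enough:
# until the hard-coded list is edited, LSU is not treated as target gene.
print(needs_target_gene_processing("16S"))  # True
print(needs_target_gene_processing("LSU"))  # False
```

Under this design, every new target gene needs both a database row and a code change, which is the maintenance cost described above.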

josenavas • Oct 27 '15 14:10

Yes, I see your point. Other, simpler options are to put it in the config file, with the downside that we would need to restart the system to activate a new option, or to add a new column to the data_type table, for example data_type_type.
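A minimal sketch of the data_type_type column option, using an in-memory SQLite database so it runs standalone. Qiita itself uses PostgreSQL, and the table layout, the data_type_type column, and the target_gene_data_types helper here are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; data_type_type is the proposed extra column.
conn.execute(
    "CREATE TABLE data_type ("
    " data_type_id INTEGER PRIMARY KEY,"
    " data_type TEXT UNIQUE NOT NULL,"
    " data_type_type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO data_type (data_type, data_type_type) VALUES (?, ?)",
    [("16S", "target_gene"), ("18S", "target_gene"), ("Metagenomic", "metagenomic")],
)


def target_gene_data_types(conn: sqlite3.Connection) -> list:
    """Return every data type whose data_type_type is 'target_gene'."""
    rows = conn.execute(
        "SELECT data_type FROM data_type WHERE data_type_type = ? ORDER BY data_type",
        ("target_gene",),
    )
    return [row[0] for row in rows]


# Supporting LSU is now a data change only: one new row, no code edit.
conn.execute(
    "INSERT INTO data_type (data_type, data_type_type) VALUES (?, ?)",
    ("LSU", "target_gene"),
)
print(target_gene_data_types(conn))  # ['16S', '18S', 'LSU']
```

Compared with the config-file option, the set of target gene types is read from the table at query time, so adding a type does not require a restart.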

antgonza • Oct 27 '15 16:10

I think we should step back and ask ourselves the following question: Do we care which target gene was used?

In my opinion, no, we don't. What we actually care about for analysis are the parameters that were used for processing. For example, if you do closed-reference picking against GG (Greengenes) but one of your runs was V4 and the other was V3-V5, you can still compare them, since both were generated with closed reference. Then, in your analysis, you'll need to address whether the different primers are introducing a bias.

josenavas • Oct 27 '15 16:10

As it currently stands, no, we don't, and I think that's fine. However, it's a nice separation to have, as users will want to know which one is 16S, 18S, etc. Moreover, this is information we need to send to EBI. Thus, we need to keep track of it, but I'm not sure the separation is needed.

antgonza • Oct 27 '15 16:10

I think it is OK to provide a way for the user to differentiate between data types. As far as I know, the information in the data type column of the database is not submitted to EBI, as that information is, in theory, represented by the primer used. It may be worth double-checking.

That being said, I think the current data type should be changed to be only target gene, and we should then use something else (open to ideas) to keep track of the specific subtype. One idea might be to create another column, "sub-type", to keep track of these, but I'm unsure whether we will need further levels of nested subtypes, and then everything gets messy.

Another idea is to represent this data as an ltree in the database and just look at the root of the ltree to know which pipeline to use. This will still require us to add a new row for each new subtype we might need, but that is just data added to a table; no code changes are required. I think I will vote for this option, but I'm open to more ideas.
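ltree is a PostgreSQL extension that stores dot-separated label paths such as target_gene.LSU. The sketch below emulates the proposal with plain Python strings so it runs without a database: the root label selects the pipeline, and adding a subtype is a single new entry. The paths, the DATA_TYPE_PATHS mapping, and the helper names are hypothetical; in SQL the equivalents would be roughly subpath(path, 0, 1) for the root and path <@ 'target_gene' for the subtree.

```python
# Hypothetical data-type paths in ltree-style notation:
# dot-separated labels, where the root label names the pipeline.
DATA_TYPE_PATHS = {
    "16S": "target_gene.16S",
    "18S": "target_gene.18S",
    "LSU": "target_gene.LSU",
    "Metagenomic": "metagenomic",
}


def root_label(path: str) -> str:
    """First label of a dot-separated path (like ltree subpath(path, 0, 1))."""
    return path.split(".", 1)[0]


def pipeline_for(data_type: str) -> str:
    """Choose the processing pipeline from the root of the data type's path."""
    return root_label(DATA_TYPE_PATHS[data_type])


def descendants(prefix: str) -> list:
    """Data types whose path equals or descends from prefix (like path <@ prefix)."""
    return sorted(
        dt
        for dt, path in DATA_TYPE_PATHS.items()
        if path == prefix or path.startswith(prefix + ".")
    )


print(pipeline_for("LSU"))         # target_gene
print(descendants("target_gene"))  # ['16S', '18S', 'LSU']
```

Because only the root is inspected, deeper subtypes (an extra label below target_gene.LSU, say) would still route to the same pipeline without any code change.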

josenavas • Oct 28 '15 02:10