confusing ID column names

Open redbluewater opened this issue 5 years ago • 1 comments

I am putting this comment here, but it also impacts the examples for mmvec (and maybe other programs).

The 'red sea' example for songbird uses sampleid as the identifier for the sequence data (feature_metadata.txt) and also as the identifier for the samples (redsea_metadata.txt). As I am still learning qiime, I am not sure of the best way around this. However, having only two choices (some variant of sampleid and featureid) does not seem like enough. For example:

sequence data .... needs sequence_metadata
metabolites data .... needs metabolites_metadata
sample data ... needs sample_metadata

How about this idea:

sequence data .... sequence_id
metabolites data .... metabolite_id
sample data ... sample_id

This is more precise than 'sampleid' or 'featureid', especially for a mass spectrometry group like ours, which uses 'features' to mean peaks in mass spectrometry data (the opposite of how features are used in the examples here).
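In the meantime, one possible workaround is to keep the descriptive names in my own files and translate only the ID column header just before handing a table to the tools. Below is a minimal sketch of that idea, not something taken from the songbird or mmvec examples. The mapping, the `rename_id_column` helper, and the `Taxon` column are all hypothetical, and I am assuming the tools expect `featureid` and `sampleid` as the generic headers.

```python
import csv
import io

# Hypothetical mapping from the descriptive ID names proposed above to the
# generic headers (assumed here to be "featureid" and "sampleid").
DESCRIPTIVE_TO_GENERIC = {
    "sequence_id": "featureid",
    "metabolite_id": "featureid",
    "sample_id": "sampleid",
}


def rename_id_column(tsv_text, mapping=DESCRIPTIVE_TO_GENERIC):
    """Return tsv_text with the first header cell renamed via mapping.

    Only the first cell of the header row is changed; every other
    cell and row is copied through unchanged.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows or not rows[0]:
        raise ValueError("metadata table has no header row")
    header = rows[0]
    if header[0] not in mapping:
        raise KeyError(f"unrecognised ID column: {header[0]!r}")
    header[0] = mapping[header[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Illustrative table; the "Taxon" column is made up for this example.
example = "sequence_id\tTaxon\nseq1\tBacteria\n"
print(rename_id_column(example))
```

This only hides the ambiguity at the file boundary. Sequences and metabolites still both end up under the same generic header, which is the problem described above.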

Thanks as ever for developing these tools. They are extremely useful and I am excited to use them to dig into my own data.

May 06 '20 13:05 redbluewater

@KujawinskiLaboratory this is a very good point. The confusion stems from the traditional definitions of metadata.

There have been a couple of discussions about this in other contexts, in particular https://github.com/biocore/emperor/issues/726 and https://github.com/qiime2/q2-emperor/issues/81

I'd think it'll take a fairly extensive refactor of qiime2 to make sure that these types propagate accordingly (e.g. what about all of the other omics datatypes, such as transcriptomics and proteomics?). CC @ElDeveloper @ebolyen for further discussion

May 22 '20 22:05 mortonjt