qiita
Prohibition for env_package should apply to samples in prep not total list of samples
Approval for studies is requested per prep, so the env_package prohibition should be applied only to the samples in the prep for which approval is being requested, not to the entire list of samples. Many samples in the full list were entered erroneously or were never sequenced, and they are not visible in the public study.
I think the warning on the sample info page is sufficient for people planning to use sandbox studies for analysis.
The warning message should be reworded because it implies that all values are incorrect. Change to:
Sample Info has invalid values: ", Unspecified, LabControl test, Not applicable, None", valid values are: "air, built environment, host-associated, human-associated, human-skin, human-oral, human-gut, human-vaginal, microbial mat/biofilm, misc environment, plant-associated, sediment, soil, wastewater/sludge, water"
Currently reads: Sample Info has a no valid values: ", Unspecified, LabControl test, Not applicable, None", valid values are: "air, built environment, host-associated, human-associated, human-skin, human-oral, human-gut, human-vaginal, microbial mat/biofilm, misc environment, plant-associated, sediment, soil, wastewater/sludge, water"
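To make the request concrete, here is a minimal sketch of the check restricted to the samples in one prep, using the proposed wording. It is not Qiita's actual code: the function name `env_package_warning`, the `env_by_sample` mapping and the `prep_sample_ids` argument are hypothetical, and the list of valid values is copied from the message above.

```python
# Valid env_package values, copied from the warning message quoted above.
VALID_ENV_PACKAGES = [
    "air", "built environment", "host-associated", "human-associated",
    "human-skin", "human-oral", "human-gut", "human-vaginal",
    "microbial mat/biofilm", "misc environment", "plant-associated",
    "sediment", "soil", "wastewater/sludge", "water",
]


def env_package_warning(env_by_sample, prep_sample_ids):
    """Return a warning string, or None if every prep sample is valid.

    env_by_sample: hypothetical dict mapping sample id -> env_package value
    prep_sample_ids: ids of the samples in the prep being submitted
    """
    valid = set(VALID_ENV_PACKAGES)
    # Only look at samples that belong to the prep, not the full sample list.
    used = {env_by_sample[s] for s in prep_sample_ids if s in env_by_sample}
    invalid = sorted(used - valid)
    if not invalid:
        return None
    return 'Sample Info has invalid values: "%s", valid values are: "%s"' % (
        ", ".join(invalid), ", ".join(VALID_ENV_PACKAGES))
```

With this shape, a study whose full sample list contains `Unspecified` or `None` would still pass as long as none of those samples are in the prep being approved.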
I was taking a look at what it will take to solve this issue, and it's not as trivial as I thought. The main issue is that we need to check which metadata columns (categories) are present in the info file and which are required, and the method we use to get those categories looks at the sample_values->>'columns' row, as this is much faster than checking each sample. Thus, to make this change, we will need to create a new method that returns the existing categories for a group of samples. That might be too slow if the prep has too many samples, so it needs to be benchmarked.
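As a rough illustration of what such a method might look like, here is a sketch that works on in-memory data rather than the database. Everything in it is an assumption for discussion: the names `categories_for_samples` and `missing_required` are hypothetical, `sample_rows` stands in for the per-sample values, and a category counts as present only if at least one sample in the group has a non-null value for it.

```python
def categories_for_samples(sample_rows, sample_ids):
    """Return the set of categories present for a group of samples.

    sample_rows: hypothetical dict mapping sample id -> {category: value}
    sample_ids: ids of the samples in the prep
    """
    categories = set()
    for sid in sample_ids:
        row = sample_rows.get(sid, {})
        # Assumption: a category with a null value is treated as absent.
        categories.update(k for k, v in row.items() if v is not None)
    return categories


def missing_required(sample_rows, sample_ids, required):
    """Return the required categories that no sample in the group provides."""
    return set(required) - categories_for_samples(sample_rows, sample_ids)
```

This visits every sample in the prep once, so the cost grows with the number of samples times the number of categories; timing it against a large prep (for example with `timeit`) would show whether it is acceptable compared with the single-row lookup.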