
qiime.filter.filter_mapping_file code and filter_samples_from_otu_table.py

Open • wdwvt1 opened this issue 8 years ago • 1 comment

qiime.filter.filter_mapping_file needs to be replaced, for three reasons:

  1. The documentation is actively misleading: it says the function removes a metadata column from the mapping file data if every sample to be retained (good_sample_ids) has a unique value in that column. The function neither checks for this nor removes those columns.
  2. It produces incorrect results: the last column in the mapping file is retained even if the set of values for that metadata column has length 1 (i.e., every sample has the same value in that column). Such columns are exactly what the script says it filters out.
  3. It is undertested: the column_rename_ids option accounts for the majority of the function's code, but it is never exercised by the tests. In addition, to the best of my knowledge this option is not used anywhere in the qiime code base. You can verify this with grep -nr 'column_rename_ids' qiime_dir and by looking at all the files returned by grep -nr 'filter_mapping_file' qiime_dir.

This function causes unexpected output in scripts such as filter_samples_from_otu_table.py when they are run with the --output_mapping_fp option, as reported in issue #2060.

wdwvt1 • Jul 20 '15 02:07

+1, this should ultimately be replaced with filtering of DataFrames in QIIME 2 (when metadata tables will always be represented internally as pandas DataFrames).
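
For illustration only, here is a rough sketch of what DataFrame-based filtering could look like. It assumes the mapping file has already been loaded into a pandas DataFrame indexed by sample ID; the function name filter_mapping_frame and the drop_all_unique flag are hypothetical and are not part of QIIME.

```python
import pandas as pd


def filter_mapping_frame(mapping, good_sample_ids, drop_all_unique=False):
    """Hypothetical helper (not a QIIME API).

    Keep only the rows whose sample ID is in good_sample_ids, then drop
    metadata columns that carry no information for the retained samples.
    """
    # Retain only the requested samples, preserving the original row order.
    kept = mapping.loc[mapping.index.isin(good_sample_ids)]

    # Number of distinct values per column among the retained samples.
    n_unique = kept.nunique(dropna=False)

    # Drop columns where every retained sample has the same value.
    keep = n_unique > 1
    if drop_all_unique:
        # Optionally also drop columns where every sample has a unique value.
        keep &= n_unique < len(kept)

    return kept.loc[:, keep.to_numpy()]


# Toy mapping data, invented for this example.
mapping = pd.DataFrame(
    {
        "Treatment": ["Control", "Control", "Fast"],
        "Site": ["gut", "gut", "gut"],
        "Description": ["a", "b", "c"],
    },
    index=pd.Index(["S1", "S2", "S3"], name="SampleID"),
)

subset = filter_mapping_frame(mapping, ["S1", "S3"])
strict = filter_mapping_frame(mapping, ["S1", "S2", "S3"], drop_all_unique=True)
```

With this toy data, subset keeps samples S1 and S3 and drops the constant Site column, while strict additionally drops Description because every sample has a distinct value there. Unlike the behavior described in point 2 above, the check is applied uniformly to every column, including the last one.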

gregcaporaso • Jul 20 '15 19:07