
metadata_to_dataframe order matters

Open guillaume-gricourt opened this issue 4 years ago • 3 comments

Hi,

When the table looks like good.txt, it works. When the order of the metadata is different, as in bad.txt, you get:

ValueError: 2 columns passed, passed data had 6 columns

Maybe the parser could take the maximum number of values into account before parsing them?

biom-format v2.1.10
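The suggestion above, sizing the columns from the longest lineage rather than from whichever one happens to come first, can be sketched in plain Python. This is only an illustration and not biom-format's implementation: `pad_lineages` and its `fill` placeholder are hypothetical names, and the OTU IDs and lineages are made-up example data chosen to mirror the 2-versus-6 mismatch in the error message.

```python
def pad_lineages(lineages, fill=None):
    """Pad every lineage to the length of the longest one.

    `lineages` maps an observation ID to a list of taxonomy levels.
    Hypothetical helper for illustration; not part of biom-format.
    """
    # Use the maximum depth over all rows, not the depth of the first row.
    depth = max((len(levels) for levels in lineages.values()), default=0)
    return {
        obs_id: list(levels) + [fill] * (depth - len(levels))
        for obs_id, levels in lineages.items()
    }


# Made-up example: the short lineage comes first.
ragged = {
    "OTU1": ["k__Bacteria", "p__Firmicutes"],
    "OTU2": ["k__Bacteria", "p__Firmicutes", "c__Bacilli",
             "o__Lactobacillales", "f__Lactobacillaceae", "g__Lactobacillus"],
}
padded = pad_lineages(ragged)
```

After padding, every row has the same number of levels, so the row order should no longer decide how many columns are expected.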

guillaume-gricourt, Feb 10 '21 12:02

Hi @guillaume-gricourt, that parser was designed to support classic OTU tables from QIIME1, where the lineages were guaranteed to be balanced, with placeholders for unidentified names. TSVs are not BIOM-Format and are unstructured, which creates a wide range of edge cases.

As a workaround, you could parse the counts without metadata, parse the taxonomy separately, and add it in with biom.Table.add_metadata.

wasade, Feb 10 '21 16:02

Yeah, it's a good workaround. I create BIOM files from TSV to load data into the phyloseq package. This file is also my entry point for other analyses. From now on, when I create this BIOM file, I'll check the order of the metadata in my TSV file. Since it is possible to create this kind of BIOM file, it seems to me that this would be a feature worth implementing?

guillaume-gricourt, Feb 10 '21 16:02

I'd greatly welcome a pull request to resolve this feature request; otherwise, I'm not sure when I'll be able to get to it. A possible workaround is below.

$ biom convert -i bad.txt -o bad.biom --to-hdf5
$ python
Python 3.6.11 | packaged by conda-forge | (default, Aug  5 2020, 20:19:23)
[GCC Clang 10.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import biom
>>> df = pd.read_csv('bad.txt', sep='\t')
>>> df.set_index('#OTU ID', inplace=True)
>>> t = biom.load_table('bad.biom')
>>> formatted = {k: {'taxonomy': v.split(';')} for k, v in df['taxonomy'].items()}
>>> t.add_metadata(formatted, axis='observation')
>>> with biom.util.biom_open('okay.biom', 'w') as fp:
...   t.to_hdf5(fp, 'converted')
... 
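If the lineages in the TSV are written with a space after each semicolon, or some rows have no taxonomy at all, the splitting step above may need a little more care. Here is a minimal sketch, assuming those conventions. `split_lineage` and `format_taxonomy` are hypothetical helpers, not biom-format functions, and the sample values are invented. The result has the same shape as the `formatted` dictionary passed to `add_metadata`.

```python
def split_lineage(value, sep=";"):
    """Split a lineage string into levels, trimming whitespace.

    Non-string values (for example NaN from pandas for an empty cell)
    yield an empty list. Hypothetical helper, not part of biom-format.
    """
    if not isinstance(value, str):
        return []
    # Drop empty pieces such as the one left by a trailing separator.
    return [level.strip() for level in value.split(sep) if level.strip()]


def format_taxonomy(taxonomy_by_id):
    """Build {observation ID: {'taxonomy': [levels...]}} from raw strings."""
    return {
        obs_id: {"taxonomy": split_lineage(raw)}
        for obs_id, raw in taxonomy_by_id.items()
    }


# Invented example values; in the session above they would come from
# df['taxonomy'].to_dict().
raw = {
    "OTU1": "k__Bacteria; p__Firmicutes",
    "OTU2": "k__Bacteria;p__Proteobacteria;",
    "OTU3": float("nan"),
}
formatted = format_taxonomy(raw)
```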

wasade, Feb 10 '21 16:02