metacoder
Error with parse_tax_data and installation issues
Transferred from https://github.com/ropensci/taxa/issues/210 for @emankhalaf
I have a feature table with taxonomy collapsed to the genus level. The first column is the taxonomy (ranks separated by ;), and the remaining columns are sample IDs showing the read count of each feature. I need to split the taxonomy column into 6 taxonomic ranks using the parse_tax_data function. I used this code:
obj <- parse_tax_data(feature-table-with-taxonomyl6,
class_cols = "taxonomy",
class_sep = ";",
class_regex = "^([a-z]{0,1})_{0,2}(.*)$",
class_key = c("tax_rank" = "taxon_rank", "name" = "taxon_name"))
print(obj)
Then I got this error:
Error in parse_tax_data(feature - table - with - taxonomyl6, class_cols = "taxonomy", : could not find function "parse_tax_data"
I had already loaded the taxa package, but I also ran into a problem when installing devtools.
Thanks! Eman
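A note on the error above: "could not find function" means that no attached package exports a function with that name. This thread was moved to the metacoder repository, and metacoder exports parse_tax_data(), so attaching metacoder (rather than only taxa) is the likely fix. Whether it is installed on a given machine is an assumption to check locally. A minimal sketch of that check, which does not need devtools:

```r
# Minimal check; assumes nothing about whether metacoder is installed.
fn <- "parse_tax_data"

# Attach metacoder only if it is already installed.
if (requireNamespace("metacoder", quietly = TRUE)) {
  library(metacoder)
}

# TRUE once some attached package provides the function.
found <- exists(fn, mode = "function")

# Names of the attached packages/environments that provide it, if any.
providers <- if (found) find(fn, mode = "function") else character(0)

print(found)
print(providers)
```

If `found` is FALSE, the function is not available in the current session and the package that provides it still needs to be installed and attached.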
Can you give me part of the input data so I can see how it is formatted?
I need to split the taxonomy column into 6 taxonomic ranks
If you are just trying to split the taxonomy column into 6 per-rank columns and don't need other metacoder functions that require the taxmap objects produced by parse_tax_data, you can use:
library(tidyr)
separate(`feature-table-with-taxonomyl6`, taxonomy, c("Kingdom", "Class", "Order", "etc..."), sep = ';') # backticks are needed because the object name contains hyphens
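As a self-contained illustration of the separate() approach, here is a small sketch on made-up data. The object name `feature-table`, the sample columns S1 and S2, and the counts are all hypothetical. Note that R parses an unquoted name containing hyphens as a subtraction (the error message earlier in the thread echoes it as `feature - table - with - taxonomyl6`), so such a name has to be wrapped in backticks.

```r
library(tidyr)

# Toy stand-in for the real table; names and counts are made up.
`feature-table` <- data.frame(
  taxonomy = c("d__Bacteria;p__Firmicutes;c__Bacilli",
               "d__Bacteria;__;__"),
  S1 = c(10, 0),
  S2 = c(3, 7),
  check.names = FALSE
)

# Backticks are required because of the hyphen in the object name.
# The column is passed as a string so it cannot be confused with a
# function or object that happens to be called `taxonomy`.
split_table <- separate(`feature-table`, "taxonomy",
                        into = c("Kingdom", "Phylum", "Class"),
                        sep = ";")
print(split_table)
```

The `into` vector should have one name per rank actually present in the taxonomy strings.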
I did the following:
my_table <- read_csv("file.csv", col_names = TRUE) # readr function
GT <- separate(my_table, taxonomy, c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"), sep = ";")
head(GT)
I got this error:
Error:
! Must extract column with a single valid subscript.
x Subscript `var` has the wrong type `function`.
ℹ It must be numeric or character.
Backtrace:
1. tidyr::separate(...)
2. tidyr:::separate.data.frame(...)
3. tidyselect::vars_pull(names(data), !!enquo(col))
4. tidyselect:::pull_as_location2(loc, n, vars)
13. vctrs::vec_as_subscript2(i, arg = "var", logical = "error")
14. vctrs:::result_get(...)
Error:
x Subscript `var` has the wrong type `function`.
ℹ It must be numeric or character.
Any recommendations? Many thanks!
What does the table look like?
It is a feature table with taxonomy, originally a txt file that I converted to csv. The first row is the header (taxonomy, S1, S2, ...). The row names are the taxonomy (d_kingdom up to s_species), followed by the abundance/read count of each feature across samples. I can email the file to you if you do not mind!
Thank you! Eman
You have a column at the end named taxonomy too. Since you have two columns with the same name, readr::read_csv renames them, so there is no longer a column called taxonomy, which is why your code did not work. Note that readr::read_csv tells you when it renames columns (see the "New names:" message in the output below). Does this do what you wanted?
library(readr)
library(tidyr)
my_table <- read_csv("~/Downloads/feature-table-with-taxonomyl6.csv", col_names = TRUE) # readr function
#> New names:
#> * taxonomy -> taxonomy...1
#> * taxonomy -> taxonomy...58
#> Rows: 308 Columns: 58
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): taxonomy...1
#> dbl (56): 1P-GH-R1, 1P-GH-R2, P1, P10, P11, P12b, P13, P14b, P15, P16, P17, ...
#> lgl (1): taxonomy...58
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
GT <- separate(my_table, "taxonomy...1", c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus"), sep = ";") # No species rank in data
GT # Don't need to use head() for tibbles
#> # A tibble: 308 × 63
#> Kingdom Phylum Class Order Family Genus `1P-GH-R1` `1P-GH-R2` P1 P10
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 d__Bacteria p__Pr… c__G… o__E… f__Er… g__P… 0 0 8276 6048
#> 2 d__Bacteria __ __ __ __ __ 0 0 0 2
#> 3 d__Bacteria p__Ch… c__D… o__S… f__S0… g__S… 0 0 0 53
#> 4 d__Bacteria p__Ba… c__B… o__S… f__Sp… g__S… 0 0 0 0
#> 5 d__Bacteria p__Fi… c__B… o__B… f__Ba… g__B… 0 0 0 1283
#> 6 d__Bacteria p__Ba… c__B… o__C… __ __ 0 0 0 0
#> 7 d__Bacteria p__Fi… c__S… o__S… f__Sy… g__C… 0 0 0 0
#> 8 d__Bacteria p__Ba… c__B… o__C… f__Cy… g__S… 0 0 26 11
#> 9 d__Bacteria p__Pr… c__G… __ __ __ 0 0 0 0
#> 10 d__Bacteria p__Ba… c__B… o__F… f__We… g__C… 0 0 34 32
#> # … with 298 more rows, and 53 more variables: P11 <dbl>, P12b <dbl>,
#> # P13 <dbl>, P14b <dbl>, P15 <dbl>, P16 <dbl>, P17 <dbl>, P19 <dbl>,
#> # P2 <dbl>, P20 <dbl>, P21 <dbl>, P22 <dbl>, P23 <dbl>, P24 <dbl>, P25 <dbl>,
#> # P26 <dbl>, P27 <dbl>, P28 <dbl>, P29 <dbl>, P31 <dbl>, P32 <dbl>,
#> # P33 <dbl>, P34b <dbl>, P35 <dbl>, P36b <dbl>, P37 <dbl>, P38 <dbl>,
#> # P39b <dbl>, P40b <dbl>, P41 <dbl>, P42 <dbl>, P43 <dbl>, P44 <dbl>,
#> # P45 <dbl>, P46 <dbl>, P47 <dbl>, P48 <dbl>, P49 <dbl>, P4b <dbl>, …
Created on 2022-03-02 by the reprex package (v2.0.1)
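To spot this kind of problem before reading the whole file, one option is to inspect the header line for repeated names. The sketch below uses base R on a hypothetical miniature file written to a temporary path. It assumes a plain comma-delimited header with no quoted commas, so it is a quick check rather than a full CSV parser.

```r
# Hypothetical miniature of the file: the header repeats "taxonomy".
csv_path <- tempfile(fileext = ".csv")
writeLines(c("taxonomy,S1,S2,taxonomy",
             "d__Bacteria;p__Firmicutes,10,3,",
             "d__Bacteria;__,0,7,"),
           csv_path)

# Read only the header line and split it on the delimiter.
header <- strsplit(readLines(csv_path, n = 1), ",", fixed = TRUE)[[1]]

# Names that occur more than once, and the positions where they occur.
dup_names <- unique(header[duplicated(header)])
dup_positions <- which(header %in% dup_names)

print(dup_names)
print(dup_positions)
```

Any name reported in `dup_names` will be renamed on import, so code that refers to the original name will no longer find the column.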
@zachary-foster Thank you so much! Now it works. I exported the file as tsv and deleted the extra taxonomy column.
No problem! Glad it's working.
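For anyone who would rather drop the extra column in R than edit the file by hand, here is a minimal sketch. It assumes the extra column is completely empty (all NA), which is what a logical column like `taxonomy...58` usually indicates, though that should be confirmed on the real data. The toy table and its values are made up.

```r
# Toy table using the kind of names readr assigns; values are made up.
my_table <- data.frame(
  "taxonomy...1" = c("d__Bacteria;p__Firmicutes", "d__Bacteria;__"),
  S1 = c(10, 0),
  "taxonomy...3" = c(NA, NA),
  check.names = FALSE
)

# Keep only columns that contain at least one non-missing value.
keep <- colSums(!is.na(my_table)) > 0
cleaned <- my_table[, keep, drop = FALSE]

# Restore the original name of the first column.
names(cleaned)[names(cleaned) == "taxonomy...1"] <- "taxonomy"
print(names(cleaned))
```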