metacoder
Running in parallel or multithreading option
Hey there..
Thanks for the nice tool. Is there a way to plot the final matrix in a parallel or multi-threaded manner? I have the following table with a lot of rows (1,248,624):
> print(obj$data$diff_table)
# A tibble: 1,248,624 x 7
taxon_id treatment_1 treatment_2 log2_median_ratio median_diff mean_diff wilcox_p_value
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 aab gp_1_Early gp_2_Early 0 0 0.0278 0.803
2 aac gp_1_Early gp_2_Early 0 0 0 NaN
3 aad gp_1_Early gp_2_Early 0.517 0.277 0.206 0.183
4 aae gp_1_Early gp_2_Early 1.12 0.0107 -0.0715 0.832
5 aaf gp_1_Early gp_2_Early 0 0 -0.000119 0.516
6 aag gp_1_Early gp_2_Early 0 0 -0.00519 0.191
7 aah gp_1_Early gp_2_Early 0 0 -0.00531 0.167
8 aai gp_1_Early gp_2_Early 0 0 -0.0452 0.146
9 aaj gp_1_Early gp_2_Early 0.721 0.00421 -0.0647 0.964
10 aak gp_1_Early gp_2_Early -Inf -0.0300 0.0313 0.256
# … with 1,248,614 more rows
And I'm trying to plot the final figure using the code below:
heat_tree_matrix(obj,
                 data = "diff_table",
                 node_size = n_obs, # n_obs is a function that calculates, in this case, the number of OTUs per taxon
                 node_label = taxon_names,
                 node_color = log2_median_ratio, # A column from `obj$data$diff_table`
                 node_color_range = diverging_palette(), # The built-in palette for diverging data
                 node_color_trans = "linear", # The default is scaled by circle area
                 node_color_interval = c(-3, 3), # The range of `log2_median_ratio` to display
                 edge_color_interval = c(-3, 3), # The range of `log2_median_ratio` to display
                 node_size_axis_label = "Number of ASVs",
                 node_color_axis_label = "Log2 ratio median proportions",
                 layout = "davidson-harel", # The primary layout algorithm
                 initial_layout = "reingold-tilford", # The layout algorithm that initializes node locations
                 output_file = "differential_heat_tree.pdf", # Saves the plot as a pdf file
                 key_size = 0.4, # Adjust the size of the "key" tree with respect to the plot
                 row_label_size = 16, col_label_size = 16,
                 node_label_size_range = c(0.01, 0.05) # Node font size, see here: https://github.com/grunwaldlab/metacoder/issues/245
)
It's been running for 69h 51m so far and still no end in sight. So having a multi-thread option may help with the time.
Thanks a lot, Susheel
Hello, sorry for the delay. There is no such option, but it would be a useful addition. I will look into it when I get a chance. 70h sounds really long, even for a large dataset. I would recommend the following ways to optimize it:
- Filter out taxa you don't need to plot. You can do this before or after calculating differences. If you have a lot of comparisons, too many taxa will be confusing anyway. Remove uncommon taxa, or remove ranks in the taxonomy. Start small and add taxa until it takes too long.
- Split it up into multiple plots if possible.
- Check that you are not running out of RAM. If you are running out of RAM and it is using swap, it will be really slow.
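Before trimming, it may help to see how many taxa and how many pairwise comparisons the difference table actually holds. Below is a minimal base-R sketch, not a metacoder feature. It uses a toy table with the same column names as `obj$data$diff_table`; the values are made up, and the "no signal" rule (ratio is zero or missing in every comparison) is only an assumed example of a filtering criterion.

```r
# Toy stand-in for obj$data$diff_table (values are invented for illustration).
diff_table <- data.frame(
  taxon_id    = rep(c("aab", "aac", "aad"), times = 3),
  treatment_1 = rep(c("gp_1_Early", "gp_1_Early", "gp_2_Early"), each = 3),
  treatment_2 = rep(c("gp_2_Early", "gp_3_Early", "gp_3_Early"), each = 3),
  log2_median_ratio = c(0, 0, 0.517, 1.12, 0, 0, 0, 0, -Inf),
  stringsAsFactors = FALSE
)

# Number of distinct taxa and of distinct treatment pairs in the table.
n_taxa <- length(unique(diff_table$taxon_id))
n_comparisons <- nrow(unique(diff_table[, c("treatment_1", "treatment_2")]))

# Assumed criterion: a taxon has "signal" if its ratio is non-zero and
# non-missing in at least one comparison.
has_signal <- tapply(
  diff_table$log2_median_ratio,
  diff_table$taxon_id,
  function(x) any(!is.na(x) & x != 0)
)
taxa_to_keep <- names(has_signal)[as.vector(has_signal)]

cat("taxa:", n_taxa, " comparisons:", n_comparisons, "\n")
cat("taxa with signal:", paste(taxa_to_keep, collapse = ", "), "\n")
```

The resulting IDs are only candidates. Removing taxa from the object itself would normally go through metacoder's `filter_taxa()`; check its documentation for how supertaxa and observations are handled before relying on a filter like this.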
Hey @zachary-foster Thanks a lot for the pointers. Yeah, it did run for longer than that, and eventually finished though - so all in all, a good exercise. I did trim a lot of them out, but it was one of those datasets with a lot of samples and groups to begin with.
Will certainly try the multiple plots option. It was mostly running in RAM, so that was not an issue. Given the more than 1.2 million rows to parse, I figured a long runtime was to be expected.
While I'm here though, it would also be beneficial if there was a way to adjust the legend outside of the plotting function itself, especially w.r.t. the font sizes etc. 'Cos if I'd want to adjust the legend size, I'd have to run the full plot and wait to see if it worked, right? Or is there an alternative?
Thanks again!