FlashWeave.jl

learn_network defaults

Open · nick-youngblut opened this issue 4 years ago · 1 comment

The learn_network doc shows:

help?> learn_network
search: learn_network

  learn_network(data_path::AbstractString, meta_data_path::AbstractString) -> FWResult{<:Integer}

  Works like learn_network(data::AbstractArray{<:Real, 2}), but instead of a data
  matrix takes file paths to an OTU table and optionally a meta data table as an
  input.

    •  data_path - path to a file storing an OTU count matrix (and JLD2 meta
       data)

    •  meta_data_path - optional path to a file with meta data

    •  *_key - HDF5 keys to access data sets with OTU counts, Meta variables and
       variable names in a JLD2 file. If a data item is absent the corresponding
       key should be 'nothing'. See '?load_data' for additional information.

    •  verbose - print progress information

    •  transposed - if true, rows of data are variables and columns are samples

    •  kwargs... - additional keyword arguments passed to
       learn_network(data::AbstractArray{<:Real, 2})

  ────────────────────────────────────────────────────────────────────────────────────

  learn_network(data::AbstractArray{<:Real, 2}) -> FWResult{<:Integer}

  Learn an interaction network from a data matrix (including OTUs and optionally meta
  variables).

    •  data - data matrix with information on OTU counts and (optionally) meta
       variables

    •  header - names of variable columns in data

    •  meta_mask - true/false mask indicating which variables are meta variables

  Algorithmic parameters

    •  heterogeneous - enable heterogeneous mode for multi-habitat or -protocol
       data with at least thousands of samples (FlashWeaveHE)

    •  sensitive - enable fine-grained association prediction (FlashWeave-S,
       FlashWeaveHE-S), sensitive=false results in the fast modes (FlashWeave-F,
       FlashWeaveHE-F)

    •  max_k - maximum size of conditioning sets, high values can lead to the
       removal of more spurious edges, but may also strongly increase runtime
       and reduce statistical power. max_k=0 results in no conditioning
       (univariate mode)

    •  alpha - statistical significance threshold at which individual edges are
       accepted

    •  conv - convergence threshold, e.g. if conv=0.01 assume convergence if the
       number of edges increased by only 1% after 100% more runtime (checked in
       intervals)

    •  feed_forward - enable feed-forward heuristic

    •  fast_elim - enable fast-elimination heuristic

    •  max_tests - maximum number of conditional tests that is performed on a
       variable pair before association is assumed

    •  hps - reliability criterion for statistical tests when sensitive=false

    •  FDR - perform False Discovery Rate correction (Benjamini-Hochberg method)
       on pairwise associations

    •  n_obs_min - don't compute associations between variables having fewer
       reliable samples (non-zero samples if heterogeneous=true) than this
       number. -1: automatically choose a threshold.

    •  time_limit - if feed-forward heuristic is active, determines the interval
       (seconds) at which neighborhood information is updated

  General parameters

    •  normalize - automatically choose and perform data normalization method
       (based on sensitive and heterogeneous)

    •  track_rejections - store, for each discarded edge, which variable set led
       to its exclusion (can be memory-intensive for large networks)

    •  verbose - print progress information

    •  transposed - if true, rows of data are variables and columns are samples

    •  prec - precision in bits to use for calculations (16, 32, 64 or 128)

    •  make_sparse - use a sparse data representation (should be left at true in
       almost all cases)

    •  make_onehot - create one-hot encodings for meta data variables with more
       than two categories (should be left at true in almost all cases)

    •  update_interval - if verbose=true, determines the interval (seconds) at
       which network stat updates are printed

What are the defaults for these parameters (e.g., prec)?

nick-youngblut · May 15 '21 11:05

Hi Nick! Good point, I will look into adding these to the docs. Currently one would have to look directly at the method definitions in learning.jl (e.g. prec defaults to 32).

jtackm · May 19 '21 12:05
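
Until the defaults are documented, Julia's reflection tools can at least list which keywords a function accepts. The following is a minimal sketch, not part of FlashWeave: `toy_learn` is a hypothetical stand-in so that the snippet runs without the package, `keyword_names` is a helper name introduced here for illustration, and all default values in it are made up except `prec=32`, which the reply above states. `Base.kwarg_decl` is an internal, unexported Julia function, so its behaviour may differ between Julia versions.

```julia
# Hypothetical stand-in for a keyword-heavy API such as learn_network.
# This is NOT FlashWeave code; only prec=32 is taken from the thread,
# the verbose default is an illustrative assumption.
function toy_learn(data::AbstractMatrix{<:Real};
                   prec::Integer=32, verbose::Bool=true, kwargs...)
    return (prec=prec, verbose=verbose)
end

# Collect the keyword names declared by every method of `f`.
# Relies on the internal Base.kwarg_decl, which returns names only,
# not default values.
function keyword_names(f)
    names = Symbol[]
    for m in methods(f)
        append!(names, Base.kwarg_decl(m))
    end
    return unique(names)
end

kws = keyword_names(toy_learn)
println(kws)

# Calling without keywords exposes the defaults that the function returns.
result = toy_learn(rand(4, 3))
println(result.prec)
```

With FlashWeave loaded, `keyword_names(learn_network)` should list the accepted keyword names in the same way. The default values themselves are not exposed by this approach, so they still have to be read from the source, for example with `@less learn_network(rand(10, 5))` in the REPL or by opening learning.jl as suggested above.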