pyroed Open datasets for evaluation

Open fritzo opened this issue 3 years ago • 3 comments

What are some open datasets for evaluation? These will be needed to answer #3 about hyperparameters and algorithms.

cc @andrenguyen

Mar 22 '22 17:03 fritzo

Moss et al. (2020) (section 5.2 and appendix E) evaluate their algorithm using minimum free-folding energy as the objective function when optimizing short proteins, and rely on ViennaRNA to compute that objective in their experiments. Here is an example where they call the RNAfold utility as a subprocess.

"We acknowledge that [minimizing minimum free-fold energy] may not be biologically meaningful on its own, however, as free-folding energy is of critical importance to other down-stream genetic prediction tasks, we believe it to be a reasonable proxy for wet-lab-based genetic design loops."
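For anyone who wants to try this objective locally, here is a rough sketch of wrapping RNAfold in a subprocess call. It is not the code from Moss et al.; the function names `parse_rnafold_output` and `min_free_energy` are made up for illustration, and it assumes the `RNAfold` executable from ViennaRNA is on your PATH and prints the sequence followed by a structure line ending in the energy in parentheses.

```python
import re
import subprocess

# Matches the trailing "(-1.20)" or "( -1.20)" energy at the end of the
# structure line. Assumption: this is how RNAfold formats its output.
_ENERGY = re.compile(r"\(\s*([-+]?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_rnafold_output(text):
    """Extract the minimum free energy (kcal/mol) from RNAfold's stdout."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty RNAfold output")
    # The last non-empty line holds the dot-bracket structure and the energy.
    match = _ENERGY.search(lines[-1])
    if match is None:
        raise ValueError(f"could not find an energy in: {lines[-1]!r}")
    return float(match.group(1))


def min_free_energy(sequence, executable="RNAfold"):
    """Hypothetical objective: fold one sequence and return its energy.

    Requires ViennaRNA to be installed; --noPS suppresses the PostScript
    structure drawing that RNAfold would otherwise write to disk.
    """
    result = subprocess.run(
        [executable, "--noPS"],
        input=sequence + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_rnafold_output(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be checked without ViennaRNA installed. Spawning one process per sequence is slow, so for large evaluations the ViennaRNA Python bindings may be a better fit.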

Mar 27 '22 17:03 fritzo

Angermueller et al. (2020) (section 5) provide a number of in-silico benchmarking problems, including tfbind8 and tfbind10.

Mar 30 '22 13:03 fritzo

I've worked with Tcellmatch (Fischer et al. 2020) before; it makes predictions based on short sequences (CDR3s), including variable-length sequences. I believe @andrenguyen also has some recent experience with this model.

Mar 30 '22 14:03 EWeinstein
