pyroed

Open datasets for evaluation

Open: fritzo opened this issue 3 years ago • 3 comments

What are some open datasets for evaluation? These will be needed to answer #3 about hyperparameters and algorithms.

cc @andrenguyen

fritzo, Mar 22 '22 17:03

Moss et al. (2020) (section 5.2 and appendix E) evaluate their algorithm using minimum free-folding energy as the objective function when optimizing short proteins, and they defer to ViennaRNA to compute that objective in their experiments. Here is an example where they call the RNAfold utility as a subprocess.

"We acknowledge that [minimizing minimum free-fold energy] may not be biologically meaningful on its own, however, as free-folding energy is of critical importance to other down-stream genetic prediction tasks, we believe it to be a reasonable proxy for wet-lab-based genetic design loops."
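
For anyone wanting to reproduce this kind of objective locally, the subprocess approach could look roughly like the sketch below. It is not the code from Moss et al.; it assumes ViennaRNA is installed with `RNAfold` on the PATH, and that its text output ends the structure line with the energy in parentheses. The helper names `parse_mfe` and `min_free_energy` are made up for illustration.

```python
import re
import subprocess

# Matches the trailing "(-1.20)" or "( -1.20)" energy on RNAfold's structure line.
_ENERGY_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_mfe(rnafold_output: str) -> float:
    """Extract the minimum free energy (kcal/mol) from RNAfold text output."""
    for line in rnafold_output.splitlines():
        match = _ENERGY_RE.search(line)
        if match:
            return float(match.group(1))
    raise ValueError("no energy found in RNAfold output")


def min_free_energy(sequence: str, executable: str = "RNAfold") -> float:
    """Fold one sequence by piping it to RNAfold on stdin (hypothetical helper)."""
    result = subprocess.run(
        [executable, "--noPS"],  # --noPS suppresses the PostScript structure plot
        input=sequence + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if RNAfold exits with an error
    )
    return parse_mfe(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be tested without ViennaRNA installed. A black-box optimizer would then minimize `min_free_energy` over candidate sequences.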

fritzo, Mar 27 '22 17:03

Angermueller et al. (2020) (section 5) provide a number of in-silico benchmarking problems, including tfbind8 and tfbind10.
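
As a rough sketch of how such a benchmark can be used as an evaluation oracle: assuming tfbind8 assigns a score to every DNA sequence of length 8, the whole search space (4^8 = 65,536 sequences) fits in a lookup table. The code below is only an illustration under that assumption; `all_sequences`, `make_oracle`, and the GC-fraction placeholder scores are hypothetical and are not real binding measurements or the benchmark's actual loading code.

```python
import itertools

ALPHABET = "ACGT"


def all_sequences(length: int):
    """Yield every DNA string of the given length in lexicographic order."""
    for letters in itertools.product(ALPHABET, repeat=length):
        yield "".join(letters)


def make_oracle(table: dict):
    """Wrap a {sequence: score} table as an objective function."""
    def oracle(sequence: str) -> float:
        return table[sequence]
    return oracle


# Placeholder scores (GC fraction), NOT real binding measurements.
toy_table = {s: (s.count("G") + s.count("C")) / len(s) for s in all_sequences(8)}
oracle = make_oracle(toy_table)
```

With a real table substituted for `toy_table`, an optimizer's proposals can be scored exactly, and the known global optimum makes it easy to measure regret.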

fritzo, Mar 30 '22 13:03

I've worked with Tcellmatch (Fischer et al. 2020) before; it makes predictions based on short sequences (CDR3s), including variable-length sequences. I believe @andrenguyen also has some recent experience with this model.

EWeinstein, Mar 30 '22 14:03