TeamTeri
Bioinformatics on GCP, AWS or Azure
Team Teri
How this Repo is Organized
This Repo contains my own 'study notes' as I learn genomic-scale cloud bioinformatics. It includes descriptions of common tools, platforms and summaries of my work with clients. I update this Repo frequently. It is organized via the folder structure shown below.
- 🗒️ Concepts and Terms (genomics file types, use cases, terminology and also whitepapers)
- 🔬 Lab Testing (Illumina and more)
- ⚒️ Genomic Tools (GATK, VariantSpark, HAIL and many more - this section updates OFTEN)
- 📦 Genomics Platforms (Terra.bio, Galaxy Project, IDSeq and others)
- ☁️ Public Cloud Genomics (Alibaba Cloud, AWS, Azure or GCP). The general approach is to implement a cloud-native Data Lake pattern for scalable genomic analysis. A conceptual rendering of this pattern is shown below.
- 📚 LLMs for Bioinformatics (Reading List). So many papers and tools are being published in this area. Here's what I am reading now.
More Cloud/Genomics Resources
In addition to this Repo, I have a number of other Repos with cloud bioinformatics information. Also, I've included two of my favorite link aggregator resources here for additional learning.
My GitHub Open Source Courses
- :octocat: GENERAL CLOUD - my learn-cloud Repo - https://github.com/lynnlangit/learning-cloud
- :octocat: GCP - my gcp-for-bioinformatics open source course - https://github.com/lynnlangit/gcp-for-bioinformatics
- :octocat: AWS - my aws-for-bioinformatics open source course - https://github.com/lynnlangit/aws-for-bioinformatics
- :octocat: WDL language - my learn-wdl open source course - https://github.com/openwdl/learn-wdl
Repos with Links
- :octocat: a link collection: the awesome bioinformatics Repo, with a large number of curated links for learning about bioinformatics tools and topics
- :octocat: bioinformatics benchmark papers: links to published benchmark papers for bioinformatics
Data Lake Pattern
The Data Lake (or Data Mesh [Lake of Lakes]) pattern is key for implementing bioinformatics workloads effectively on any public cloud. The diagram below gives a simple conceptual view of the pattern.
![](https://github.com/lynnlangit/gcp-for-bioinformatics/blob/master/images/data-lake.png)
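To make the idea a bit more concrete, here is a minimal Python sketch of one common reading of the Data Lake pattern: files stay in object storage in their native format, a lightweight metadata catalog describes them, and analysis jobs pick inputs by querying the catalog. This is a toy illustration, not how any particular cloud service works. The `DataLake` class, its method names, the object keys and the metadata fields are all hypothetical, and an in-memory dictionary stands in for a real bucket.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Toy in-memory stand-in for an object store plus a metadata catalog (hypothetical)."""
    objects: dict = field(default_factory=dict)  # key -> raw bytes, stored unmodified
    catalog: dict = field(default_factory=dict)  # key -> metadata describing the object

    def put(self, key, data, **metadata):
        """Store a file as-is and record its metadata in the catalog."""
        self.objects[key] = data
        self.catalog[key] = metadata

    def find(self, **criteria):
        """Return the sorted keys whose metadata matches every given criterion."""
        return sorted(
            key
            for key, meta in self.catalog.items()
            if all(meta.get(name) == value for name, value in criteria.items())
        )

    def read(self, key):
        """Fetch the raw bytes for one object."""
        return self.objects[key]


# Hypothetical sample data: keys and contents are made up for illustration.
lake = DataLake()
lake.put("raw/sample1.fastq", b"@read1\nACGT\n+\nFFFF\n", file_type="fastq", sample="sample1")
lake.put("variants/sample1.vcf", b"##fileformat=VCFv4.2\n", file_type="vcf", sample="sample1")
lake.put("variants/sample2.vcf", b"##fileformat=VCFv4.2\n", file_type="vcf", sample="sample2")

# An analysis job selects its inputs through the catalog, then reads them in place.
vcf_keys = lake.find(file_type="vcf")
for key in vcf_keys:
    print(key, len(lake.read(key)), "bytes")
```

In a real deployment the dictionary would be a cloud bucket and the catalog would likely be a managed metadata or query service. The point of the sketch is only the separation between stored files, the catalog, and the compute that reads them.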
Who is Teri?
Teri is the impetus for my movement into the world of genomic research. She was diagnosed with breast cancer in 2016. She survived, but suffered a long course of intense and painful treatment due in part to the lack of availability of personalized treatment options at the time of her diagnosis.