cBioPortal Data Collection Automation

Open inodb opened this issue 7 years ago • 7 comments

Background:

The cBioPortal is an open-access, open-source resource for interactive exploration of multidimensional cancer genomics data sets, which are collected from a multitude of sources such as published research papers, publicly available data repositories, and private data sets. Please refer to the cBioPortal home page for an overview.

Whenever data submissions come from external sources, a lot of manual curation is needed to make sure the data is imported smoothly and rendered correctly in the cBioPortal. We would like to automate parts of this curation process, which will be handled in part through our datahub, a data repository that stores all cancer study data currently available in the cBioPortal.

Currently, whenever a Pull Request is made to datahub, the data undergoes a series of validation steps run by our data validation tool. However, to ensure that the data looks and renders as expected in the cBioPortal, one must manually import the data into a live instance of the portal. Automating this step in particular will be hugely beneficial to the QC process and greatly improve the turnaround time from data submission to import and visualization in the cBioPortal.


Goal:

Streamline and improve the turnaround time and review process for cancer study data submissions by automating the import of validated data files into a live instance of the cBioPortal.

Approach:

One option for spinning up review apps is Heroku, which we use for reviewing changes to the backend of cBioPortal.

Another option might be a GitHub Action for AWS Lightsail.

Both platforms support Docker Compose, for which configuration files already exist.
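
Whichever platform is used, the automation first has to know which studies a Pull Request touches, so that only those are imported into the review instance. A minimal sketch of that step follows. It assumes study files live under a top-level `public/<study_id>/` directory, and the names `STUDY_ROOT` and `changed_studies` are hypothetical; it is an illustration under those assumptions, not part of any existing tooling.

```python
from pathlib import PurePosixPath

# Assumption: each study sits in <STUDY_ROOT>/<study_id>/... in the data repository.
STUDY_ROOT = "public"


def changed_studies(changed_paths, study_root=STUDY_ROOT):
    """Return the sorted study directory names touched by the given file paths."""
    studies = set()
    for raw in changed_paths:
        parts = PurePosixPath(raw).parts
        # A study file needs at least root/study_id/filename.
        if len(parts) >= 3 and parts[0] == study_root:
            studies.add(parts[1])
    return sorted(studies)


if __name__ == "__main__":
    # Hypothetical list of files changed in a Pull Request.
    example = [
        "public/study_a/data_clinical.txt",
        "public/study_a/meta_study.txt",
        "README.md",
    ]
    print(changed_studies(example))
```

The resulting list could then be passed to whatever import step the review instance runs, once the existing validation has passed.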


Needed skills:

  • General problem solving skills.
  • Some basic knowledge of *nix, Bash, and DevOps would be useful, but can be learned during the project.

Possible mentors: @inodb

inodb • Jan 25 '18 18:01

Hello! It's Chetan. The idea is quite interesting, and I would like to work on it. What task should I start with?

css911 • Feb 28 '19 09:02

@ao508 I noticed this was transferred from GSoC. If we are not working on it, maybe we can transfer it back?

inodb • Aug 10 '20 16:08

@inodb that's okay with me

ao508 • Aug 10 '20 19:08

Very interesting idea. I would like to have a go at it. Where is the open source code to start from?

daniocionini • Apr 05 '22 20:04

The source code is available on GitHub: https://github.com/cBioPortal

jagnathan • Apr 13 '22 20:04

Hey, I am interested in this project. Can you guide me further, @inodb?

devharsh2k4 • Feb 23 '23 14:02

Hi @inodb! I'm Muskan Kothari, currently a CSE senior at PES University, India. I'm here to contribute to this project through GSoC '23. I studied biology prior to starting undergrad in CSE and I'm highly interested in applying CSE to interdisciplinary domains. I have multiple projects applying computer science fundamentals to biology (measures of lexical diversity and Alzheimer's detection) and physics (tree-based models for the critical temperature of superconductors).

I also have experience working with big data and DevOps technologies such as Docker and Kubernetes (converting a monolithic application to microservices), and PySpark and Hadoop (sentiment analysis of Twitter data).

I am proficient in programming languages like Python, C++, and Java, and comfortable using Git.

I found the cBioPortal organization a perfect mix of my interests in interdisciplinary projects and my skills in various technologies that particularly help this project - cBioPortal Data Collection Automation. I'd love to learn and contribute to this project.

I understand that working on some issues would strengthen my application, and I will also be spending time understanding the organization. I'd like to get started with my proposal. I've joined the Slack as well.

Could we perhaps set up a discussion call? Could you tell me what technologies would be involved under DevOps?

Thanks! Muskan

muskan-k • Feb 25 '23 08:02