cbioportal Upgrade to new CIViC API

Open inodb opened this issue 2 years ago • 1 comments

We can do this work as part of the ICTR set-aside funding. Integrate properly in Genome Nexus.

TODO: need to make a separate epic detailing the set-aside work

May 10 '22 15:05 inodb

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Aug 14 '22 03:08 stale[bot]

Hey guys. Just pinging this issue. Would love to see cBioPortal pulling in the most current CIViC data, which is only available through the V2 API. Let us know if we can help!

Jun 22 '23 19:06 obigriffith

Hi, we wanted to ping this issue again to make you aware that the CIViC V1 API will officially be retired on November 1st, 2023. Please let me or @acoffman know if you need any assistance in porting cBioPortal to the V2 API.

Oct 12 '23 20:10 susannasiebert

@susannasiebert @acoffman We use two endpoints in the V1 API. Could you tell me which new queries we should use to get the corresponding data?

https://civicdb.org/api/genes/ + entrez_gene_ids to get id, name, variants
https://civicdb.org/api/variants/ + id to get description and evidence_items

Thank you very much!

Nov 07 '23 20:11 leexgh

Hi!

So, the new API is GraphQL-based. If you are unfamiliar, you can kind of think of it as similar to SQL. Rather than a series of URL-based endpoints (/genes, /variants, etc.), there is a single endpoint: https://civicdb.org/api/graphql. You construct a query asking for the data you want and POST it to that endpoint.

As such, there are no direct 1:1 mappings from the old endpoints to the new ones; however, you can achieve similar results. We have a sandbox where you can try queries and browse all the available fields: https://civicdb.org/api/graphiql
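As a rough illustration of the single-endpoint model described above, here is a Python sketch that builds a POST request carrying a GraphQL query as JSON. The endpoint URL is the one given in this thread. The helper names (build_graphql_request, run_query) are made up for illustration, and the JSON body shape (a "query" key plus optional "variables") is the common GraphQL-over-HTTP convention, assumed here rather than confirmed for CIViC.

```python
import json
import urllib.request

CIVIC_GRAPHQL_URL = "https://civicdb.org/api/graphql"  # endpoint named in this thread


def build_graphql_request(query, variables=None, url=CIVIC_GRAPHQL_URL):
    """Build (but do not send) a POST request for a GraphQL query."""
    payload = {"query": query}
    if variables:
        payload["variables"] = variables
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_query(query, variables=None):
    """Send the query and return the decoded JSON response (needs network access)."""
    request = build_graphql_request(query, variables)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the first half easy to inspect or unit test without touching the CIViC servers.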

As part of CIViC V2 we have updated our data model to support attaching Evidence Items to logical combinations of variants (an example could be BRAF Amplification AND (BRAF V600E OR BRAF V600K)). We call these Molecular Profiles.

All Evidence is attached to a Molecular Profile now rather than directly to a Variant. However, every Variant has a Molecular Profile consisting of only itself which you can treat in essentially the same way you were treating Variants before.

As an example of fetching Evidence Items for a given Variant ID, you could do something like this:

{
  variant(id: 6) {
    id
    name
    alleleRegistryId
    singleVariantMolecularProfile {
      description
      evidenceItems {
        totalCount
        pageInfo {
          endCursor
          hasNextPage
        }
        nodes {
          id
          link
          evidenceType
          evidenceRating
          evidenceDirection
          evidenceLevel
          description
          variantOrigin
          status
          therapies {
            id
            ncitId
            name
          }
          source {
            id
            link
            name
          }
          disease {
            id
            doid
            displayName
          }
        }
      }
    }
  }
}

If you paste that into the sandbox linked above you can see how the response corresponds to the fields you requested, and by browsing the docs linked in the upper right you can see all the fields available.
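To make the correspondence between query and response concrete, here is a small Python sketch that walks a response shaped like the variant query above. The sample dictionary is a made-up placeholder (not real CIViC data) that mirrors only a subset of the requested fields, the "data" wrapper is the standard GraphQL response envelope, and evidence_items_for_variant is a hypothetical helper name.

```python
def evidence_items_for_variant(response):
    """Return (evidence nodes, has_next_page) from a response to the variant query."""
    variant = response["data"]["variant"]
    if variant is None:  # defensive: handle the field coming back null
        return [], False
    connection = variant["singleVariantMolecularProfile"]["evidenceItems"]
    return connection["nodes"], connection["pageInfo"]["hasNextPage"]


# Placeholder response with invented values, for illustration only.
sample = {
    "data": {
        "variant": {
            "id": 6,
            "name": "EXAMPLE",
            "singleVariantMolecularProfile": {
                "description": "placeholder",
                "evidenceItems": {
                    "totalCount": 1,
                    "pageInfo": {"endCursor": "abc", "hasNextPage": False},
                    "nodes": [{"id": 1, "evidenceType": "PREDICTIVE"}],
                },
            },
        }
    }
}

nodes, more = evidence_items_for_variant(sample)
```

Returning the hasNextPage flag alongside the nodes makes it harder to forget that a single response may hold only the first page of evidence.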

Currently, there is a way to page through all the genes or retrieve a gene based on Entrez ID, but I don't believe we have a filter that will take multiple Entrez IDs at once. If that's something you need for your integration we can add one quickly, just let us know!

We have examples of using the API in both Python and R here: https://github.com/griffithlab/civic-v2/tree/main/examples. The Python example includes pagination as well. Also happy to help troubleshoot or answer any additional questions.

Thanks! Adam

Nov 08 '23 20:11 acoffman

@acoffman Thank you so much!

Nov 10 '23 15:11 leexgh

No problem! If you find that you need any additional fields or any other ways to filter the queries, just let us know!

Nov 10 '23 15:11 acoffman

@acoffman Is there a way to send a list of HUGO symbols in one query and get gene and variant information back in the response? This is the query I use, but the problem is that it only sends one gene at a time, so we send too many requests to the server:

query gene(
  $entrezSymbol: String,
) {
  gene(
    entrezSymbol: $entrezSymbol,
  ) {
    id
    entrezId
    description
    link
    name
    variants {
      nodes {
        name
        id
        link
        singleVariantMolecularProfile{
          description
          evidenceItems {
            nodes {
              id
              name
              description
              evidenceType
              evidenceDirection
              evidenceLevel
              significance
              disease {
                displayName
                name
                id
                link
              }
              therapies{
                name
                id
                ncitId
                therapyAliases
              }
            }
          }          
        }
      }
    }
  }
}

Nov 10 '23 22:11 leexgh

Hi @leexgh

We have just pushed a release that includes top level entrezSymbols and entrezIds filters for the genes query.

You can now do something like:

genes(entrezSymbols: ["BRAF", "EGFR"])

to retrieve multiple genes at once. You can request all the same fields as before on the returned Genes.

Keep in mind that if you request more Genes than the default page size (I believe it's 25 in a single request), they could spill over onto multiple pages. You can check that in the pageInfo block:

pageInfo {
   hasNextPage
   endCursor
}

If this is the case, you can set up a pretty straightforward while loop: while hasNextPage is true, send the same request as before, passing the value of endCursor to the after argument.

genes(entrezSymbols: ["BRAF", "EGFR"], after: "endCursorValue")
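The loop described above could be sketched in Python roughly as follows. This is a minimal sketch, not CIViC's own client code: fetch_all_genes and the send callback are hypothetical names, the selected fields (id, name) are only an example, and the variable declarations follow the genes query posted later in this thread. The caller supplies send(query, variables), which is expected to POST to the GraphQL endpoint and return the decoded JSON.

```python
GENES_QUERY = """
query genes($after: String, $entrezSymbols: [String!]) {
  genes(after: $after, entrezSymbols: $entrezSymbols) {
    pageInfo { endCursor hasNextPage }
    nodes { id name }
  }
}
"""


def fetch_all_genes(symbols, send):
    """Collect gene nodes across pages.

    `send(query, variables)` is a caller-supplied function (hypothetical) that
    POSTs to the GraphQL endpoint and returns the decoded JSON response.
    """
    genes, after = [], None
    while True:
        response = send(GENES_QUERY, {"entrezSymbols": symbols, "after": after})
        page = response["data"]["genes"]
        genes.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return genes
        # Resume from where the previous page ended.
        after = page["pageInfo"]["endCursor"]
```

Passing the transport in as a function keeps the paging logic independent of any particular HTTP library and lets it be exercised with canned responses.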

Let us know if this helps or if there's anything additional we can do to make your integration easier!

Thanks, Adam

Nov 13 '23 21:11 acoffman

@acoffman Hi Adam, thank you for the updates! It's very helpful! I have a follow-up question about graphql query pagination. Here is my query structure:

query genes($after: String, $entrezSymbols: [String!]) {
  genes(after: $after, entrezSymbols: $entrezSymbols) {
      pageInfo {
          endCursor
          hasNextPage
          startCursor
          hasPreviousPage
      }
      nodes {
          # some fields
          variants {
            pageInfo {
              endCursor
              hasNextPage
              startCursor
              hasPreviousPage
            }
              nodes {
                  # some fields
                  singleVariantMolecularProfile {
                      # some fields
                      evidenceItems {
                        pageInfo {
                          endCursor
                          hasNextPage
                          startCursor
                          hasPreviousPage
                        }
                          nodes {
                              # some fields
                              disease {
                                  # some fields
                              }
                              therapies {
                                  # some fields
                              }
                          }
                      }
                  }
              }
          }
      }
  }
}

As you can see, we pass in a list of gene symbols. The information we need is:

annotation for each gene
for each single gene, we need all variants of this gene
for each single variant, we need all evidence of this variant

So there will be three levels of pagination in the response (for gene, variant, and evidence, respectively). As far as I can see, the page size is set to 50 (or 25? I tested on the civicdb playground and got 50 back) and cannot be overridden. This means we may need to send up to n^3 follow-up queries to get all the data we need (in the theoretical worst case, 1 gene needs n variants + n*n evidence).

Do you have any suggestions on handling the fetching of nested queries? We'd like to reduce the number of requests sent to civicdb to make sure both of us can have the best performance.

Thank you very much!

Xiang

Nov 20 '23 00:11 leexgh

@leexgh Do you have a specific example of all the data you would like to display and how you're currently displaying it? On the CIViC website we try to avoid multiple nested queries like this and instead pull back the evidence-level data in a separate request that only gets executed when the user wants it, e.g. by utilizing popovers. On pages like the browse tables, where this is unavoidable, we actually use a materialized view to avoid having to execute computationally expensive, complex queries with multiple nested levels of joining on the fly. You might want to look into the browseGenes query instead of the genes query. This one takes a single entrezSymbol but aggregates the variants as well as the disease and therapy terms for the underlying evidence. So the number of requests here would depend on the number of genes. This query does include complex molecular profiles, so that might not be desired on your end.

Nov 20 '23 15:11 susannasiebert

@susannasiebert We display civicdb data in the cBioPortal mutations table and copy number alteration table. For example, when you hover over the CIViC icon, a tooltip pops up. We need to show the gene (PIK3CA) and its description (the text in the first paragraph), the variant (E545K) of this gene and its description (the text in the purple box), and a count of the variant's evidence by type (predictive: 30, prognostic: 1).

We usually have hundreds of genes in the copy number alteration table, so switching to queries that only accept a single gene would need hundreds of queries, which is not ideal for performance.

Do you think it's possible to have a customized page size? I tried the "first" parameter in the genes query, but it only returns up to 50 records based on my test. It would be helpful if it could accept a larger number.

Any suggestion is appreciated!

Nov 20 '23 22:11 leexgh

@leexgh We can definitely increase the allowable page size up from 50.

Unfortunately, we can't let it be entirely unbounded; because GraphQL lets you define arbitrary queries, if we had no limits on page size, people could write queries that potentially pulled back the entire database at once. While that would be nice, it wouldn't be performant for our servers or users. We will do a little testing on our end and figure out how high we can increase the limit and still maintain acceptable performance. Hopefully we can make it less likely that you'll need to break it up into multiple queries, but you still may need to be aware of that possibility. The hasNextPage boolean will let you know.
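Since any connection in a nested response can spill over, one way to notice it is to scan the decoded response for pageInfo blocks whose hasNextPage is true. The Python sketch below does that generically. It assumes only the pageInfo/hasNextPage/endCursor shape shown in this thread; the function name unfinished_connections is made up for illustration.

```python
def unfinished_connections(node, path=()):
    """Yield (path, endCursor) for every connection whose pageInfo reports more pages."""
    if isinstance(node, dict):
        page_info = node.get("pageInfo")
        if isinstance(page_info, dict) and page_info.get("hasNextPage"):
            yield path, page_info.get("endCursor")
        for key, value in node.items():
            if key != "pageInfo":
                yield from unfinished_connections(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from unfinished_connections(item, path + (index,))
```

Each yielded path (for example data, genes, nodes, 0, variants) identifies which gene or variant needs a follow-up request, so only the truncated connections have to be re-queried rather than the whole batch.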

If you need to display the counts of various evidence types in the popover, we can make that directly queryable in the API for you so that you don't have to pull the evidence back and aggregate it yourself.

We probably won't have it done before the Thanksgiving break, but we should be able to get these changes out next week and I will follow up here when we do!

Nov 22 '23 20:11 acoffman

@acoffman Thank you so much!

Nov 25 '23 00:11 leexgh

Hi @acoffman! Do you have any updates about the API? I appreciate any information you can provide.

Dec 07 '23 03:12 leexgh

I'm terribly sorry but our release of the new "evidence counts by type" feature has been delayed on our end. It won't be out until next week. In the meantime you can test out this feature on our staging website (staging.civicdb.org). With this update you should be able to do the following:

query genes($after: String, $entrezSymbols: [String!]) {
  genes(after: $after, entrezSymbols: $entrezSymbols) {
      pageInfo {
          endCursor
          hasNextPage
          startCursor
          hasPreviousPage
      }
      nodes {
          # some fields
          variants {
            pageInfo {
              endCursor
              hasNextPage
              startCursor
              hasPreviousPage
            }
              nodes {
                  # some fields
                  singleVariantMolecularProfile {
                      # some fields
                      evidenceCountsByType {
                          diagnosticCount
                          predictiveCount
                          prognosticCount
                          predisposingCount
                          oncogenicCount
                          functionalCount
                      }
                  }
              }
          }
      }
  }
}
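On the client side, the evidenceCountsByType block could be turned into the tooltip text mentioned earlier in this thread (for example "predictive: 30, prognostic: 1") with something like the Python sketch below. The field names come from the query above; the label wording, the assumption that each count is an integer, and the helper name summarize_evidence_counts are illustrative guesses.

```python
# Maps the count fields from the query above to display labels (labels are illustrative).
COUNT_FIELDS = {
    "diagnosticCount": "diagnostic",
    "predictiveCount": "predictive",
    "prognosticCount": "prognostic",
    "predisposingCount": "predisposing",
    "oncogenicCount": "oncogenic",
    "functionalCount": "functional",
}


def summarize_evidence_counts(counts):
    """Format non-zero counts as 'label: n' pairs, skipping zero or missing fields."""
    parts = [
        f"{label}: {counts[field]}"
        for field, label in COUNT_FIELDS.items()
        if counts.get(field)
    ]
    return ", ".join(parts)
```

With the counts returned directly, the evidenceItems connection no longer needs to be fetched and aggregated, which removes one of the three pagination levels.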

Dec 07 '23 15:12 susannasiebert

@susannasiebert No problem at all. I appreciate the update. Looking forward to seeing the new feature next week! This will be very helpful for us, thank you very much!

Dec 07 '23 22:12 leexgh

@susannasiebert Happy new year! Hope you had a great holiday time! Just want to check if there is a plan for the new release?

Jan 04 '24 05:01 leexgh

Hi @leexgh,

Thanks for following up with us! A new release is out as of Friday that contains the new query documented above. We will push out an additional release this week that also increases the maximum allowable page size. I'll follow up here when that is deployed as well!

Jan 08 '24 15:01 acoffman

@acoffman Thank you so much! Looking forward to the new release!

Jan 08 '24 17:01 leexgh

Hi @leexgh

This release is out! We have doubled the maximum page size to 100 entries, and the fields that Susanna demonstrated above are available for querying.
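With a maximum page size of 100, a table containing hundreds of genes could be covered by splitting the symbols into batches of at most 100, one request per batch. The Python sketch below shows one way to do that. It assumes the query declares a $first variable and passes it to genes(first: ...), based on the "first" parameter mentioned earlier in this thread; batch_symbols and batch_variables are hypothetical helper names.

```python
MAX_PAGE_SIZE = 100  # maximum page size stated in this thread


def batch_symbols(symbols, batch_size=MAX_PAGE_SIZE):
    """Split de-duplicated gene symbols into lists of at most batch_size."""
    unique = list(dict.fromkeys(symbols))  # drop repeats, keep first-seen order
    return [unique[i:i + batch_size] for i in range(0, len(unique), batch_size)]


def batch_variables(symbols, batch_size=MAX_PAGE_SIZE):
    """Build one GraphQL variables dict per batch, asking for a full page each time."""
    return [
        {"entrezSymbols": batch, "first": batch_size}
        for batch in batch_symbols(symbols, batch_size)
    ]
```

Batching only bounds the top-level genes connection; nested connections such as variants can still report hasNextPage and need their own follow-up requests.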

Thanks! Adam

Jan 10 '24 19:01 acoffman

Hi @acoffman, thank you very much!

Jan 11 '24 20:01 leexgh

@acoffman I found an issue with the genes query: https://github.com/griffithlab/civic-v2/issues/980. Please let me know if you want me to add more explanation.

Jan 15 '24 21:01 leexgh

Thank you so much @leexgh for the detailed bug report; that made the issue easy to track down.

I have a hotfix going out this afternoon which will resolve it!

Jan 16 '24 21:01 acoffman

@acoffman Thanks for the quick fix!

Jan 17 '24 20:01 leexgh
