CellPhoneDB v5 (update and is_ppi flag)

Open dbdimitrov opened this issue 8 months ago • 12 comments

Hey Denes,

Recently, CellPhoneDB got bumped to v5, and the data is stored here: https://github.com/ventolab/cellphonedb-data/tree/master

The format seems to have changed from the one used here: https://github.com/saezlab/pypath/blob/bf81f34120b82157fa3ebc15d39b0489b97fbe5e/pypath/resources/urls.py#L1103

Let me know if I can help with this. Daniel

dbdimitrov avatar Nov 28 '23 07:11 dbdimitrov

@deeenes, please also use the is_ppi flag; I found a lot of erroneous interactions between enzymes and receptors (with no metabolite).

dbdimitrov avatar Nov 30 '23 13:11 dbdimitrov

Maybe how I process it here would help:

https://github.com/saezlab/liana-py/issues/60

dbdimitrov avatar Dec 01 '23 07:12 dbdimitrov

Hi Daniel,

As far as I could find, pypath is already using the CellPhoneDB git repository as the source for the data (see here and then here), so I think it is already using the v5 version of the data.

While checking this, I found that the retrieval of interactions works fine:

> from pypath.inputs import cellphonedb
> list(cellphonedb.cellphonedb_interactions())[-1]
CellphonedbInteraction(id_a='P16070', id_b='O43914', sources='CellPhoneDB', references='', interaction_type='unknown-unknown', type_a='unknown', type_b='unknown')

When you try to retrieve the ligand-receptor interactions, it returns a tuple of empty sets:

> cellphonedb.cellphonedb_ligands_receptors()
(set(), set())

This seems to be an issue in how the complex annotations were being imported: as a result, the ligand/receptor attributes were all labeled as False. I think I fixed it in #279.

Using the is_ppi flag seems a bit more complex to implement (and I wouldn't want to break anything), so maybe we can discuss it in person and I could take a look into it, or we can wait for @deeenes to come back :sweat_smile:

Since this should resolve your initial question, I'll close the issue and we can discuss the is_ppi thing later :)

Best

Nic-Nic avatar Feb 15 '24 12:02 Nic-Nic

@Nic-Nic thanks Nico. Though I would say the is_ppi flag is crucial, since a lot of enzyme-enzyme interactions are now imported as ligand-receptor interactions 😅

dbdimitrov avatar Feb 15 '24 13:02 dbdimitrov

I renamed the issue and reopened since the two comments are tied. The flag was introduced along with the update of the database. 🙂

dbdimitrov avatar Feb 15 '24 13:02 dbdimitrov

PS. Also, there is no need to implement the flag; it's simply about setting it to False when the resource is obtained. We don't want to include those interactions, and I can think of only limited use for them even if we did.

dbdimitrov avatar Feb 15 '24 13:02 dbdimitrov

Added the flag to the import method of the interactions database from CellPhoneDB (see #281). Whether to filter out the False ones or not is more for @deeenes to decide :sweat_smile: Since the flag is now there (once the PR is merged), you can easily apply the filter in your own code if you deem it necessary :)
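
A minimal sketch of what such a filter in user code could look like. It assumes each interaction record carries a boolean attribute called `is_ppi`; that name, the reduced `Interaction` record and the `keep_ppi` helper are hypothetical and only illustrate the idea, and the actual field added in #281 may differ. The records are made-up placeholders, not CellPhoneDB content.

```python
from collections import namedtuple

# Hypothetical, reduced record layout: a few of the fields shown earlier in
# this thread plus an ASSUMED boolean `is_ppi` field.
Interaction = namedtuple(
    'Interaction',
    ['id_a', 'id_b', 'sources', 'is_ppi'],
)


def keep_ppi(records):
    """Return only the records flagged as protein-protein interactions."""
    return [rec for rec in records if rec.is_ppi]


# Made-up placeholder records, not real CellPhoneDB entries.
records = [
    Interaction('LIGAND_1', 'RECEPTOR_1', 'CellPhoneDB', True),
    Interaction('ENZYME_1', 'RECEPTOR_2', 'CellPhoneDB', False),
]

ppi_only = keep_ppi(records)
print(ppi_only)
```

With the real data, `records` would be whatever the interactions import yields once the flag is available.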

Nic-Nic avatar Feb 15 '24 15:02 Nic-Nic

Hey Nico, thanks a lot.

I think it should definitely be False by default, or at least the clients should have it set to False if possible, though that might be more work.

In short, they assume that the last enzyme producing a metabolite in one cell type and a receptor/enzyme in another cell type translate to a metabolite-receptor interaction. I think that's too specific to be pulled by default as ligand-receptor interactions by the clients :)

dbdimitrov avatar Feb 15 '24 16:02 dbdimitrov

Hey @deeenes @Nic-Nic,

It seems to me that the solution we discussed yesterday for liana, i.e. accessing the databases via the client, will not work if we don't filter the non-PPIs here.

These non-PPIs are incorporated into MetalinksDB either way, so for our use cases we don't need them.

So, I'm re-opening the issue. Let me know if you want me to add the line that filters the dataframe.

Daniel

dbdimitrov avatar Feb 20 '24 11:02 dbdimitrov

Hey @dbdimitrov, you're right, having the attribute by itself doesn't result in the removal of those interactions. We need two little things:

  1. This is one of the few tasks that belong to the scope of integration (between OmniPath & LIANA), so there should be one line either in LIANA or in omnipath Python that makes sure the is_ppi=False interactions are removed;
  2. In the OmniPath network dataset definitions, the non-PPI (is_ppi=False) interactions should go into a separate dataset, definitely not into the ligand-receptor one (this makes the previous point redundant, but better to be safe, it doesn't cost anything)

We'll soon take care of these
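
The two points can be sketched with pandas as follows, following the intent of the thread that the non-PPI rows are the ones to keep out of the ligand-receptor data. This is only an illustration under assumptions: the column names `source`, `target` and `is_ppi` and the placeholder rows are hypothetical, and this is not the actual LIANA, omnipath or OmniPath dataset code.

```python
import pandas as pd

# Made-up placeholder interactions; the column names are ASSUMPTIONS and may
# not match what the clients actually return.
interactions = pd.DataFrame({
    'source': ['LIGAND_1', 'ENZYME_1', 'LIGAND_2'],
    'target': ['RECEPTOR_1', 'RECEPTOR_2', 'RECEPTOR_3'],
    'is_ppi': [True, False, True],
})

# Point 1: a one-line filter on the client side that keeps only the
# protein-protein interactions.
ligand_receptor = interactions[interactions['is_ppi']]

# Point 2: the remaining rows form a separate dataset instead of being mixed
# into the ligand-receptor one.
non_ppi = interactions[~interactions['is_ppi']]

print(ligand_receptor)
print(non_ppi)
```

Doing both is redundant, as noted above, but the client-side filter stays harmless if the datasets are already separated upstream.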

deeenes avatar Feb 20 '24 14:02 deeenes

Ping @deeenes, it will become time sensitive very soon :smile:

dbdimitrov avatar Mar 05 '24 17:03 dbdimitrov

@deeenes :eyes:

dbdimitrov avatar Mar 21 '24 08:03 dbdimitrov