kg-covid-19

Panacea Lab COVID19 Twitter Chatter

Open callahantiff opened this issue 4 years ago • 6 comments

I'd like to work on mapping social media chatter from Twitter to the OBOs, which we could then parse into edge lists. I think it would be interesting to capture things like symptoms and map them to exposures and phenotypes.

I'd like to prioritize the data mined by Juan Banda's Lab: https://github.com/thepanacealab/covid19_twitter

This doesn't have to be a priority, but I do think it could be really interesting!

callahantiff, Mar 23 '20 21:03

@callahantiff This just popped on my radar. Happy to help with this. Let me know how you are planning on processing this.

deepakunni3, Mar 23 '20 21:03

> @callahantiff This just popped on my radar. Happy to help with this. Let me know how you are planning on processing this.

I would love to work together on this! I'll email you to see if there is a time we could chat. I think it would be good to pow-wow about existing tools for normalizing to OBOs and maybe outline a plan for what to focus on first!

callahantiff, Mar 23 '20 21:03

is this similar to what we'd do for CORD-19, i.e. information entity (tweet) mentions entity (OBO class or gene or ...)?

There are specific properties we might like to include on the tweet, e.g., time. Not sure how these properties would be used in ML, but they would certainly be useful for display/querying.
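A minimal sketch of that shape, not code from this thread: each tweet becomes a node that carries its timestamp as a property, and each recognized term becomes the object of a "mentions" edge. The `tweet:` ID prefix, the category labels, the `created_at` column name, and the bare `mentions` predicate are all placeholders chosen for illustration, and the example rows are made up.

```python
import csv
import io


def build_tables(mentions):
    """Turn (tweet_id, created_at, term_id) triples into node and edge rows."""
    nodes, edges = {}, []
    for tweet_id, created_at, term_id in mentions:
        tweet_node = f"tweet:{tweet_id}"  # hypothetical ID prefix
        # The timestamp lives on the tweet node, not on the edge.
        nodes[tweet_node] = {
            "id": tweet_node,
            "category": "InformationContentEntity",  # placeholder label
            "created_at": created_at,
        }
        nodes.setdefault(
            term_id,
            {"id": term_id, "category": "OntologyClass", "created_at": ""},
        )
        edges.append(
            {"subject": tweet_node, "predicate": "mentions", "object": term_id}
        )
    return list(nodes.values()), edges


def to_tsv(rows, columns):
    """Serialize a list of dict rows as tab-separated text with a header."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up example rows: two tweets, two term IDs.
example = [
    ("1", "2020-03-23T21:03:00Z", "HP:0001945"),
    ("1", "2020-03-23T21:03:00Z", "HP:0012735"),
    ("2", "2020-04-07T17:04:00Z", "HP:0001945"),
]
nodes, edges = build_tables(example)
print(to_tsv(nodes, ["id", "category", "created_at"]))
print(to_tsv(edges, ["subject", "predicate", "object"]))
```

Keeping the time on the node means display and query layers can filter by it, while anything that only consumes the edge list can ignore it.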

cmungall, Apr 07 '20 17:04

> is this similar to what we'd do for CORD-19, i.e. information entity (tweet) mentions entity (OBO class or gene or ...)?
>
> There are specific properties we might like to include on the tweet, e.g., time. Not sure how these properties would be used in ML but certainly useful for display/querying

That's what I was initially thinking, perhaps with some emphasis on things like symptoms. It would be great if we can draw some correlation to reported outcomes as well (when/if they exist).

callahantiff, Apr 07 '20 17:04

There are multiple ways we can parse this:

  • Information Content Entity -> OBO terms
  • Tweet -> Phenotypes -> Phenopackets
  • Tweets over time -> time series (might be brittle)

All of these would rely on some form of NER first.
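As a rough illustration only (not code from this thread), the NER step and the time-series option could be prototyped with a plain dictionary lookup before reaching for a real NER tool. The three-entry lexicon, the function names, and the example tweets are assumptions made for this sketch; a real pipeline would load labels and synonyms from the ontology itself.

```python
import re
from collections import Counter
from datetime import date

# Tiny illustrative lexicon: surface form -> HPO term ID.
LEXICON = {
    "fever": "HP:0001945",
    "cough": "HP:0012735",
    "shortness of breath": "HP:0002094",
}

# Longest phrases first so multi-word entries win over shorter ones.
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(LEXICON, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_terms(text):
    """Return the sorted, de-duplicated term IDs mentioned in one tweet."""
    return sorted({LEXICON[m.group(1).lower()] for m in PATTERN.finditer(text)})


def weekly_counts(tweets):
    """Count tweets per (ISO year, ISO week, term ID)."""
    counts = Counter()
    for day, text in tweets:
        year, week, _ = day.isocalendar()
        for term_id in find_terms(text):
            counts[(year, week, term_id)] += 1
    return counts


# Made-up tweets for illustration.
tweets = [
    (date(2020, 3, 23), "Day 3 of fever and a dry cough"),
    (date(2020, 3, 24), "FEVER again tonight"),
    (date(2020, 4, 7), "Still have this cough"),
]
for key, n in sorted(weekly_counts(tweets).items()):
    print(key, n)
```

Exact string matching misses plurals, misspellings, and negation ("no fever"), which is one reason the time-series route might be brittle without a proper NER and normalization step.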

deepakunni3, Apr 07 '20 18:04

Interesting discussion. I can harp on time modeling here as well :) I'm just getting into that topic, and the discussion is going on in another thread, but at a minimum, storing the tweet's source timestamp could help, less for graph learning and more for search and data science/modeling.

covidscholar.com is also interested in tweets, but they have their hands full, and I haven't shared the Twitter corpus that Tiffany shared. In theory we can use their NLP effort to enrich our graph, but let's take a stab first to see how this plays out.

best, marcin


realmarcin, Apr 07 '20 18:04