Real-Time data scraping for countries

Open ivan-zidov opened this issue 4 years ago • 1 comments

Hi,

I have been working on a chatbot for the Croatian language. Here is a little help with real-time scraping.

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = "https://www.worldometers.info/coronavirus/"
headers = {"Accept": "text/html"}
response = requests.get(url, headers=headers)
# print(response)
content = response.content
soup = BeautifulSoup(content, "lxml")

# Collect the cell text of every row in the main country table.
rows = soup.find(id="main_table_countries_today").find_all("tr")
elements = [[y.text for y in x.find_all("td")] for x in rows]
# Keep only complete data rows; a plain list avoids NumPy errors on ragged rows.
elements = [x for x in elements if len(x) == 9]

columns = ["Country,Other", "Total Cases", "New Cases", "Total Deaths", "New Deaths",
           "Total Recovered", "Active Cases", "Serious, Critical", "Tot Cases/1M pop"]
# Passing the column names here also works when no row matched.
wordmeters = pd.DataFrame(elements, columns=columns)
wordmeters

ivan-zidov, Mar 23 '20 11:03

Sorry for the late reply. Integrating this data for questions like "How many cases are in X?" is actually on our roadmap, but it would require quite a lot of implementation work:

  1. We need to identify whether a question is asking for this type of structured information.
  2. We need to determine which statistic is asked for: new cases, total cases, total deaths, etc.
  3. Finally, we need to match the country name in your DataFrame with the country that was asked for.
  4. Optionally, handle spelling mistakes in the country name or in the requested statistic.
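
As a rough illustration of the steps above, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the function names (`detect_metric`, `match_country`, `parse_question`) are hypothetical, the keyword list is made up, and fuzzy matching via `difflib` is just one simple way to tolerate spelling mistakes. A real integration would more likely use a trained question classifier.

```python
import difflib
import re

# Hypothetical keyword map from question wording to the DataFrame column
# names used in the scraping snippet. More specific phrases come first.
METRIC_KEYWORDS = {
    "new deaths": "New Deaths",
    "new cases": "New Cases",
    "deaths": "Total Deaths",
    "died": "Total Deaths",
    "recovered": "Total Recovered",
    "active": "Active Cases",
    "cases": "Total Cases",
}


def detect_metric(question):
    """Steps 1 and 2: return the column asked for, or None if no keyword matches."""
    text = question.lower()
    for keyword, column in METRIC_KEYWORDS.items():
        if keyword in text:
            return column
    return None


def match_country(question, countries, cutoff=0.8):
    """Steps 3 and 4: find a country in the question, tolerating small typos."""
    lookup = {c.strip().lower(): c for c in countries}
    words = re.findall(r"[a-z]+", question.lower())
    # Try word groups from longest to shortest so multi-word names can match.
    for size in (3, 2, 1):
        for i in range(len(words) - size + 1):
            candidate = " ".join(words[i:i + size])
            close = difflib.get_close_matches(candidate, list(lookup), n=1, cutoff=cutoff)
            if close:
                return lookup[close[0]]
    return None


def parse_question(question, countries):
    """Return (column, country) for a statistics question, else None."""
    column = detect_metric(question)
    if column is None:
        return None
    return column, match_country(question, countries)
```

With the scraped table, the country list could be passed as `wordmeters["Country,Other"]`, and the returned column and country used to select the matching cell. The similarity cutoff of 0.8 is an arbitrary starting point and would need tuning against real questions.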

It would help to be able to query an API with this info. Do you have any updates on your integration, or would you like to implement the proposed steps in this repository?

Timoeller, Apr 03 '20 09:04