predictcovid

Any help needed?

Open williamluke4 opened this issue 4 years ago • 17 comments

williamluke4 avatar Mar 18 '20 21:03 williamluke4

Absolutely. Are there particular areas you’re interested in contributing to?

The #1 need right now is to switch the data source from scraping Worldometers to the Johns Hopkins data repository, which they're now publishing on GitHub and updating daily.

That will allow us to support any country publishing numbers, not just the few that Worldometers supports on their website (which are the ones currently on the site).

Would you be interested in taking that on?

zachlatta avatar Mar 19 '20 00:03 zachlatta

Do you have a link to the GitHub repo?

williamluke4 avatar Mar 20 '20 15:03 williamluke4

@williamluke4: It's https://github.com/CSSEGISandData/COVID-19. I'm thinking we should be scraping from https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports.

zachlatta avatar Mar 20 '20 18:03 zachlatta

@zachlatta Ok perfect, I'll have a look and let you know my thoughts :)

williamluke4 avatar Mar 20 '20 19:03 williamluke4

Thank you. Would greatly appreciate if you could step in here. Sorely needed.

zachlatta avatar Mar 20 '20 19:03 zachlatta

Thoughts

  1. Query csse_covid_19_data/csse_covid_19_daily_reports to get all the files
interface File {
  name: string;
  path: string;
  sha: string;
  size: number;
  url: string;
  html_url: string;
  git_url: string;
  download_url: string;
  type: string;
  _links: {
    self: string;
    git: string;
    html: string;
  }
}
async function fetchFiles(): Promise<File[]> {
  const response = await fetch("https://api.github.com/repos/CSSEGISandData/COVID-19/contents/csse_covid_19_data/csse_covid_19_daily_reports");
  const files: File[] = await response.json();
  // Keep only the daily report CSVs
  const csvFiles = files.filter(file => file.name.endsWith(".csv"));
  console.log(csvFiles);
  return csvFiles;
}
  2. Once we have the files, get the dates from the file names, e.g. name: '03-13-2020.csv'
  3. Parse each CSV
import fetch from "isomorphic-unfetch";
import parse from "date-fns/parse";
import parseISO from 'date-fns/parseISO';
import { isValid } from "date-fns";
interface Entry {
  province: string;
  country: string;
  lastUpdate: Date | null;
  confirmed: number;
  deaths: number;
  recovered: number;
  long: string | null;
  lat: string | null;
}
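// Column order of the daily report CSV
enum CSV {
  PROVINCE,
  COUNTRY,
  LAST_UPDATE,
  CONFIRMED,
  DEATHS,
  RECOVERED,
  LAT,
  LONG
}

// Assumed helper: parseDate is called below but was not part of the posted snippet.
// Tries ISO 8601 first, then falls back to "M/d/yyyy H:mm" (the fallback format is a guess).
function parseDate(value: string): Date | null {
  const iso = parseISO(value);
  if (isValid(iso)) return iso;
  const fallback = parse(value, "M/d/yyyy H:mm", new Date());
  return isValid(fallback) ? fallback : null;
}

function parseData(data: string) {
  const lines = data.split("\n");
  // Removes the CSV headers
  lines.shift();
  const entries: Entry[] = [];
  lines.forEach((line: string) => {
    // Naive split: does not handle quoted fields that contain commas
    const items = line.trim().split(",");
    if (items.length >= 6) {
      entries.push({
        province: items[CSV.PROVINCE],
        country: items[CSV.COUNTRY],
        lastUpdate: parseDate(items[CSV.LAST_UPDATE]),
        confirmed: items[CSV.CONFIRMED] ? parseInt(items[CSV.CONFIRMED], 10) : 0,
        deaths: items[CSV.DEATHS] ? parseInt(items[CSV.DEATHS], 10) : 0,
        recovered: items[CSV.RECOVERED] ? parseInt(items[CSV.RECOVERED], 10) : 0,
        lat: items[CSV.LAT] ? items[CSV.LAT] : null,
        long: items[CSV.LONG] ? items[CSV.LONG] : null,
      });
    }
  });
  return entries;
}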

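async function fetchCSV(filename: string): Promise<Entry[]> {
  // filename is expected without the ".csv" extension, e.g. "03-13-2020"
  const response = await fetch(
    `https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/${filename}.csv`
  );
  if (!response.ok) {
    throw new Error(`Failed to fetch ${filename}.csv: ${response.status}`);
  }
  const data = await response.text();
  return parseData(data);
}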
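
The file-name step in the list above (names like '03-13-2020.csv') has no code yet. Here is a rough, dependency-free sketch of it, assuming every report is named MM-DD-YYYY.csv. The names `dateFromFileName` and `sortByDate` are made up for illustration and are not part of the project.

```typescript
// Hypothetical helper: converts a daily-report file name such as "03-13-2020.csv"
// into a UTC Date, or returns null when the name does not match MM-DD-YYYY.csv.
function dateFromFileName(name: string): Date | null {
  const match = /^(\d{2})-(\d{2})-(\d{4})\.csv$/.exec(name);
  if (!match) return null;
  const month = Number(match[1]);
  const day = Number(match[2]);
  const year = Number(match[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  // Reject impossible dates such as 02-31-2020, which Date would silently roll over
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) return null;
  return date;
}

// Hypothetical helper: drops names that are not reports and sorts the rest chronologically.
// Plain string sorting would misorder MM-DD-YYYY names once they span more than one year.
function sortByDate(names: string[]): string[] {
  return names
    .filter(name => dateFromFileName(name) !== null)
    .sort((a, b) => dateFromFileName(a)!.getTime() - dateFromFileName(b)!.getTime());
}
```

Anything that is not a report (a README, for instance) comes back as null, so the same check can replace a separate extension filter.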

williamluke4 avatar Mar 20 '20 21:03 williamluke4

Very quick and hacky, let me know your thoughts
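
One hacky spot worth flagging is `line.trim().split(",")` in parseData: any quoted field that contains a comma (a place name written like "Korea, South", for example) would shift every column after it. Below is a minimal sketch of a quote-aware split, assuming fields never span multiple lines. `splitCsvLine` is a hypothetical name, and a proper CSV library would be the safer choice for real use.

```typescript
// Hypothetical helper: splits one CSV line into fields, honoring double quotes.
// Handles commas inside quoted fields and doubled quotes ("") as an escaped quote.
// It does not handle quoted fields that span multiple lines.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}
```

Swapping this in for the plain split keeps the rest of parseData unchanged, since it still returns an array of strings in column order.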

williamluke4 avatar Mar 20 '20 21:03 williamluke4

Hi, I would like to contribute by adding India to the dashboard. We have a billion people here and we need predictions so people start taking precautions now!

kn-neeraj avatar Mar 21 '20 06:03 kn-neeraj

@williamluke4: High-level, logic makes sense, but can you please adapt it for the schema we currently have going? You can see it at https://github.com/lachlanjc/covid19/blob/master/api/prisma/schema.prisma.

If you can do that and are up for the task, a pull request would be greatly appreciated and I would be happy to prioritize and merge. This would also enable support for India, which would meet @kn-neeraj's need (which I think would be a fantastic addition).

zachlatta avatar Mar 21 '20 06:03 zachlatta

@kn-neeraj: Please see the above comment. A pull request to switch our data source from scraping Worldometers to importing from https://github.com/CSSEGISandData/COVID-19 would enable India support (and would be a very welcome update, as it would enable support for every other country too).

@williamluke4 / @kn-neeraj: The file that needs to be rewritten to pull from this new source is https://github.com/lachlanjc/covid19/blob/master/api/src/functions/scrape.js. Once that is rewritten, I will set up a separate service to call that function every hour so the site is constantly updated.

zachlatta avatar Mar 21 '20 06:03 zachlatta

@zachlatta trying to figure this out. I'm more comfortable with Python than JavaScript, but let me figure out how to help. @williamluke4 good work on the above script. Are you working on adapting it to the schema? Let me know if you need any help.

kn-neeraj avatar Mar 21 '20 06:03 kn-neeraj

@zachlatta - Wasn't able to figure out how exactly data is stored, but I built a simple endpoint for the Johns Hopkins data repository that follows the same schema this project currently does.

For countries with multiple regions, all the regions' data is aggregated into one series.
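
Roughly, that aggregation could look like the sketch below. It assumes each row carries one region's cumulative counts for one date; the `RegionRow` shape and the `aggregateByDate` name are hypothetical and are not the endpoint's actual code.

```typescript
// Hypothetical row shape: one region's cumulative counts on one date
interface RegionRow {
  country: string;
  date: string; // e.g. "3/20/20"
  confirmed: number;
  deaths: number;
  recovered: number;
}

interface CountryTotals {
  confirmed: number;
  deaths: number;
  recovered: number;
}

// Sums the rows of every region belonging to `country`, keyed by date
function aggregateByDate(rows: RegionRow[], country: string): Map<string, CountryTotals> {
  const totals = new Map<string, CountryTotals>();
  for (const row of rows) {
    if (row.country !== country) continue;
    const current = totals.get(row.date) || { confirmed: 0, deaths: 0, recovered: 0 };
    current.confirmed += row.confirmed;
    current.deaths += row.deaths;
    current.recovered += row.recovered;
    totals.set(row.date, current);
  }
  return totals;
}
```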

Just GET with the country's ISO code. E.g.:

https://covid-data--jajoosam.repl.co/iso/in gives

{
  "country": "India",
  "lastUpdated": "Sat, 21 Mar 2020 09:34:39 GMT",
  "data": [{
      "date": "3/18/20",
      "totalCases": 156,
      "newCases": 14,
      "totalDeaths": 3,
      "newDeaths": 0,
      "currentInfected": 150
    },
    {
      "date": "3/19/20",
      "totalCases": 194,
      "newCases": 38,
      "totalDeaths": 4,
      "newDeaths": 1,
      "currentInfected": 186
    },
    {
      "date": "3/20/20",
      "totalCases": 244,
      "newCases": 50,
      "totalDeaths": 5,
      "newDeaths": 1,
      "currentInfected": 234
    }]
}
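
In the sample above, each day's newCases matches the change in totalCases from the previous day (194 - 156 = 38), and newDeaths follows the same pattern. Assuming the daily fields are derived that way, here is a small sketch that computes them from the cumulative series. The type and function names are hypothetical, and treating the first day's total as its new count is my own assumption.

```typescript
// Cumulative totals for one day, using the field names from the sample response
interface DayTotals {
  date: string;
  totalCases: number;
  totalDeaths: number;
}

interface DayWithDeltas extends DayTotals {
  newCases: number;
  newDeaths: number;
}

// Derives per-day increases from a chronologically ordered cumulative series
function withDailyDeltas(days: DayTotals[]): DayWithDeltas[] {
  return days.map((day, i) => {
    const prev = i > 0 ? days[i - 1] : undefined;
    return {
      ...day,
      // Assumption: the first day has no predecessor, so its total counts as new
      newCases: prev ? day.totalCases - prev.totalCases : day.totalCases,
      newDeaths: prev ? day.totalDeaths - prev.totalDeaths : day.totalDeaths,
    };
  });
}
```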

jajoosam avatar Mar 21 '20 09:03 jajoosam

I'm pretty swamped atm, so if someone else could take over that would be great. Great work @jajoosam

williamluke4 avatar Mar 21 '20 12:03 williamluke4

Are there any tasks to tackle? I'd love to help out!

rishiosaur avatar Mar 22 '20 19:03 rishiosaur

@lachlanjc, can you provide guidance here, as I know you have some WIP work?

zachlatta avatar Mar 22 '20 19:03 zachlatta

@rishiosaur Check out #4, where I've adapted what @jajoosam started. It has some remaining issues (UK & Netherlands data primarily, haven't investigated why those seem broken), then the primary task is wiring up the data being fetched to saving to the database.

lachlanjc avatar Mar 22 '20 20:03 lachlanjc

@lachlanjc line 68 of new scrape.js should be

let dates = Object.keys(agg).filter(

newscrape.js should work then. Without it there'd be an issue with every country with multiple COVID-19 documented regions :)

jajoosam avatar Mar 22 '20 20:03 jajoosam