
[DATA] Help needed for Hospital Data

Open HerkulaasCombrink opened this issue 4 years ago • 87 comments

Which Dataset

health_system_za_public_hospitals.csv

Error Description

District and subdistrict data needed.
Estimated population size needed for each district.

Suggested fixes

  1. Populating the data for the proposed file.
  2. Creating an accurate dataset that is already in a computer-readable format, and not in a PDF etc.
  3. Finding an updated Private and public Hospital repo for each South African province.

Volunteer to fix the data

Choose the data you want to fix or add, and sign up for it in the shared spreadsheet: https://docs.google.com/spreadsheets/d/1ujiuSd656BfIO3AT86GTr17oveaev-qBuYbu_v45RC4/edit?usp=sharing

HerkulaasCombrink avatar Mar 31 '20 19:03 HerkulaasCombrink

@MikeMcMalace the humanitarian data exchange has pop size at various admin levels for SA - https://data.humdata.org/dataset/south-africa-administrative-levels-0-3-population-statistics. It was last updated in 2018 according to metadata.

I'd be interested to help with this. Also involved in https://afrimapr.github.io/afrimapr.website/blog/2020/healthsites-app/ and we've just started to work with healthsites.io as well. Let me know how I can help?

anelda avatar Apr 01 '20 11:04 anelda

@anelda we are currently working on a map visualization that is a bit similar to the one shown in your last link. For now, most of the help is needed on the data: populating the columns with

  • Number of beds per identified hospital
  • Number of staff members per hospital
  • Geolocation of Covid19 testing centers
  • Webpages of hospitals
  • And just about any other incomplete info on the hospital data

The data file is the one that @MikeMcMalace has identified when he opened this issue.

elolelo avatar Apr 01 '20 11:04 elolelo

Three questions:

  1. Do you have a way to prevent different people working on the same thing for this? e.g. I can get webpages for hospitals but it would be tragic if others are working on this at the same time, duplicating effort.

  2. What is the relationship between health_system_za_public_hospitals_extended_details.csv vs health_system_za_public_hospitals_contacts.csv vs health_system_za_public_hospitals.csv? Can these be merged?

  3. Is there any value in contacting [email protected] who maintains this website - http://doctors-hospitals-medical-cape-town-south-africa.blaauwberg.net/hospitals_clinics_state_hospitals/state_public_hospitals_clinics_eastern_cape_south_africa/ (for each province with a lot of the data we need for each hospital) to hear if they can do a data dump of the data displayed on their website?

anelda avatar Apr 01 '20 12:04 anelda

Thank you so much for your inputs, @anelda .

  1. I propose that we volunteer on this issue so that there isn't overlap. Alternatively, we can create a google doc and people can volunteer from there? - which would you think would work best?

  2. Yes, they can. From the start, we needed details and information, and as time continued, the datasets expanded. We have a hospital dictionary, and I can imagine that we do not have all the IDs of all hospitals on this list. If I had to be pragmatic about it, I would propose that we update the library file, and then use that as a reference to see what we do not have.

  3. Yes, there is. I have made contact with a few private hospital groups, and have reached out to provincial managers, but unfortunately, I have had little success. It is an excellent suggestion. Would you mind making contact?
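
The reference-file idea in point 2 could be sketched roughly as follows. This is only an illustration in plain Python: the `hospital_id` column name, the sample rows, and the helper names `read_rows` and `missing_ids` are all assumptions and are not taken from the repository files.

```python
import csv
import io

# Hypothetical sample data; the real files and column names may differ.
REFERENCE_CSV = """hospital_id,hospital_name
H001,Example General Hospital
H002,Sample District Hospital
H003,Demo Regional Hospital
"""

DETAILS_CSV = """hospital_id,beds
H001,120
H003,45
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_ids(reference_rows, other_rows, key="hospital_id"):
    """Return the IDs present in the reference file but absent from the other file."""
    present = {row[key] for row in other_rows}
    return [row[key] for row in reference_rows if row[key] not in present]


reference = read_rows(REFERENCE_CSV)
details = read_rows(DETAILS_CSV)

# IDs that still need details to be collected.
print(missing_ids(reference, details))
```

Running the same check against each of the other hospital files would give a to-do list per file, which could then be pasted into the volunteer sheet.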

HerkulaasCombrink avatar Apr 01 '20 12:04 HerkulaasCombrink

For hospital beds, there is this study:

Geographical maldistribution of surgical resources in South Africa: A review of the number of hospitals, hospital beds and surgical beds. A J Dell (BSc, MB ChB, PhD) and D Kahn (MB ChB, FCS (SA), ChM), both of the Department of Surgery, Faculty of Health Sciences, and Groote Schuur Hospital, University of Cape Town, South Africa. http://dx.doi.org/10.7196/samj.2017.v107i12.12539, published 2017, with a contact email for the lead author: [email protected]

Maybe they can share the data they collected. Here is how they did it (a lot of work has gone into collecting and verifying the data):

A list of all hospitals in SA was obtained from the Provincial DoH and cross-referenced with electronic databases of hospitals in SA (Medpages and hospital websites). These were cross-referenced with the NDoH hospital list from the office of the minister of health.

The Health Systems Trust provided estimates of the total number of hospitals and hospital beds for each province for comparison among the provinces. The public hospitals were grouped according to the nine provinces in SA and were subdivided into major district municipalities.

All hospitals were contacted telephonically and by email. Either the chief executive officer, superintendent or matron (in the case of district-level facility) in each hospital was contacted to obtain the relevant data. Data were collected from 1 October to 31 December 2014. Private hospital data were readily available from the Hospital Association of SA (HASA) and included extensive data on the number of hospitals, total number of hospital beds and type of beds. Private hospitals were contacted telephonically to verify these data.

anelda avatar Apr 01 '20 12:04 anelda

Brilliant, brilliant study - and this is the data we need. It is a shame that this is 2017, but, it does have the data we require. Thank you for your insight @anelda. I do not personally know the authors, but I do know the department. Would you mind making contact?

HerkulaasCombrink avatar Apr 01 '20 12:04 HerkulaasCombrink

@elolelo, what is your idea for the websites? I am trying to find the geolocations of the testing centres, but I am picking up something that might complicate things considerably: labs and pathologists might be referring samples elsewhere. This means that we need to track down the core testing facilities. I can ask for this.

HerkulaasCombrink avatar Apr 01 '20 12:04 HerkulaasCombrink

  1. Let's start a Google Doc - great suggestion. This thread may become quite long and people might miss stuff if they have to read through everything. I can do it and share unless you have a covid19 Google Folder already where you want to keep things together?

  2. Which one is the library file? I can do a compare and merge on the files unless either of you have a script ready to do that? I'll probably do it in R and can share the merged file in the next hour or so

  3. I can reach out to the website owners. Fingers crossed that the email is still functional and that they're checking it.

anelda avatar Apr 01 '20 12:04 anelda

Brilliant, brilliant study - and this is the data we need. It is a shame that this is 2017, but, it does have the data we require. Thank you for your insight @anelda. I do not personally know the authors, but I do know the department. Would you mind making contact?

I'll email them.

anelda avatar Apr 01 '20 12:04 anelda

@elolelo @anelda the link to the doc is below.

https://docs.google.com/spreadsheets/d/1ujiuSd656BfIO3AT86GTr17oveaev-qBuYbu_v45RC4/edit?usp=sharing

Choose an item, then update accordingly.

There are five hospital files:

  • health_system_za_hospital_id (which contains the ID and hospital name as they appear in the other four files)
  • health_system_za_private_hospitals
  • health_system_za_public_hospitals
  • health_system_za_public_hospitals_contacts
  • health_system_za_public_hospitals_extended_details

The idea is to gather, create the complete files, then merge at the end.

I used Python for the merging, but any basic inner join will do, since the current IDs are already linked to the files.
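
For anyone who wants to try the merge, here is a minimal sketch of such an inner join using only the Python standard library. It is not the script used for this repo: the `hospital_id` column name, the sample rows, and the `inner_join` helper are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data; the real files and column names may differ.
ID_CSV = """hospital_id,hospital_name
H001,Example General Hospital
H002,Sample District Hospital
H003,Demo Regional Hospital
"""

CONTACTS_CSV = """hospital_id,phone
H001,000-000-0001
H003,000-000-0003
H999,000-000-0999
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left_rows, right_rows, key="hospital_id"):
    """Keep only rows whose key appears in both inputs and combine their columns.

    Assumes the key is unique in right_rows; if it repeats, the last row wins.
    """
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


merged = inner_join(read_rows(ID_CSV), read_rows(CONTACTS_CSV))
for row in merged:
    print(row)
```

An inner join silently drops hospitals that are missing from either file (H002 and H999 in the sample), so it is worth comparing row counts before and after the merge.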

HerkulaasCombrink avatar Apr 01 '20 13:04 HerkulaasCombrink

For hospital beds, there is this study:

Geographical maldistribution of surgical resources in South Africa: A review of the number of hospitals, hospital beds and surgical beds. A J Dell (BSc, MB ChB, PhD) and D Kahn (MB ChB, FCS (SA), ChM), both of the Department of Surgery, Faculty of Health Sciences, and Groote Schuur Hospital, University of Cape Town, South Africa. http://dx.doi.org/10.7196/samj.2017.v107i12.12539, published 2017, with a contact email for the lead author: [email protected]

Great news! Angela responded within 25 minutes to my email. She shared her thesis in PDF (also available from http://hdl.handle.net/11427/22796) and is busy looking through her spreadsheets to find the most recent one. She'll share that as soon as she's found it.

We have to make sure people who share their hard collected open datasets receive due credit!

anelda avatar Apr 01 '20 14:04 anelda

@anelda I echo your request and acknowledge your statement. Thank you.

HerkulaasCombrink avatar Apr 01 '20 14:04 HerkulaasCombrink

I'll create an issue about this. It's important for data provenance as well

anelda avatar Apr 01 '20 14:04 anelda

See #117

anelda avatar Apr 01 '20 15:04 anelda

@elolelo @anelda the link to the doc is below.

https://docs.google.com/spreadsheets/d/1ujiuSd656BfIO3AT86GTr17oveaev-qBuYbu_v45RC4/edit?usp=sharing

@MikeMcMalace good morning! I can't access this file? Can you help please?

anelda avatar Apr 02 '20 05:04 anelda

Please try link @anelda @elolelo

HerkulaasCombrink avatar Apr 02 '20 05:04 HerkulaasCombrink

@anelda @elolelo Good morning!

HerkulaasCombrink avatar Apr 02 '20 05:04 HerkulaasCombrink

Please try link

Thanks @MikeMcMalace . It's view only mode though?

anelda avatar Apr 02 '20 06:04 anelda

@anelda @MikeMcMalace Good morning, @anelda , in this #117 issue, do you suggest that @MikeMcMalace should create another sheet to add the details about sources of data ?

elolelo avatar Apr 02 '20 07:04 elolelo

in this #117 issue, do you suggest that @MikeMcMalace should create another sheet to add the details about sources of data ?

Good morning @elolelo. Hmmm... I wonder if it may be worth our while to have a quick online meeting to chat about the data and where we want to go with it? I received hospital bed data from Angela this morning and am busy cleaning it up. What do you think @MikeMcMalace

anelda avatar Apr 02 '20 07:04 anelda

@anelda - I think the meeting may be worth our while. I should be available from 11 am onwards today. Wow!! Sounds like you've received valuable data - I just saw that 87 000 beds in the public sector are available for Covid19 patients, and I wondered where (in which hospitals) those beds are, so hopefully your data can answer this question.

elolelo avatar Apr 02 '20 08:04 elolelo

If you send me your email addresses and times when you're available, I can set up a meeting in Zoom or Hangouts. I don't want to share the meeting link here as there have been problems with trolls crashing open online meetings. [email protected]. Thanks!

anelda avatar Apr 02 '20 08:04 anelda

Love the idea of a meeting! Yes. Currently, I see a gap at 12:00? Would that suffice?

Can we invite @vukosim to this meeting, please?

HerkulaasCombrink avatar Apr 02 '20 08:04 HerkulaasCombrink

Okay, I created a Doodle poll for today to see who can attend and when. Anyone who'd like to join can complete the poll. I'll need people to send me an email address where I can share the meeting link. Thanks! https://doodle.com/poll/498wfb79wwfiuuev. I set the meeting for 1 hour, but we can keep it shorter if needs be. Will be fun to put faces to names :-)

anelda avatar Apr 02 '20 08:04 anelda

@elolelo @anelda the link to the doc is below. https://docs.google.com/spreadsheets/d/1ujiuSd656BfIO3AT86GTr17oveaev-qBuYbu_v45RC4/edit?usp=sharing

@MikeMcMalace good morning! I can't access this file? Can you help please?

Is it working?

Thank you for the meeting, and insight. :)

HerkulaasCombrink avatar Apr 02 '20 13:04 HerkulaasCombrink

@MikeMcMalace Thanks, it's now working. The e-meet was indeed insightful!

elolelo avatar Apr 02 '20 15:04 elolelo

Hi everyone, thanks so much for the meeting yesterday. I'm happy to report that Angela added her hospital bed info to Figshare so there is a proper citation for it now and it's also officially licensed as CC-BY - https://figshare.com/articles/SURGICAL_RESOURCES_latestmarch2016_xlsx/12066711. I'll share the cleaned data here later today.

anelda avatar Apr 03 '20 02:04 anelda

Good morning everyone. The clean data from Angela Dell's thesis (hospital beds and number of surgeons for public and private hosps - last updated March 2016 for 543 hospitals) is now available at https://figshare.com/articles/South_African_Hospital_Beds/12073596. There is also a readme to describe how I went from the raw data to the resulting CSV. I'm trying to do things in a way that we can track errors if we find them and also to make it reproducible. Hope this is useful to your efforts :-)

anelda avatar Apr 03 '20 10:04 anelda

Hi @anelda, thanks a lot! This data might be relatively old, but there is no doubt that it's useful, and so is the readme file. Will update you on the viz when it's ready. Thanks once again.

elolelo avatar Apr 03 '20 11:04 elolelo

Hi everyone, I've been thinking a lot about the question of reproducibility in terms of putting the open hospital dataset together. Here is an attempt in R to look at various datasets that are available and tidy them up in order to be able to compare and/or combine programmatically - https://htmlpreview.github.io/?https://github.com/anelda/za_open_hospital_data/blob/master/reports/za_hospital_analysis_v2.html

The document really only shows the first phase: pulling the data in from the various sources in order to compare and combine them. I'm working on the next step, where we can use fuzzy matching on facility names to merge across datasets and harvest the maximum number of attributes that are available.
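
As a rough illustration of what fuzzy name matching could look like, here is a small Python sketch using the standard library's `difflib`. It is not the R approach from the project linked above; the facility names (apart from Groote Schuur Hospital), the 0.8 cutoff, and the helpers `normalise` and `match_names` are assumptions for illustration only.

```python
import difflib


def normalise(name):
    """Lower-case a name, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def match_names(names_a, names_b, cutoff=0.8):
    """Map each name in names_a to its closest name in names_b, or None.

    Similarity is difflib's ratio on the normalised names; the cutoff is a
    guess and would need tuning against real facility names.
    """
    lookup = {normalise(name): name for name in names_b}
    candidates = list(lookup)
    matches = {}
    for name in names_a:
        close = difflib.get_close_matches(normalise(name), candidates, n=1, cutoff=cutoff)
        matches[name] = lookup[close[0]] if close else None
    return matches


# Hypothetical name lists standing in for two different source datasets.
dataset_a = ["Groote Schuur Hospital", "Example District Hosp.", "Unlisted Clinic"]
dataset_b = ["GROOTE SCHUUR HOSPITAL", "Example District Hospital"]

for name, match in match_names(dataset_a, dataset_b).items():
    print(name, "->", match)
```

Names that come back as `None` are the ones that would need a manual check, and any automatic match near the cutoff is worth reviewing by hand before merging attributes.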

The R project is available at https://github.com/anelda/za_open_hospital_data with all the "cleaner" CSV files as well.

Let me know if you think it may be useful for this project or if you have ideas for improving it? It's not super pretty at the moment, but I'll work on formatting in the next iteration.

anelda avatar Apr 05 '20 00:04 anelda