data:image/s3,"s3://crabby-images/9d052/9d0528b38da7ca5060d3bb9fc45caa55a2812742" alt="View Repositories"
Health Analytics Case Study data:image/s3,"s3://crabby-images/5b927/5b927630a9cb1131b997dddaad3f9c6920cddaa4" alt=""
This case study is contained within the Serious SQL by Danny Ma
📕 Table of contents
- 🛠️ Overview
- 🚀 Solutions
- 💻 Key Highlights
🛠️ Overview
With the Health Analytics Mini Case Study, I queried data to bring insights to the following questions:
- How many
unique users
exist in the logs dataset?
- How many total
measurements
do we have per user on average
?
- What about the
median
number of measurements per user?
- How many users have
3 or more
measurements?
- How many users have
1,000 or more
measurements?
- Have logged
blood glucose
measurements?
- Have
at least 2 types
of measurements?
- Have all 3 measures -
blood glucose, weight and blood pressure
?
- What is the
median systolic/diastolic
blood pressure values?
🚀 Solutions
data:image/s3,"s3://crabby-images/d0682/d0682ed091de4242122e599caeec91babe384867" alt="Question 1"
How many unique users exist in the logs dataset?
SELECT COUNT (DISTINCT id)
FROM health.user_logs;
Note:
For question 2-8, I created a temporary table:
Step 1: Firstly, I ran a code DROP TABLE IF EXISTS
statement to clear out any previously created tables:
DROP TABLE IF EXISTS user_measure_count;
Step 2: Next, I created a new temporary table using the results of the query below:
CREATE TEMP TABLE user_measure_count AS
SELECT
id,
COUNT(*) AS measure_count,
COUNT (DISTINCT measure) AS unique_measures
FROM health.user_logs
GROUP BY 1;
data:image/s3,"s3://crabby-images/7c570/7c570343387fe6592778466b97eaf19d05c89c64" alt="Question 2"
How many total measurements do we have per user on average?
SELECT
ROUND (AVG(measure_count), 2) AS mean_value
FROM user_measure_count;
data:image/s3,"s3://crabby-images/b3ef0/b3ef0e48aa8f35dc918cc2e6bfc316ce5344fa30" alt="Question 3"
What about the median number of measurements per user?
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY measure_count) AS median_value
FROM user_measure_count;
data:image/s3,"s3://crabby-images/9a401/9a4018bfa1c167195f1a6459768f52882567255f" alt="Question 4"
How many users have 3 or more measurements?
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 3;
data:image/s3,"s3://crabby-images/ba07e/ba07eebb944947d1e4e2619b26583c03608f6bcf" alt="Question 5"
How many users have 1,000 or more measurements?
SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 1000;
data:image/s3,"s3://crabby-images/1932f/1932f26ab896b2e7c1eb612f0c620e4a9e3db4a8" alt="Question 6"
Have logged blood glucose measurements?
SELECT
COUNT(DISTINCT id)
FROM health.user_logs
WHERE measure = 'blood_glucose';
data:image/s3,"s3://crabby-images/ca149/ca1493351fc32169db0b07029a91e0e7b0be4241" alt="Question 7"
7. Have at least 2 types of measurements?
SELECT
COUNT(*)
FROM user_measure_count
WHERE unique_measures >= 2;
data:image/s3,"s3://crabby-images/18149/18149e4275d8b0056b4c46972b5060e6d4658e43" alt="Question 8"
Have all 3 measures - blood glucose, weight and blood pressure?
SELECT
COUNT(*)
FROM user_measure_count
WHERE unique_measures = 3;
data:image/s3,"s3://crabby-images/894b0/894b04accda3fea854d733b42e046902d0aef7e8" alt="Question 9"
What is the median systolic/diastolic blood pressure values?
SELECT
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY systolic) AS median_systolic,
PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY diastolic) AS median_diastolic
FROM health.user_logs
WHERE measure = 'blood_pressure';
median_systolic |
median_diastolic |
126 |
79 |
💻 Key Highlight
Initial thoughts:
Even though this is a short assignment which cover basic SQL syntax, I did run into problems several time during the solving process. However, it helped me to have a better understanding about data exploration using SQL from theories to real life application.
Some of the main areas covered in this case study, including:
-
Sorting Values
-
Inspect Row Counts
-
Duplicates & Record Frequency Review
-
Summary Statistics
(MEAN, MEDIAN)