fusion
Create combined dataset
From the existing data sources, we will take the following columns:
- activity_watch_summarized = summary_day, total_time_surfing, what_app_did_user_spend_x_percent_of_time_on (one column per percent), events_usage (tuple: event_classification (encoded), sum_total_time)
- google_calendar = summary_day, total_time_in_meetings, average_meeting_duration, mode_meeting_duration, count_of_meetings_initiated, count_of_meetings_invited, avg_number_of_people_per_meeting
- oura = (drop some rows) - remove combination identifiers
- tweets = summary_day, count_of_tweets, count_of_total_positive, total_neutral, total_negative, sum_likes_positive, sum_likes_negative, sum_likes_neutral, sum_retweets_positive, sum_retweets_negative, sum_retweets_neutral
These datasets will then be joined on summary_day, so that each day ends up as a single row.
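The join on summary_day could be sketched roughly as follows. This is a minimal sketch assuming each source is already a pandas DataFrame with one row per day and a string `summary_day` column; the helper name `combine_daily`, the outer join, and the toy values are illustrative assumptions, not the notebook's actual code.

```python
from functools import reduce

import pandas as pd


def combine_daily(frames, key="summary_day", how="outer"):
    """Join per-source daily summaries on a shared day column.

    An outer join (an assumption here) keeps days that are missing
    from some sources; their columns are filled with NaN.
    """
    merged = reduce(lambda left, right: left.merge(right, on=key, how=how), frames)
    return merged.sort_values(key).reset_index(drop=True)


# Toy frames standing in for the real summaries (values are made up).
activity = pd.DataFrame(
    {"summary_day": ["2021-01-01", "2021-01-02"], "total_time_surfing": [3600, 5400]}
)
calendar = pd.DataFrame(
    {"summary_day": ["2021-01-02", "2021-01-03"], "total_time_in_meetings": [90, 30]}
)
tweets = pd.DataFrame({"summary_day": ["2021-01-01"], "count_of_tweets": [4]})

combined = combine_daily([activity, calendar, tweets])
print(combined)
```

Switching `how` to `"inner"` would instead keep only the days present in every source, which may be preferable if downstream steps cannot handle missing values.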
Things to keep in mind during combination:
activity watch
- cluster events into groups (max 5 & then make them columns)
- encode all events, consider magnitude = 0: 70, 1: 30
google calendar
- consider how many meetings were created impromptu
- meetings initiated vs invited ( giving / taking time )
- how start_time vs end_time fit into daily activities
tweets
- original tweets vs retweets
- how people are interacting with tweets
- people interacting with
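For the Google Calendar points above, the initiated vs invited split and the duration columns could be derived along these lines. This is only a sketch: the raw column names `start_time`, `end_time`, `initiated_by_me` and `attendee_count` are hypothetical, and durations are assumed to be in minutes.

```python
import pandas as pd


def summarize_calendar(events):
    """Collapse raw meeting rows into one summary row per day."""
    ev = events.copy()
    ev["start_time"] = pd.to_datetime(ev["start_time"])
    ev["end_time"] = pd.to_datetime(ev["end_time"])
    ev["summary_day"] = ev["start_time"].dt.strftime("%Y-%m-%d")
    ev["duration_min"] = (ev["end_time"] - ev["start_time"]).dt.total_seconds() / 60
    # Invited meetings are the ones the user did not initiate.
    ev["invited"] = ~ev["initiated_by_me"]
    return (
        ev.groupby("summary_day")
        .agg(
            total_time_in_meetings=("duration_min", "sum"),
            average_meeting_duration=("duration_min", "mean"),
            # mode() can return several values; take the smallest one.
            mode_meeting_duration=("duration_min", lambda s: s.mode().iloc[0]),
            count_of_meetings_initiated=("initiated_by_me", "sum"),
            count_of_meetings_invited=("invited", "sum"),
            avg_number_of_people_per_meeting=("attendee_count", "mean"),
        )
        .reset_index()
    )


# Toy events (made-up values) to exercise the function.
events = pd.DataFrame(
    {
        "start_time": ["2021-01-01 09:00", "2021-01-01 11:00", "2021-01-01 14:00", "2021-01-02 10:00"],
        "end_time": ["2021-01-01 09:30", "2021-01-01 11:30", "2021-01-01 15:00", "2021-01-02 10:45"],
        "initiated_by_me": [True, False, False, True],
        "attendee_count": [3, 5, 4, 2],
    }
)
calendar_daily = summarize_calendar(events)
print(calendar_daily)
```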
We now have daily summaries for:
- oura - daily summaries of activity, sleep and readiness
- spotify - average of the number of features happening daily
- activitywatch - day, classification of event, number of times it happened
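Turning the activitywatch summary (day, classification, count) into at most five event columns might look like the sketch below. Note that it keeps the most frequent classes and lumps the rest into an `other` column, which is a simpler stand-in for the clustering mentioned earlier, not the clustering itself. The column names `event_class` and `count` and the helper `events_to_columns` are assumptions.

```python
import pandas as pd


def events_to_columns(daily_counts, max_groups=5):
    """Pivot (day, event class, count) rows into one column per class.

    Only the `max_groups` most frequent classes keep their own column;
    every other class is summed into an "other" column.
    """
    totals = daily_counts.groupby("event_class")["count"].sum()
    keep = totals.nlargest(max_groups).index
    grouped = daily_counts.assign(
        event_class=daily_counts["event_class"].where(
            daily_counts["event_class"].isin(keep), "other"
        )
    )
    wide = grouped.pivot_table(
        index="summary_day",
        columns="event_class",
        values="count",
        aggfunc="sum",
        fill_value=0,
    )
    wide.columns.name = None
    return wide.reset_index()


# Toy counts (made-up values); max_groups=2 keeps the example small.
daily_counts = pd.DataFrame(
    {
        "summary_day": ["2021-01-01", "2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"],
        "event_class": ["browsing", "coding", "chat", "browsing", "email"],
        "count": [10, 5, 1, 3, 2],
    }
)
event_columns = events_to_columns(daily_counts, max_groups=2)
print(event_columns)
```

The resulting frame has one row per summary_day, so it can be joined with the other daily summaries directly.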