
Create combined dataset

Open oreHGA opened this issue 4 years ago • 1 comments

From the existing data sources, we will take the following columns:

  • activity_watch_summarized = summary_day, total_time_surfing, what_app_did_user_spend_x_percent_of_time_on (one column per percent), events_usage (tuple: event_classification (encoded), sum_total_time)

  • google_calendar = summary_day, total_time_in_meetings, average_meeting_duration, mode_meeting_duration, count_of_meetings_initiated, count_of_meetings_invited, avg_number_of_people_per_meeting

  • oura = (drop some rows) - remove combination identifiers

  • tweets = summary_day, count_of_tweets, count_of_total_positive, total_neutral, total_negative, sum_likes_positive, sum_likes_negative, sum_likes_neutral, sum_retweets_positive, sum_retweets_negative, sum_retweets_neutral

These datasets will then be joined on summary_day, so that each day ends up with a single combined entry.
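The join on summary_day could look roughly like the sketch below. It is a minimal standard-library illustration, not the project's actual pipeline: the function name `join_by_summary_day`, the sample rows, and the choice to prefix each column with its source name are all assumptions made for the example. It behaves like an outer join, so a day that is missing from one source still gets a row.

```python
from collections import defaultdict


def join_by_summary_day(sources):
    """Outer-join per-source daily rows on their "summary_day" key.

    `sources` maps a source name to a list of row dicts (hypothetical shape).
    """
    combined = defaultdict(dict)
    for name, rows in sources.items():
        for row in rows:
            day = row["summary_day"]
            for column, value in row.items():
                if column != "summary_day":
                    # Prefix with the source name so columns cannot collide.
                    combined[day][f"{name}_{column}"] = value
    # One dict per day, ordered by day.
    return [{"summary_day": day, **combined[day]} for day in sorted(combined)]


# Hypothetical sample rows, only to show the shape of the result.
sources = {
    "google_calendar": [
        {"summary_day": "2021-05-17", "total_time_in_meetings": 90},
    ],
    "tweets": [
        {"summary_day": "2021-05-17", "count_of_tweets": 4},
        {"summary_day": "2021-05-18", "count_of_tweets": 1},
    ],
}
combined_rows = join_by_summary_day(sources)
```

Days with no calendar data simply lack the google_calendar columns here; whether to fill those gaps with zeros or leave them missing is a separate decision.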

Things to keep in mind during combination:

activity watch

  • cluster events into groups (max 5 & then make them columns)
  • encode all events, consider magnitude = 0: 70, 1: 30

google calendar

  • considering how many meetings were created impromptu
  • meetings initiated vs invited ( giving / taking time )
  • how start_time vs end_time fit into daily activities
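As a rough sketch of how the google_calendar columns listed earlier might be derived, the example below aggregates raw events per day. The input fields (`start`, `end`, `organizer_is_self`, `attendee_count`) are hypothetical names, not the Google Calendar API schema, and durations are assumed to be in minutes.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, multimode


def summarize_meetings(events):
    """Build one summary row per day from raw calendar events (hypothetical shape)."""
    by_day = defaultdict(list)
    for event in events:
        by_day[event["start"].date().isoformat()].append(event)

    summaries = []
    for day in sorted(by_day):
        day_events = by_day[day]
        # Meeting length in minutes.
        durations = [
            (e["end"] - e["start"]).total_seconds() / 60 for e in day_events
        ]
        initiated = sum(1 for e in day_events if e["organizer_is_self"])
        summaries.append({
            "summary_day": day,
            "total_time_in_meetings": sum(durations),
            "average_meeting_duration": mean(durations),
            # If several durations tie for most common, keep the shortest.
            "mode_meeting_duration": min(multimode(durations)),
            "count_of_meetings_initiated": initiated,
            "count_of_meetings_invited": len(day_events) - initiated,
            "avg_number_of_people_per_meeting": mean(
                e["attendee_count"] for e in day_events
            ),
        })
    return summaries


# Hypothetical events for a single day.
events = [
    {"start": datetime(2021, 5, 17, 9, 0), "end": datetime(2021, 5, 17, 9, 30),
     "organizer_is_self": True, "attendee_count": 3},
    {"start": datetime(2021, 5, 17, 11, 0), "end": datetime(2021, 5, 17, 11, 30),
     "organizer_is_self": False, "attendee_count": 5},
    {"start": datetime(2021, 5, 17, 14, 0), "end": datetime(2021, 5, 17, 15, 0),
     "organizer_is_self": False, "attendee_count": 4},
]
meeting_summary = summarize_meetings(events)
```

Splitting initiated from invited meetings keeps the giving/taking-time distinction available as two separate columns.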

twitter

  • original tweets vs retweets
  • how are people interacting with tweets
  • which people the user is interacting with
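The tweets columns listed earlier could be produced along these lines. This sketch assumes each tweet already carries a sentiment label ("positive", "neutral" or "negative") plus like and retweet counts; the input field names are hypothetical, and the sentiment classification step itself is not shown.

```python
from collections import defaultdict

SENTIMENTS = ("positive", "neutral", "negative")

# Count column names as written in the column list for tweets.
COUNT_COLUMNS = {
    "positive": "count_of_total_positive",
    "neutral": "total_neutral",
    "negative": "total_negative",
}


def summarize_tweets(tweets):
    """Build one summary row per day from labelled tweets (hypothetical shape)."""
    by_day = defaultdict(list)
    for tweet in tweets:
        by_day[tweet["summary_day"]].append(tweet)

    rows = []
    for day in sorted(by_day):
        row = {"summary_day": day, "count_of_tweets": len(by_day[day])}
        for sentiment in SENTIMENTS:
            matching = [t for t in by_day[day] if t["sentiment"] == sentiment]
            row[COUNT_COLUMNS[sentiment]] = len(matching)
            row[f"sum_likes_{sentiment}"] = sum(t["likes"] for t in matching)
            row[f"sum_retweets_{sentiment}"] = sum(t["retweets"] for t in matching)
        rows.append(row)
    return rows


# Hypothetical labelled tweets.
tweets = [
    {"summary_day": "2021-05-17", "sentiment": "positive", "likes": 10, "retweets": 2},
    {"summary_day": "2021-05-17", "sentiment": "positive", "likes": 5, "retweets": 1},
    {"summary_day": "2021-05-17", "sentiment": "negative", "likes": 1, "retweets": 0},
]
tweet_summary = summarize_tweets(tweets)
```

Distinguishing original tweets from retweets would need an extra field on each tweet and is left out of this sketch.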

oreHGA, May 18 '21 10:05

We now have daily summaries for:

  • oura - daily summaries of activity, sleep and readiness
  • spotify - average of the number of features happening daily
  • activitywatch - day, classification of event, number of times it happened
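Since the activitywatch summary is in long form (one row per day and event classification), it would need to be pivoted to one row per day before it can be joined with the other sources. A minimal sketch, assuming hypothetical field names `day`, `classification` and `count`:

```python
def pivot_event_counts(rows):
    """Turn (day, classification, count) rows into one wide row per day."""
    classifications = sorted({r["classification"] for r in rows})
    by_day = {}
    for r in rows:
        # Start every day with a zero for each classification seen anywhere.
        wide = by_day.setdefault(r["day"], {c: 0 for c in classifications})
        wide[r["classification"]] += r["count"]
    return [{"summary_day": day, **by_day[day]} for day in sorted(by_day)]


# Hypothetical long-form rows.
long_rows = [
    {"day": "2021-11-16", "classification": "coding", "count": 12},
    {"day": "2021-11-16", "classification": "social", "count": 3},
    {"day": "2021-11-17", "classification": "coding", "count": 7},
]
wide_rows = pivot_event_counts(long_rows)
```

Filling absent classifications with 0 is a choice made for this example; it keeps every day's row the same width.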

oreHGA, Nov 18 '21 05:11