Theodore Meynard - Test your data like you test your code | Pydata London 2022

Open • minimav opened this issue 2 years ago • 0 comments

Video

I'm not sure whether hierarchical timestamps are supported, but I've written some of them that way in case it's useful.

Timestamps:

  • 0:00 - Introduction
  • 0:35 - Agenda
  • 1:17 - About the speaker
  • 1:49 - What are data unit tests?
  • 3:37 - Data unit tests verify expectations
  • 5:09 - Frameworks for data unit tests
  • 8:16 - great_expectations
  • 10:12 - Code Example - wine quality datasets
    • 12:06 - Loading red wine dataset
    • 12:34 - Check quality is an integer rating out of 10
    • 13:00 - Check sulphate distribution is unchanged using KL Divergence
    • 13:38 - Saving/loading an expectation suite
    • 14:22 - Question - Which distribution is used in the sulphate distribution test?
    • 14:33 - Answer
    • 15:44 - Loading white wine dataset and running data unit tests on it
    • 17:08 - Using data unit tests in practice at GetYourGuide
  • 19:50 - Conclusion
  • 21:15 - Q & A
    • 21:26 - Question - Could some of these problems be solved in conjunction with data versioning like DVC?
    • 21:59 - Answer
    • 23:00 - Question - How do you deal with validating ranges that differ based on values in another column?
    • 23:30 - Answer
    • 24:31 - Question 1 - Can the test report be exported to a file?
    • 24:52 - Question 2 - Can great_expectations be used to compare 2 datasets?
    • 25:09 - Answer 1
    • 25:43 - Answer 2
    • 26:43 - Question - When do you know you have enough data unit tests?
    • 27:10 - Answer
    • 28:38 - Question - Is it a manual process to run the test reports?
    • 28:55 - Answer
    • 29:38 - Question - How do you handle string data?
    • 29:51 - Answer
    • 30:41 - Question - Have you had a situation where production data has changed and you need to update your expectations?
    • 30:55 - Answer
    • 32:18 - Question - How do you deal with fuzzy failures e.g. 1% of data is invalid?
    • 32:42 - Answer
  • 33:48 - Applause 👏

minimav, Sep 18 '22 08:09