YouTubeVideoTimestamps
Theodore Meynard - Test your data like you test your code | PyData London 2022
I'm not sure whether hierarchical timestamps are supported, but I've done some of them that way in case it's useful.
Timestamps:
- 0:00 - Introduction
- 0:35 - Agenda
- 1:17 - About the speaker
- 1:49 - What are data unit tests?
- 3:37 - Data unit tests verify expectations
- 5:09 - Frameworks for data unit tests
- 8:16 - great_expectations
- 10:12 - Code Example - wine quality datasets
- 12:06 - Loading red wine dataset
- 12:34 - Check quality is an integer rating out of 10
- 13:00 - Check sulphate distribution is unchanged using KL Divergence
- 13:38 - Saving/loading an expectation suite
- 14:22 - Question - Which distribution is used in the sulphate distribution test?
- 14:33 - Answer
- 15:44 - Loading white wine dataset and running data unit tests on it
- 17:08 - Using data unit tests in practice at GetYourGuide
- 19:50 - Conclusion
- 21:15 - Q & A
- 21:26 - Question - Could some of these problems be solved in conjunction with data versioning like DVC?
- 21:59 - Answer
- 23:00 - Question - How do you deal with validating ranges that differ based on values in another column?
- 23:30 - Answer
- 24:31 - Question 1 - Can the test report be exported to a file?
- 24:52 - Question 2 - Can great_expectations be used to compare 2 datasets?
- 25:09 - Answer 1
- 25:43 - Answer 2
- 26:43 - Question - When do you know you have enough data unit tests?
- 27:10 - Answer
- 28:38 - Question - Is it a manual process to run the test reports?
- 28:55 - Answer
- 29:38 - Question - How do you handle string data?
- 29:51 - Answer
- 30:41 - Question - Have you had a situation where production data has changed and you need to update your expectations?
- 30:55 - Answer
- 32:18 - Question - How do you deal with fuzzy failures e.g. 1% of data is invalid?
- 32:42 - Answer
- 33:48 - Applause 👏