YouTube Video Timestamps
PyData Ann Arbor: Leland McInnes | PCA, t-SNE, and UMAP: Modern Approaches to Dimension Reduction
Video URL: https://www.youtube.com/watch?v=YPJQydzTLwQ
Contents
0:00 Acknowledgements and Sponsors
0:37 Housekeeping Items for In-Person Attendees
1:41 PyData Code of Conduct
2:41 Icebreaker for In-Person Attendees
4:20 Announcement: This Month in Data Science: DataFramed Podcast
5:02 Announcement: Open Jobs with Event Co-Sponsor
5:21 Announcement: New Meetup, PyData Toronto
5:34 Announcement: Next Event
5:55 Speaker Introduction by Host
6:17 Talk Introduction: PCA, t-SNE, UMAP
6:41 Speaker Introduction by Speaker
7:08 What is Dimension Reduction?
7:47 MNIST-Based Example
8:27 Fashion-MNIST Example
8:57 How do you do Dimension Reduction?
9:12 Technique 1: Matrix Factorization
9:39 Technique 2: Neighbor Graphs
10:02 Principal Component Analysis (PCA): Introduction
11:10 PCA: Algorithmic Underpinnings
12:27 PCA: Toy Input Data, Transformation, and Scatterplot
14:33 PCA: MNIST Digits Data Scatterplot
15:37 PCA: Fashion-MNIST Data Scatterplot
15:52 t-Distributed Stochastic Neighbor Embedding (t-SNE): Introduction
16:57 t-SNE: Algorithmic Underpinnings
23:10 t-SNE: MNIST Digits Data Scatterplot
23:42 t-SNE: Fashion-MNIST Data Scatterplot
24:01 Uniform Manifold Approximation and Projection (UMAP): Introduction
25:11 UMAP: Algorithmic Underpinnings (Topological Data Analysis and Simplicial Complexes)
27:35 UMAP: Toy Input Data, Transformation, and Scatterplot
28:47 UMAP Caveat: UMAP Needs Uniform Distribution of Data
29:46 UMAP: Define a Riemannian Metric on the Manifold to Conform to Uniform Distribution
30:06 UMAP: Brief Primer on Manifold Theory
31:50 UMAP: Fuzzy Cover Concept
33:31 UMAP Assumption: Manifold is Locally Connected
34:36 UMAP Distribution of Distances for 20 Nearest Neighbors
35:45 UMAP Local Metrics are Incompatible
37:52 UMAP: Toy Input Data, Transformation, and Graph
40:10 UMAP: MNIST Digits Data Scatterplot
40:41 UMAP: Fashion-MNIST Data Scatterplot
41:03 UMAP: Implementation and Constraints
41:58 Use of Numba Library
43:27 UMAP is Faster than t-SNE on 4 Datasets
44:24 Additional UMAP Use Cases
45:20 UMAP Can Use Labels for Supervised Dimension Reduction
46:28 UMAP Can Leverage Metric Learning
47:30 UMAP Scales Well to Many Different Labels, Distances, and Data Types
48:16 UMAP Can Work with Pandas DataFrames
48:38 Wrap-Up
48:44 Conclusion 1: PCA is Interpretable Dimension Reduction
49:07 Conclusion 2: t-SNE Works Great
49:11 Conclusion 3: UMAP Improved on t-SNE by Being Theoretically Grounded
49:30 UMAP GitHub Resource, Conda and Pip Packages
49:44 Q&A 1 — How do you assume uniform distribution and construct the manifold from that distribution?
50:32 Q&A 2 — In your MNIST example, does the positional information matter in dimension reduction as it does in conv-nets?
51:48 Q&A 3 — Can you start from latent space and find corresponding data?
52:14 Q&A 4 — What is an example of how to measure distance within UMAP?
53:11 Q&A 5 — How should we interpret position along the axes of the UMAP scatterplots?
54:06 Q&A 6 — How do you figure out what features were important for the cluster classification within UMAP?
55:07 Q&A 7 — What kind of distance metric is suitable for a categorical label?
55:42 Q&A 8 — Could you provide more context on the appropriate feature space for HDBSCAN and UMAP?
57:10 Q&A 9 — Have you tried UMAP on autocorrelated data?
57:42 Thank you!
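
For readers who want to try the three techniques from the talk, here is a minimal sketch applying PCA, t-SNE, and UMAP to scikit-learn's small digits dataset (a stand-in for the MNIST data used in the video). The parameter choices (`perplexity`, `n_neighbors`) are illustrative defaults, not the speaker's settings; UMAP requires the separate `umap-learn` package.

```python
# Sketch of the three dimension-reduction techniques covered in the talk,
# applied to scikit-learn's digits dataset. Assumes scikit-learn is installed;
# UMAP additionally needs the umap-learn package (`pip install umap-learn`).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = load_digits().data  # 1797 samples, 64 features (8x8 images)

# PCA: linear matrix factorization onto the top 2 principal components.
pca_2d = PCA(n_components=2).fit_transform(X)

# t-SNE: neighbor-graph method that preserves local structure.
tsne_2d = TSNE(n_components=2, perplexity=30).fit_transform(X)

# UMAP: neighbor-graph method with a manifold-theoretic foundation.
# Guarded import, since umap-learn may not be installed.
try:
    import umap
    umap_2d = umap.UMAP(n_components=2, n_neighbors=15).fit_transform(X)
except ImportError:
    umap_2d = None

print(pca_2d.shape, tsne_2d.shape)  # each embedding is (1797, 2)
```

Each resulting array can be passed straight to a scatterplot (e.g. `matplotlib.pyplot.scatter`), colored by digit label, to reproduce the kind of figures shown in the talk.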