Brian Wylie
Make a Blog about doing a Udacity Class project using SageWorks - Super Fast - Easy - Step by Step Guide
The approach of using CodeCommit + GitHub + AWS remote per ENV is at least mildly interesting, so perhaps write up a blog page about it.
Approach: - Glue Job to DataSource (DataLoader Glue class (heavy) to be implemented) - Athena CTAS Query to FeatureSet (DataSource_to_FeatureSet class (heavy) to be implemented) - Use Jaccard Distance on...
Been a while since we looked at DGA, perhaps take another peek - https://notebook.community/yevheniyc/Python/1m_ML_Security/notebooks/day_3/Worksheet%206%20-%20DGA%20Detection%20ML%20Classification
https://www.quora.com/What-are-some-popular-datasets-to-measure-Spark-performance
We should explore some testing using AWS LocalStack - https://awstip.com/run-aws-on-your-laptop-introduction-to-localstack-7269c19dedae
Having the KNN Spider use the metadata class will mean that we can quickly compute **distances** between: - Data Sources (Jaccard Distance on Column Names) - Feature Sets (Jaccard Distance...
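For reference, a minimal sketch of the Jaccard distance on column names mentioned above. The function name `jaccard_distance` and the example column lists are hypothetical and are not part of SageWorks; how the metadata class exposes column names is not shown here.

```python
def jaccard_distance(columns_a, columns_b):
    """Jaccard distance between two collections of column names.

    0.0 means the column sets are identical, 1.0 means they share nothing.
    """
    set_a, set_b = set(columns_a), set(columns_b)
    union = set_a | set_b
    if not union:
        # Two empty column sets are treated as identical (an assumption)
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


# Hypothetical column names for two data sources
source_a_columns = ["id", "length", "diameter", "height"]
source_b_columns = ["id", "length", "diameter", "rings"]

# 3 shared columns out of 5 distinct columns
distance = jaccard_distance(source_a_columns, source_b_columns)
```

The same function would apply to Feature Sets, since it only needs a list of names for each side.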
Have a 10 line script that shows going from a new csv file to quicksight in just a few lines of code. - CSV gets dropped into the `sageworks-incoming-data` bucket/prefix...
Run the abalone CSV file through AWS Canvas, keep notes about what worked well and what didn't. Compare and contrast Canvas to SageWorks.
A Class that takes in two Feature Set objects and compares them in cool ways.