Brian Wylie
FAISS (Facebook Research): https://github.com/facebookresearch/faiss
Blog about similarity search: https://towardsdatascience.com/similarity-search-knn-inverted-file-index-7cab80cc0e79
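The inverted file (IVF) idea from the blog can be sketched in plain NumPy. k-means picks `nlist` centroids, each database vector is filed under its nearest centroid, and a query scans only the `nprobe` closest lists instead of every vector. This is a toy for intuition, not FAISS's implementation. In FAISS the counterpart is an `IndexIVFFlat` built on an `IndexFlatL2` quantizer, with `index.nprobe` controlling the same speed/recall trade-off. `kmeans`, `IVFIndex`, and the data sizes are illustrative choices.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; the centroids act as the coarse quantizer."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared L2 distance from every vector to every centroid.
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old centroid if a cluster empties out
                centroids[j] = members.mean(axis=0)
    return centroids


class IVFIndex:
    """Toy inverted file index: vectors are bucketed by nearest centroid."""

    def __init__(self, nlist):
        self.nlist = nlist  # number of inverted lists (clusters)

    def train(self, xb):
        self.centroids = kmeans(xb, self.nlist)

    def add(self, xb):
        self.xb = xb
        d = ((xb[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d.argmin(axis=1)
        # One list of database ids per centroid.
        self.lists = [np.flatnonzero(assign == j) for j in range(self.nlist)]

    def search(self, xq, k, nprobe=1):
        """Return (distances, ids); only the nprobe closest lists are scanned."""
        D = np.full((len(xq), k), np.inf)
        I = np.full((len(xq), k), -1, dtype=np.int64)  # -1 marks an unfilled slot
        for qi, q in enumerate(xq):
            probe = np.argsort(((self.centroids - q) ** 2).sum(axis=1))[:nprobe]
            cand = np.concatenate([self.lists[j] for j in probe])
            dist = ((self.xb[cand] - q) ** 2).sum(axis=1)
            order = np.argsort(dist)[:k]
            D[qi, : len(order)] = dist[order]
            I[qi, : len(order)] = cand[order]
        return D, I


rng = np.random.default_rng(42)
xb = rng.random((2000, 16)).astype("float32")  # database vectors
xq = rng.random((5, 16)).astype("float32")     # query vectors

index = IVFIndex(nlist=32)
index.train(xb)
index.add(xb)
D, I = index.search(xq, k=4, nprobe=4)
```

With `nprobe` equal to `nlist` the search degenerates to exact brute force. A smaller `nprobe` scans fewer vectors but can miss a true neighbor that landed in a different cluster.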
XGBoost categorical feature support: https://xgboost.readthedocs.io/en/stable/tutorials/categorical.html
scikit-learn 1.2 added a set_output API (the linked example is from the 1.3.dev0 docs) that lets pipeline steps return pandas DataFrames, so DataFrames flow from one step to the next: https://scikit-learn.org/dev/auto_examples/miscellaneous/plot_set_output.html#sphx-glr-auto-examples-miscellaneous-plot-set-output-py
We need to think about what domain-specific functionality is needed when creating/generating feature sets. The obvious one: each domain will need its own set of Python classes...
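One way to organize per-domain classes, sketched under the assumption that each domain exposes a `build()` method that returns a feature DataFrame. A base class keeps a registry keyed by domain name, so generic code can look up the right class without hard-coding it. `DomainFeatures`, the two example domains, and their columns are hypothetical, not existing SageWorks classes.

```python
from abc import ABC, abstractmethod

import pandas as pd


class DomainFeatures(ABC):
    """Hypothetical base class: one subclass per domain builds its features."""

    registry = {}  # domain name -> subclass, filled in automatically
    domain = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.domain is not None:
            DomainFeatures.registry[cls.domain] = cls

    @abstractmethod
    def build(self, df: pd.DataFrame) -> pd.DataFrame:
        """Return a new DataFrame of engineered features for this domain."""


class NetworkFeatures(DomainFeatures):
    domain = "network"

    def build(self, df):
        out = pd.DataFrame(index=df.index)
        out["bytes_per_packet"] = df["bytes"] / df["packets"]
        return out


class SalesFeatures(DomainFeatures):
    domain = "sales"

    def build(self, df):
        out = pd.DataFrame(index=df.index)
        out["revenue"] = df["price"] * df["quantity"]
        return out


def features_for(domain, df):
    """Look up the domain's class and build its feature set."""
    return DomainFeatures.registry[domain]().build(df)


flows = pd.DataFrame({"bytes": [1500, 300], "packets": [3, 3]})
print(features_for("network", flows))
```

Adding a domain then means adding one subclass; nothing in the calling code changes.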
PaCMAP basic demo: https://github.com/YingfanWang/PaCMAP/blob/master/demo/basic_demo.py
Dash AG Grid examples: https://dashaggrid.pythonanywhere.com/
Standard Dash Table: https://dash.plotly.com/datatable
Dash AG Grid comparison: https://youtu.be/dovf4FwtwPg?t=1862
Dash AG Grid info/install: https://dash.plotly.com/dash-ag-grid
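Both components take the same DataFrame in slightly different shapes. Each wants a list of row dicts, but DataTable describes columns with `name`/`id` pairs while AG Grid uses `columnDefs` entries keyed by `field`. Here is a small sketch of that mapping. The helper names and sample data are hypothetical, and the component calls appear only in comments so the snippet runs with pandas alone.

```python
import pandas as pd


def ag_grid_props(df: pd.DataFrame) -> dict:
    """rowData/columnDefs in the shape dash_ag_grid.AgGrid takes."""
    return {
        "rowData": df.to_dict("records"),
        "columnDefs": [{"field": col} for col in df.columns],
    }


def datatable_props(df: pd.DataFrame) -> dict:
    """data/columns in the shape dash_table.DataTable takes."""
    return {
        "data": df.to_dict("records"),
        "columns": [{"name": col, "id": col} for col in df.columns],
    }


df = pd.DataFrame({"name": ["alpha", "beta"], "score": [0.9, 0.7]})

# With dash and dash-ag-grid installed (import dash_ag_grid as dag;
# from dash import dash_table), these would feed the components, e.g.:
#   dag.AgGrid(id="models-grid", **ag_grid_props(df))
#   dash_table.DataTable(id="models-table", **datatable_props(df))
print(ag_grid_props(df)["columnDefs"])  # [{'field': 'name'}, {'field': 'score'}]
```

Because the row data is identical, switching a page from DataTable to AG Grid mostly comes down to translating the column definitions.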
Let's take a deeper dive into Athena views and see how and where they fit into SageWorks: https://docs.aws.amazon.com/athena/latest/ug/views.html
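Athena's DDL for this is `CREATE [OR REPLACE] VIEW view_name AS query`. A view is a logical table: only the query is stored, and it re-runs against the underlying table whenever the view is referenced. That means a view could act as a filtered or column-pruned DataSource without copying data. The helper below is a minimal sketch that only builds the statement. The function name, the conservative lowercase-identifier check, and the table/column names are assumptions. Actually running the DDL (for example through the boto3 Athena client's `start_query_execution`) is left out.

```python
import re

# Conservative check: lowercase letters, digits, underscores.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def create_view_sql(view_name: str, select_sql: str, replace: bool = True) -> str:
    """Build an Athena CREATE [OR REPLACE] VIEW ... AS <query> statement."""
    if not _IDENT.match(view_name):
        raise ValueError(f"unsupported view name: {view_name!r}")
    verb = "CREATE OR REPLACE VIEW" if replace else "CREATE VIEW"
    return f"{verb} {view_name} AS\n{select_sql.strip()}"


sql = create_view_sql("recent_sales", "SELECT * FROM sales WHERE year >= 2022")
print(sql)
```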
We should optimize the SQL query for value counts the same way we did for column_stats().
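The note doesn't say how column_stats() was optimized. Assuming the win came from replacing one query per column with a single round trip, the same idea applied to value counts is one `UNION ALL` query whose rows are tagged with the column name. Each branch casts values to `VARCHAR` so all branches share a column type. The helper names are hypothetical, and SQLite stands in for Athena so the snippet runs locally.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Double-quote an identifier (accepted by Athena/Trino and SQLite)."""
    return '"' + name.replace('"', '""') + '"'


def value_counts_sql(table: str, columns: list) -> str:
    """One query returning (column_name, column_value, value_count) rows for all columns."""
    selects = []
    for col in columns:
        label = col.replace("'", "''")  # escape for use as a string literal
        selects.append(
            f"SELECT '{label}' AS column_name, "
            f"CAST({quote_ident(col)} AS VARCHAR) AS column_value, "  # common type for UNION ALL
            f"COUNT(*) AS value_count "
            f"FROM {quote_ident(table)} GROUP BY {quote_ident(col)}"
        )
    return "\nUNION ALL\n".join(selects)


def value_counts(conn, table: str, columns: list) -> dict:
    """Run the combined query once and reshape rows into {column: {value: count}}."""
    result = {col: {} for col in columns}
    for col, value, count in conn.execute(value_counts_sql(table, columns)):
        result[col][value] = count
    return result


# SQLite stands in for Athena here so the sketch runs locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 1), ("east", 2), ("west", 1)],
)
print(value_counts(conn, "sales", ["region", "units"]))
```

With columnar storage such as Parquet, each branch should only read its own column, so combining branches mainly saves per-query overhead rather than data scanned.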
We have AthenaSource, so we also need an RDSSource that reads from an AWS RDS database.
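A minimal sketch of what an RDSSource might look like, assuming it only needs to run SQL against one table and return pandas DataFrames. `pandas.read_sql` accepts a SQLAlchemy connectable or a `sqlite3` connection (other DB-API connections work, but recent pandas versions warn). A real version would therefore likely reach the RDS endpoint through SQLAlchemy with a PostgreSQL or MySQL driver. The class and its method names are hypothetical and are not AthenaSource's actual interface. An in-memory SQLite database stands in for RDS.

```python
import sqlite3

import pandas as pd


class RDSSource:
    """Hypothetical RDSSource sketch: wraps a DB connection to one table."""

    def __init__(self, connection, table_name: str):
        self.connection = connection
        self.table_name = table_name

    def _table(self) -> str:
        # Double-quoted identifier: fine for PostgreSQL and SQLite
        # (MySQL needs backticks unless ANSI_QUOTES is enabled).
        return '"' + self.table_name.replace('"', '""') + '"'

    def query(self, sql: str) -> pd.DataFrame:
        return pd.read_sql(sql, self.connection)

    def num_rows(self) -> int:
        return int(self.query(f"SELECT COUNT(*) AS n FROM {self._table()}")["n"].iloc[0])

    def sample(self, n: int = 5) -> pd.DataFrame:
        return self.query(f"SELECT * FROM {self._table()} LIMIT {int(n)}")


# An in-memory SQLite database stands in for RDS so the sketch runs locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ada"), (2, "bob"), (3, "cy")])

source = RDSSource(conn, "customers")
print(source.num_rows())  # 3
print(source.sample(2))
```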
It might be fun to think about a Zeek application prototype: use SageWorks to quickly build an application on top of AWS ML services.