Apache-Spark-Hands-On
Educational notes and hands-on problems with solutions for the Hadoop ecosystem.
For the benefit of the community, please feel free to add or request anything that hasn't been covered. Please remember that this is a beginner's guide, not expert-level documentation.
Hadoop
- /Flume: contains notes and examples of Apache Flume
- /Hive: contains notes and examples of Apache Hive
- /MySQL: code samples to create a database, create a table, and load data in MySQL
- /Sqoop: contains notes and examples of import/export using Sqoop
- /spark: contains notes, documentation, and sample examples of Spark APIs
Hands-on:
- /exam: sample CCA-175 exam questions and solutions (in the solution branch)
- /problem1: complex data structure handling using Hive (exposure to Hive, create table, LOAD, named_struct, struct)
- /problem2: stock data analysis (exposure to JSON file handling, SparkSQL, map, reduce, filter, join, groupByKey, keyBy, UDFs, etc.)
- /problem3: MovieLens database analysis
- /problem4: Lahman's baseball database analysis
- /problem5: Hortonworks certification sample, 10 tasks in total
- /Tweeter: Twitter data analysis
- /problem6: retail database sample exercises
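
For readers new to the operations listed under /problem2, here is a minimal plain-Python sketch of what filter, keyBy, map, groupByKey, and reduce compute. It does not use Spark and is not taken from the repository: the `quotes` records and the helpers `key_by` and `group_by_key` are hypothetical stand-ins that only mimic the behaviour of the corresponding RDD methods on a small in-memory list.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample records (symbol, closing price); made up for illustration,
# not taken from the repository's stock data.
quotes = [
    ("AAA", 10.0),
    ("BBB", 20.0),
    ("AAA", 14.0),
    ("BBB", 0.0),   # treated as a bad record and filtered out below
    ("AAA", 12.0),
]

def key_by(records, key_fn):
    """Pair each record with a key, mimicking what RDD.keyBy produces."""
    return [(key_fn(r), r) for r in records]

def group_by_key(pairs):
    """Collect the values that share a key, mimicking RDD.groupByKey."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)

valid = [q for q in quotes if q[1] > 0]        # filter: drop non-positive prices
keyed = key_by(valid, lambda q: q[0])          # keyBy: (symbol, record)
prices = [(k, q[1]) for k, q in keyed]         # map: (symbol, price)
grouped = group_by_key(prices)                 # groupByKey: symbol -> [prices]

# reduce: sum each group's prices, then divide by the count to get the average
average_close = {
    sym: reduce(lambda a, b: a + b, vals) / len(vals)
    for sym, vals in grouped.items()
}
print(average_close)
```

In Spark itself the same steps would run lazily across partitions; this sketch only shows the shape of the data after each step.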