reference-apps
Spark reference applications
Aim: an API call from any machine that submits a Spark job to a Spark EC2 cluster. The job runs perfectly well as a Python file on localhost with Apache Spark. However, unable...
Hi, I have successfully compiled the Twitter classifier sample and am trying to run the first program to collect the tweets. When I run the example, I am running...
I think unit testing approaches should be added to reference-apps; it would promote quality among Spark beginners.
The last item, content size, could be 0, represented either as '0' or '-'. For the log from http://www.monitorware.com/en/logsamples/apache.php there is still one line that does not conform. To...
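A minimal sketch of the point about the content size field, written in plain Python rather than taken from the reference apps. The pattern `LOG_PATTERN` and the helpers `parse_content_size` and `parse_line` are hypothetical names introduced here for illustration; the sketch assumes the Apache common log format and treats '-' the same as '0'.

```python
import re

# Assumed Apache common log format. The final field is the content size,
# which may be written as a number or as "-" when no body was sent.
LOG_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)$'
)


def parse_content_size(field):
    """Return the content size as an int, mapping '-' to 0."""
    return 0 if field == "-" else int(field)


def parse_line(line):
    """Parse one log line; return None if the line does not conform."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "ip": match.group(1),
        "status": int(match.group(8)),
        "content_size": parse_content_size(match.group(9)),
    }
```

Returning `None` for a non-conforming line, instead of raising, lets the caller filter or count bad lines without aborting the whole job; whether that is the right policy depends on the application.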
The following code can be improved by better leveraging SQL: ``` // Calculate statistics based on the content size. Tuple4 contentSizeStats = sqlContext.sql("SELECT SUM(contentSize), COUNT(*), MIN(contentSize), MAX(contentSize) FROM logs") .map(row...
The following text contains a small typo: > Again, call cache on the conte**x**t size RDD... The word _context_ should be replaced with _content_. Also, the next section of the...
I am new to Scala and Spark. It may not be an issue, and maybe I don't know how to use it. I am trying to build "twitter_classifier" project....
Hello, Thanks for this tutorial! I was wondering whether it would be possible to compute the session metric like Google Analytics does, using Apache Spark in offline mode? https://support.google.com/analytics/answer/2731565?hl=en The...
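One possible way to approach the session question, sketched in plain Python rather than Spark. It assumes a session ends after 30 minutes of inactivity and ignores any other session rules; the function `count_sessions`, the constant `SESSION_TIMEOUT_SECONDS`, and the `(user, timestamp)` event shape are hypothetical and not part of the reference apps.

```python
from collections import defaultdict

# Assumption: a new session starts after more than 30 minutes of inactivity.
SESSION_TIMEOUT_SECONDS = 30 * 60


def count_sessions(events, timeout=SESSION_TIMEOUT_SECONDS):
    """Count sessions per user from (user, unix_timestamp) pairs."""
    # Group timestamps by user.
    by_user = defaultdict(list)
    for user, timestamp in events:
        by_user[user].append(timestamp)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()
        count = 1  # the first event always opens a session
        # A gap longer than the timeout between consecutive events
        # opens a new session.
        for previous, current in zip(stamps, stamps[1:]):
            if current - previous > timeout:
                count += 1
        sessions[user] = count
    return sessions
```

In a Spark job the same logic would likely be applied per user after grouping the log records by a user key, but that mapping is left as an assumption here.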
In the Twitter classifier's ExamineAndTrain, the SQL query that counts users by lang returns 0.
I am able to run all the chapter applications, but when I tried to run the loganalyzer app I was not able to run it properly. Somehow I solved all errors and...