amazon-codeguru-profiler-for-spark
Onboarding difficulties
Hi, I've spent a few hours getting started with the CodeGuru profiler and thought I'd report the issues I ran into, in case they can help improve the documentation.
I only use EMR occasionally, so some of these may not be issues most users would hit.
The official documentation doesn't mention Spark.
I started from the blog post, but the number of outgoing links led me to focus on the official CodeGuru documentation and the "Setup Instructions" page of the new profiling group. I had never used CodeGuru before, and all those other references made me believe that I should use the Java agent, software.amazon.codeguruprofilerjavaagent.Profiler. That actually worked, but of course it only installed the agent on the driver. Some of those pages could also mention that Spark has its own dedicated plugin.
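For anyone else landing here: the executor-side profiler is wired in as a Spark plugin, not the standalone Java agent. As a sketch (the class name below is a placeholder, not the real one; use the exact value from this repository's README), it is enabled like any other Spark plugin:

```properties
# spark-defaults.conf, or --conf on spark-submit
# <PluginClassFromREADME> is a placeholder for the plugin class documented in the README
spark.plugins  <PluginClassFromREADME>
```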
I didn't notice that the yarn-env.export JSON way of specifying PROFILING_CONTEXT and ENABLE_AMAZON_PROFILER also required setting spark.plugins
It is somewhat mentioned that this is "an alternative way to specify PROFILING_CONTEXT and ENABLE_AMAZON_PROFILER", i.e. it doesn't replace setting the plugin, and spark.plugins is also listed in the Prerequisites, but highlighting this closer to the JSON-approach paragraph would make it easier to spot. To be honest, I didn't read that page very carefully, given the amount of text encountered in the whole process. At first it also wasn't clear to me whether those environment variables were read by something inside Spark or by something around it.
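A sketch of what I believe the combined EMR configuration is meant to look like, with both the yarn-env export and the plugin setting (the plugin class name is a placeholder; use the one from the README):

```json
[
  {
    "Classification": "yarn-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ENABLE_AMAZON_PROFILER": "true",
          "PROFILING_CONTEXT": "{\"profilingGroupName\":\"CodeGuru-Spark-Demo\"}"
        }
      }
    ]
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.plugins": "<PluginClassFromREADME>"
    }
  }
]
```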
It's not clear that the plugin must be included in the fat JAR
The README doesn't mention that it's available on Maven Central; only the blog post does.
That became obvious pretty quickly, but a quick note and a link to a pom.xml snippet would have spared me an iteration.
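Something along these lines would have helped. The coordinates below are placeholders, to be looked up on Maven Central (linked from the blog post); the point is that the dependency must end up inside the fat JAR (e.g. via maven-shade-plugin), not be left in provided scope:

```xml
<!-- Placeholder coordinates: replace with the artifact published on Maven Central -->
<dependency>
  <groupId>GROUP_ID_FROM_MAVEN_CENTRAL</groupId>
  <artifactId>ARTIFACT_ID_FROM_MAVEN_CENTRAL</artifactId>
  <version>LATEST_VERSION</version>
</dependency>
```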
The JSON property names casing seems wrong in the README and in the blog post
At least, the aws CLI reported this error to me, and I had to capitalize the property names:
Parameter validation failed:
Unknown parameter in Configurations[0]: "classification", must be one of: Classification, Configurations, Properties
Unknown parameter in Configurations[0]: "properties", must be one of: Classification, Configurations, Properties
Unknown parameter in Configurations[0]: "configurations", must be one of: Classification, Configurations, Properties
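In other words, the EMR API expects the PascalCase keys Classification, Properties and Configurations, while the documented example uses lowercase. As a quick sketch (this helper is hypothetical, not part of any AWS tool), the fix amounts to recapitalizing those three keys recursively:

```python
def pascalize_emr_config(node):
    """Rename lowercase EMR configuration keys to the PascalCase the API expects.

    Hypothetical helper for illustration only; note it would also rename a
    matching key that happened to appear inside Properties values.
    """
    key_map = {
        "classification": "Classification",
        "properties": "Properties",
        "configurations": "Configurations",
    }
    if isinstance(node, list):
        return [pascalize_emr_config(item) for item in node]
    if isinstance(node, dict):
        return {key_map.get(k, k): pascalize_emr_config(v) for k, v in node.items()}
    return node

broken = [{"classification": "yarn-env", "properties": {"FOO": "bar"}}]
print(pascalize_emr_config(broken))
# [{'Classification': 'yarn-env', 'Properties': {'FOO': 'bar'}}]
```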
The yarn-env.export classification didn't work for me (emr-6.13.0)
This was the most tedious issue.
I connected to the workers over SSH and found that the environment variables were properly set in /etc/hadoop/conf.empty/yarn-env.sh, but somehow they didn't seem to reach the plugin in my Spark worker process, which never reported "Profiling is enabled".
After ending up on that page, I tried this and it finally worked 🎉:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executorEnv.PROFILING_CONTEXT": "{\\\"profilingGroupName\\\":\\\"CodeGuru-Spark-Demo\\\"}",
      "spark.executorEnv.ENABLE_AMAZON_PROFILER": "true"
    }
  }
]
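Since PROFILING_CONTEXT is itself a JSON document embedded inside a JSON string, the escaping is easy to get wrong by hand (the extra backslashes above come from the additional quoting layer I went through). A small sketch of building the value with json.dumps instead of hand-escaping, assuming the same profiling group name as above:

```python
import json

# The inner value is JSON; let json.dumps produce it rather than escaping by hand.
profiling_context = json.dumps({"profilingGroupName": "CodeGuru-Spark-Demo"})

# When the whole classification is serialized, the inner quotes get escaped
# automatically (one level of \" per nesting layer).
classification = [{
    "Classification": "spark-defaults",
    "Properties": {
        "spark.executorEnv.PROFILING_CONTEXT": profiling_context,
        "spark.executorEnv.ENABLE_AMAZON_PROFILER": "true",
    },
}]
print(json.dumps(classification, indent=2))
```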
The profiler has already been useful to us, and the flame graph is actually quite nice, so thank you for putting this in place.