mrjob
progress indicators are wrong when steps run simultaneously
`_parse_progress_from_resource_manager()` assumes that there will be at most one job running on a cluster at a time, which is wrong now that clusters can run steps concurrently.
If we know a step's `StartTime` from the `ListSteps` API, that seems to be only a few seconds off from the Start Time shown in the resource manager UI. So that's a way we could possibly match up step progress correctly.
It would be really nice if the EMR API would tell us the mapping between EMR step IDs and YARN application IDs, but so far I haven't found one.
Since we now have code to talk to the resource manager API, we can guess the application ID for the step from the `apps` API (based on start time) and then get its progress from the `app` API.
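A rough sketch of the matching step, not mrjob's actual code. It assumes the app dicts look like what the YARN resource manager returns from `/ws/v1/cluster/apps` (`id`, `startedTime` in epoch milliseconds, `progress` as a percentage). The function name `guess_app_for_step` and the 60-second `max_skew_secs` tolerance are made up for illustration; the real skew would need to be checked against live clusters.

```python
from datetime import datetime, timezone


def guess_app_for_step(step_start, apps, max_skew_secs=60.0):
    """Return the app dict whose startedTime is closest to step_start.

    step_start: timezone-aware datetime for when the EMR step started.
    apps: list of app dicts shaped like the resource manager's apps
        response (only 'startedTime' is used for matching).
    max_skew_secs: hypothetical tolerance; apps further away than this
        are ignored, so an unrelated job is not reported by mistake.
    """
    step_start_ms = step_start.timestamp() * 1000.0

    best_app, best_skew = None, None
    for app in apps:
        started_ms = app.get('startedTime')
        if started_ms is None:
            continue  # can't match an app without a start time

        skew = abs(started_ms - step_start_ms) / 1000.0
        if skew <= max_skew_secs and (best_skew is None or skew < best_skew):
            best_app, best_skew = app, skew

    return best_app


# Sample data standing in for a parsed apps response
apps = [
    {'id': 'application_1_0001', 'startedTime': 1500000000000,
     'progress': 100.0},
    {'id': 'application_1_0002', 'startedTime': 1500000603000,
     'progress': 42.5},
]

# Step that started three seconds before the second application
step_start = datetime.fromtimestamp(1500000600, tz=timezone.utc)
match = guess_app_for_step(step_start, apps)
```

Returning `None` when nothing falls inside the tolerance lets the caller skip the progress message rather than print another step's numbers.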