Coyote Codornices Marin

82 comments by Coyote Codornices Marin

mrjob runs `ssh` with `-o UserKnownHostsFile=/path/to/fake_known_hosts_file`, where `fake_known_hosts_file` is an initially empty file that mrjob controls. It sounds like your SSH binary is either ignoring the `UserKnownHostsFile` option, or doesn't...
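
To make the invocation described above concrete, here is a minimal sketch of creating an empty known-hosts file and building an `ssh` argument list that points at it. This is not mrjob's actual code: the helper names `make_empty_known_hosts` and `build_ssh_args`, and the host string in the usage note, are hypothetical. Only the `-o UserKnownHostsFile=...` option comes from the comment.

```python
import os
import tempfile


def make_empty_known_hosts(directory):
    """Create an empty known-hosts file in `directory` and return its path.

    Hypothetical helper; the file name mirrors the one in the comment.
    """
    path = os.path.join(directory, "fake_known_hosts_file")
    with open(path, "w"):
        pass  # create the file and leave it empty
    return path


def build_ssh_args(host, known_hosts_path, ssh_bin="ssh"):
    """Return an argv list for ssh that uses a private known-hosts file.

    Hypothetical helper; it only assembles arguments and runs nothing.
    """
    return [
        ssh_bin,
        "-o", "UserKnownHostsFile=" + known_hosts_path,
        host,
    ]
```

For example, `build_ssh_args("hadoop@example.invalid", make_empty_known_hosts(tempfile.mkdtemp()))` returns a list that can be passed to `subprocess.Popen`. Running that command by hand is one way to check whether the SSH binary honors the option.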

Oh, it looks like you're on a mac, so it wouldn't be a file path issue. Maybe more of a network issue? Looks like you're getting a connection timeout. Not...

mrjob is somewhat embarrassingly still on boto 2. We actually have our own solution for making our boto connections super-robust against timeouts (see https://github.com/Yelp/mrjob/blob/master/mrjob/fs/s3.py#L49). When we do switch to boto...

Is this still an issue? mrjob is currently on botocore 1.6.0+, and it seems to be fairly robust about dealing with transient errors.

Thanks for letting me know. Will get in touch with folks at Google about the best way to handle this.

At least, we should be mocking out the call to `Popen()` rather than invoking a separate subprocess.
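
As a rough illustration of mocking out `Popen()` instead of launching a real subprocess, the sketch below uses the standard library's `unittest.mock`. The functions `run_hadoop_fs` and `run_with_mocked_popen` are hypothetical stand-ins, not mrjob's own code or tests, and the sketch does not cover mocking `fork()` or PTYs.

```python
import subprocess
from unittest import mock


def run_hadoop_fs(args):
    """Hypothetical helper: run `hadoop fs <args>`, return (returncode, stdout)."""
    proc = subprocess.Popen(
        ["hadoop", "fs"] + list(args),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    stdout, _ = proc.communicate()
    return proc.returncode, stdout


def run_with_mocked_popen(args, fake_stdout=b"", fake_returncode=0):
    """Call run_hadoop_fs() with subprocess.Popen replaced by a mock.

    No real process is started. Returns the helper's result together with
    the recorded Popen call so a test can inspect the command line.
    """
    with mock.patch("subprocess.Popen") as mock_popen:
        proc = mock_popen.return_value
        proc.communicate.return_value = (fake_stdout, b"")
        proc.returncode = fake_returncode
        result = run_hadoop_fs(args)
    return result, mock_popen.call_args
```

A test can then assert on both the returned output and the exact argument list passed to `Popen`, without needing a Hadoop binary on the machine.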

This is fairly complicated because we also need to mock out fork() and PTYs (though the Hadoop binary does the same thing in either case).

Huh, that's pretty much what mrjob is doing. If you run your batch with `-v`, it'll show you the `hadoop fs` command it's running. What happens if you `hadoop fs...

Also, you can't create more than one `SparkContext` at once, so actively managing the `SparkContext` would be helpful for testing etc.