appengine-mapreduce
Debugging over multiple instances in a module
I have a module dedicated to running appengine-mapreduce because it does background work like analytics, and it made sense to decouple it from the front-end instances. Currently, the config is:

```yaml
application: cosightio
version: 1
runtime: python27
api_version: 1
threadsafe: no
module: mapred
instance_class: B8
manual_scaling:
  instances: 5
```
My question is whether this is the best way to scale the MapReduce jobs. The job goes over each entity of a Datastore kind and puts it into the appropriate document index. My queue config is:

```yaml
name: mapreduce-queue
rate: 200/s
max_concurrent_requests: 200
```
The current job takes 1 hour to run, and I don't think it is being distributed over the 5 machines. Won't all machines pick up tasks from the task queue whenever they are free? Also, it seems better to have scaling that spins up the maximum number of instances and spins them back down after the job is done. And how can I find out whether I am bottlenecked on processing or on I/O? I suspect I/O, since the document index can only do about 15,000 reads/writes per minute, which is why I tuned my queue config down to 200 req/sec (15000/60 = 250). Would basic_scaling be better for that?
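A quick sanity check of that arithmetic (the 15,000 ops/min quota and the 200/s rate are the figures from the question; the headroom framing is just illustrative):

```python
# Back-of-envelope check of the I/O budget described above.
# If the document index handles roughly 15,000 reads/writes per minute,
# the task queue rate should stay under the per-second equivalent.
DOC_INDEX_OPS_PER_MIN = 15000
ceiling_per_sec = DOC_INDEX_OPS_PER_MIN / 60.0  # 250 ops/sec ceiling
queue_rate = 200  # the rate configured in queue.yaml

print(ceiling_per_sec)                # 250.0
print(queue_rate <= ceiling_per_sec)  # True: roughly 20% headroom
```

So at 200/s the queue is deliberately throttled below the index quota, which supports the suspicion that the job is I/O-bound rather than CPU-bound: adding instances would not help once the index quota is the limiting factor.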
You can manually scale if you like; it is certainly predictable. However, if you do, make sure you set your number of shards to be significantly higher than your number of instances, so that work is well spread and execution times are staggered.
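To make the staggering point concrete, here is a small illustration (plain Python, not the MapReduce API itself; the 32-shard figure is just an example) of why shards should substantially outnumber instances:

```python
def shards_per_instance(shard_count, instance_count):
    """Even split of shards across instances, worst case.

    Task queue tasks fan out to whichever instance is free, so the
    imbalance is at most one extra shard per instance.
    """
    base, extra = divmod(shard_count, instance_count)
    return [base + 1 if i < extra else base for i in range(instance_count)]

# 5 shards on 5 instances: one slow shard leaves 4 machines idle at the end.
print(shards_per_instance(5, 5))   # [1, 1, 1, 1, 1]

# 32 shards on 5 instances: each machine keeps pulling new work, so a
# single slow shard delays the overall job far less.
print(shards_per_instance(32, 5))  # [7, 7, 6, 6, 6]
```

In appengine-mapreduce the shard count is chosen when the job is kicked off (for the Python library, the `shard_count` argument to `control.start_map`), so it can be raised without touching the module or queue configuration.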