appengine-mapreduce

Debugging over multiple instances in a module

Open · debnathsinha opened this issue 10 years ago · 1 comment

I have a module dedicated to running appengine-mapreduce because it does background things like analytics, and it made sense to decouple it from the front end instances. Currently, the module config is:

    application: cosightio
    version: 1
    runtime: python27
    api_version: 1
    threadsafe: no
    module: mapred
    instance_class: B8
    manual_scaling:
      instances: 5

My question is whether this is the best way to scale the MapReduce jobs. The job goes over each entity of a Datastore kind and puts it into the appropriate document index. My queue config is:

    name: mapreduce-queue
    rate: 200/s
    max_concurrent_requests: 200

The current job takes 1 hour to run, and I don't think it is being distributed over the 5 machines. Won't all the machines pick up tasks from the task queue whenever they are free? It also seems better to have the module spin up the maximum number of instances and spin them back down once the job is done. And how can I tell whether I am bottlenecked on processing or on I/O? (I suspect I/O, since the document index can only do about 15k reads/writes per minute, which is why I tuned my queue config down to roughly 200 req/sec: 15000/60 = 250.) Would basic_scaling be better for that?
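For reference, a basic_scaling variant of the module config might look roughly like the sketch below. It is only an illustration built from the values quoted above; the idle_timeout is an assumed value, and basic_scaling (like manual_scaling) requires a B-class instance such as the B8 already in use:

    module: mapred
    instance_class: B8
    basic_scaling:
      max_instances: 5
      idle_timeout: 10m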

debnathsinha · Dec 12 '14 15:12

You can use manual scaling if you like; it is certainly predictable. However, if you do, make sure you set the number of shards to be significantly larger than the number of instances, so that the work is spread evenly and execution times are staggered.
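To make that concrete, here is a rough sketch of kicking off a map-only job with an explicit shard count via the library's Python control.start_map helper. The handler path, entity kind, and shard count below are illustrative assumptions, not the poster's actual code:

    from mapreduce import control

    # Start the map job with far more shards than instances (5 instances here),
    # so every instance stays busy and shard completion times are staggered.
    control.start_map(
        name="index-entities-into-search",
        handler_spec="mapred.handlers.index_entity",  # hypothetical mapper function
        reader_spec="mapreduce.input_readers.DatastoreInputReader",
        mapper_parameters={"entity_kind": "models.Document"},  # hypothetical kind
        shard_count=40,  # significantly more than the 5 instances
        queue_name="mapreduce-queue",
    )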

tkaitchuck · Oct 16 '15 21:10