android-priority-jobqueue

IllegalStateException with job scheduler

Open tapasya opened this issue 9 years ago • 22 comments

Fatal Exception: java.lang.IllegalStateException: Apps may not schedule more than 100 distinct jobs
	at android.os.Parcel.readException(Parcel.java:1674)
	at android.os.Parcel.readException(Parcel.java:1619)
	at android.app.job.IJobScheduler$Stub$Proxy.schedule(IJobScheduler.java:158)
	at android.app.JobSchedulerImpl.schedule(JobSchedulerImpl.java:42)
	at com.birbit.android.jobqueue.scheduling.a.a(FrameworkScheduler.java:111)
	at com.birbit.android.jobqueue.w.a(JobManagerThread.java:190)
	at com.birbit.android.jobqueue.w.a(JobManagerThread.java:161)
	at com.birbit.android.jobqueue.w.a(JobManagerThread.java:35)
	at com.birbit.android.jobqueue.x.a(JobManagerThread.java:225)
	at com.birbit.android.jobqueue.messaging.g.a(PriorityMessageQueue.java:39)
	at com.birbit.android.jobqueue.w.run(JobManagerThread.java:220)
	at java.lang.Thread.run(Thread.java:761)

tapasya avatar Jul 06 '16 18:07 tapasya

Ugh, that's an annoying limitation. We can try counting schedule calls to avoid this, but it seems really weird for an app to schedule 100 jobs. Also, JobManager actually batches these in 15-minute intervals (by default, unless you've disabled batching).

Can you provide more details on how you've ended up in this error? I wonder if this is some bug that caused JobManager to schedule too many JobScheduler jobs.

yigit avatar Jul 07 '16 07:07 yigit

I am using the job manager to queue up the impression tracking jobs in our app. Each view may have multiple impression trackers, so we create an independent job for every impression. I don't think there is a bug in the job manager; it's our requirement for massive impression tracking that caused the job manager to reach the JobScheduler limit. Surprisingly, this is not documented anywhere. For the time being I have put a workaround in place and also added an option to use only GcmNetworkManager for scheduling the jobs.

We have not disabled batching.

tapasya avatar Jul 11 '16 14:07 tapasya

That should still work fine, because even if you create 1000 jobs with the same specs (delay & network requirement), JobManager will create only 1 new JobScheduler job per app restart. It batches jobs in 15-minute intervals. Do you think that, despite that logic, it will hit the limit with your usage pattern?

yigit avatar Jul 12 '16 05:07 yigit

What I have observed is that JobManager creates a new JobScheduler job for each job. Every job addition goes to the scheduler, which in my case is the FrameworkScheduler, and it hits the limit very soon.

tapasya avatar Jul 12 '16 17:07 tapasya

That seems wrong :(. Do you mind attaching a logger to the JobManager with verbose logging enabled and sharing the logs with me? I wonder why it is not batching them. Also, can you share your JobManager configuration? (By the way, you are using beta-1, right?)

yigit avatar Jul 12 '16 21:07 yigit

I am using alpha5, and the JobManager configuration is the same as the one in the sample app. Do you want me to check with beta1 and then share the logs?

tapasya avatar Jul 13 '16 04:07 tapasya

There was a semi-related bugfix (#187). Can you check with beta1 and report the logs? Thanks.

yigit avatar Jul 13 '16 06:07 yigit

Checked with beta1. Issue is still reproducible. Please refer to https://github.com/tapasya/hellojobqueue

tapasya avatar Jul 21 '16 10:07 tapasya

You are putting a 20 min delay between jobs while BatchingScheduler uses a 15 min delay to batch jobs. I'll add a new constructor parameter to BatchingScheduler to make that time configurable, and then you can work around this issue by creating a BatchingScheduler with a larger batching window.
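To illustrate why spacing jobs 20 minutes apart defeats a 15-minute batching window, here is a minimal sketch. It is a simplified model and not the library's BatchingScheduler code: the class name `BatchingSketch`, the `request` method, and the bucketing rule (one wake-up per network requirement and window index) are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

// Simplified model of window-based batching; NOT the library's BatchingScheduler.
class BatchingSketch {
    // 15-minute window, matching the default mentioned in this thread.
    static final long WINDOW_MS = TimeUnit.MINUTES.toMillis(15);

    // (network requirement, window index) pairs that already have a wake-up.
    private final Set<String> scheduledWindows = new HashSet<>();
    private int frameworkCalls = 0;

    /** Returns true only when the request needs a new framework job. */
    boolean request(long nowMs, long delayMs, boolean requiresNetwork) {
        long runAtMs = nowMs + delayMs;
        long window = runAtMs / WINDOW_MS; // index of the window the job falls in
        String key = requiresNetwork + ":" + window;
        if (!scheduledWindows.add(key)) {
            return false; // an earlier request already covers this window
        }
        frameworkCalls++;
        return true;
    }

    int frameworkCalls() {
        return frameworkCalls;
    }

    public static void main(String[] args) {
        // 1000 requests with identical specs collapse into one framework job.
        BatchingSketch sameSpecs = new BatchingSketch();
        for (int i = 0; i < 1000; i++) {
            sameSpecs.request(0L, 0L, true);
        }
        System.out.println(sameSpecs.frameworkCalls()); // 1

        // Requests 20 minutes apart each land in a different 15-minute window.
        BatchingSketch spaced = new BatchingSketch();
        long twentyMin = TimeUnit.MINUTES.toMillis(20);
        for (int i = 0; i < 5; i++) {
            spaced.request(i * twentyMin, 0L, true);
        }
        System.out.println(spaced.frameworkCalls()); // 5
    }
}
```

In this model, a window longer than the spacing between jobs lets consecutive requests share a wake-up, which is the idea behind making the batching window larger.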

yigit avatar Jul 21 '16 16:07 yigit

I added the delay on purpose in the sample, whereas there is no delay in the actual app where I encountered this issue. I'll try to pin down the exact flow that led to this exception over the weekend.

tapasya avatar Jul 21 '16 16:07 tapasya

We just observed this crash in our app. We aren't sure what the cause was, but the 2 users who saw this crash were both on the Nexus 6P running Android 7.0. Has there been anything new associated with this issue?

JeffAtPropeller avatar Oct 31 '16 16:10 JeffAtPropeller

I have been unable to reproduce in a test build, but we are seeing more crash reports from Crashlytics with this particular issue. One thing to note is that ALL crashes are happening on Android 7.0. Has anyone else had any luck reproducing the issue or finding a workaround to prevent this from happening?

gyoda avatar Feb 14 '17 15:02 gyoda

I'm seeing more crashes every day. Any clue how to prevent this?

hussam789 avatar Mar 09 '17 13:03 hussam789

@yigit I managed to reproduce it:

com.birbit.android.jobqueue.config.Configuration.Builder builder = new com.birbit.android.jobqueue.config.Configuration.Builder(context)
                .minConsumerCount(1)//always keep at least one consumer alive
                .maxConsumerCount(3)//up to 3 consumers at a time
                .loadFactor(3)//3 jobs per consumer
                .consumerKeepAlive(120);

        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
            builder.scheduler(FrameworkJobSchedulerService.createSchedulerFor(context,
                    JobCollectionService.class), false); // disable batching for testing purposes
        } else {
            int enableGcm = GoogleApiAvailability.getInstance().isGooglePlayServicesAvailable(context);
            if (enableGcm == ConnectionResult.SUCCESS) {
                builder.scheduler(GcmJobSchedulerService.createSchedulerFor(context,
                        GcmJobService.class), false);
            }
        }
        jobManager = new JobManager(builder.build());

class LocationJob extends Job {
    static Location sLastSentLocation;
    public LocationJob() {
       super((new Params(1000)).setDelayMs(1000) // run every second for testing
                .setSingleId(trackableDataPoint.getDataPointType())
                .singleInstanceBy(trackableDataPoint.getDataPointType())
                .setPersistent(true) // keep alive if user closed the app
                .setRequiresNetwork(true) //  needed to keep run jobs when the app is closed
        );
    }

    public void onRun() {
         Location location = LocationServices.FusedLocationApi.getLastLocation(
                mGoogleApiClient);
         if (location != null && (sLastSentLocation == null || sLastSentLocation.distanceTo(location) > SOME_LIMIT)) {
                yourAppApi.sendLocation(location);
                sLastSentLocation = location;
         }
         jobManager.addJob(new LocationJob()); // schedule next job
    }
}

If we have this kind of job, which is persistent and requires network, then every time the job runs it invokes FrameworkScheduler:request(), which calls int scheduled = jobScheduler.schedule(builder.build());. This creates a new job in JobScheduler each time, until it reaches 100 jobs (a restriction that was added in Android 7.0).

You can reset it only by calling scheduler.cancelAll();. I found one invocation, in JobManagerThread:onIdle(), but it seems it is never called in this situation.

@Override
            public void onIdle() {
                JqLog.v("joq idle. running:? %s", running);
                if (!running) {
                    return;
                }
                if (!canScheduleConstraintChangeOnIdle) {
                    JqLog.v("skipping scheduling a new idle callback because looks like last one"
                            + " did not do anything");
                    return;
                }
                Long nextJobTimeNs = getNextWakeUpNs(true);
                // TODO check network should be another message which goes idle if network is the
                // same as now
                JqLog.d("Job queue idle. next job at: %s", nextJobTimeNs);
                if (nextJobTimeNs != null) {
                    ConstraintChangeMessage constraintMessage =
                            messageFactory.obtain(ConstraintChangeMessage.class);
                    constraintMessage.setForNextJob(true);
                    messageQueue.postAt(constraintMessage, nextJobTimeNs);
                } else if (scheduler != null) {
                    // if we have a scheduler but the queue is empty, just clean them all.
                    Log.d("onIdle", "*** should cancel all jobScheduler jobs");
                    if (shouldCancelAllScheduledWhenEmpty && persistentJobQueue.count() == 0) {
                        shouldCancelAllScheduledWhenEmpty = false;
                        scheduler.cancelAll();
                    }
                }
            }
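A toy model of the accumulation described above, written without any Android or JobManager classes. The class `RecurringJobModel` and its methods are hypothetical, and the model assumes that pending framework jobs are only ever cleared by the cancelAll() branch in onIdle(), which is what this comment reports rather than something verified against the platform.

```java
// Toy model only: counts framework jobs the way the comment above describes.
class RecurringJobModel {
    // Limit taken from the exception message in this thread.
    static final int MAX_FRAMEWORK_JOBS = 100;

    int pendingFrameworkJobs = 0; // jobs registered with the (modelled) JobScheduler
    int queuedJobs = 0;           // jobs sitting in the (modelled) persistent queue

    // Adding a job requests one more framework job, as with batching disabled.
    void addJob() {
        queuedJobs++;
        if (pendingFrameworkJobs >= MAX_FRAMEWORK_JOBS) {
            throw new IllegalStateException(
                    "Apps may not schedule more than 100 distinct jobs");
        }
        pendingFrameworkJobs++;
    }

    // Running a recurring job: onRun() adds the next job, then this one finishes.
    void runJob() {
        addJob();
        queuedJobs--;
        onIdle();
    }

    // Mirrors the branch that cancels everything only when the queue is empty.
    void onIdle() {
        if (queuedJobs == 0) {
            pendingFrameworkJobs = 0;
        }
    }

    // Number of successful runs before the modelled limit is hit.
    static int runsUntilCrash() {
        RecurringJobModel model = new RecurringJobModel();
        model.addJob(); // the first job, added by the app
        int runs = 0;
        try {
            while (true) {
                model.runJob();
                runs++;
            }
        } catch (IllegalStateException e) {
            return runs;
        }
    }

    public static void main(String[] args) {
        System.out.println(runsUntilCrash()); // 99
    }
}
```

Because the next job is added before the current one finishes, the queue is never empty when onIdle() runs, so the cancel branch is never reached in this model.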

hussam789 avatar Mar 14 '17 16:03 hussam789

I am also getting this crash.

Device: Nexus 6P OS: 7.1.1 Job Manager: 'com.birbit:android-priority-jobqueue:2.0.1'

My app schedules backup jobs for camera/gallery photos. I create each photo upload as a separate job. Users typically have more than 100 photos in their gallery.

Here is my Job construction and add job

    // class UploadJob extends Job
    public UploadJob(UploadParams uploadParams) {
        super(new Params(uploadParams.getPriority()).requireUnmeteredNetwork().persist()); 
    }

    // adding job to job manager
    jobManager.addJobInBackground(job);

 FATAL EXCEPTION: job-manager
	 Process: backgroundjob, PID: 1566
	 java.lang.IllegalStateException: Apps may not schedule more than 100 distinct jobs
		 at android.os.Parcel.readException(Parcel.java:1692)
		 at android.os.Parcel.readException(Parcel.java:1637)
		 at android.app.job.IJobScheduler$Stub$Proxy.schedule(IJobScheduler.java:158)
		 at android.app.JobSchedulerImpl.schedule(JobSchedulerImpl.java:42)
		 at com.birbit.android.jobqueue.scheduling.FrameworkScheduler.request(FrameworkScheduler.java:120)
		 at com.birbit.android.jobqueue.JobManagerThread.scheduleWakeUpFor(JobManagerThread.java:192)
		 at com.birbit.android.jobqueue.JobManagerThread.handleAddJob(JobManagerThread.java:163)
		 at com.birbit.android.jobqueue.JobManagerThread.access$100(JobManagerThread.java:35)
		 at com.birbit.android.jobqueue.JobManagerThread$1.handleMessage(JobManagerThread.java:228)
		 at com.birbit.android.jobqueue.messaging.PriorityMessageQueue.consume(PriorityMessageQueue.java:39)
		 at com.birbit.android.jobqueue.JobManagerThread.run(JobManagerThread.java:222)
		 at java.lang.Thread.run(Thread.java:761)

and here is my job manager

            Configuration.Builder builder = new Configuration.Builder(context)
                    .minConsumerCount(1)//always keep at least one consumer alive
                    .maxConsumerCount(3)// Max 3 threads
                    .networkUtil(mJobqueueNetworkUtilImpl)
                    .consumerKeepAlive(120);//wait 2 minute
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
                builder.scheduler(FrameworkJobSchedulerService.createSchedulerFor(context,
                        JobSchedulerService.class), false);
            } else {
                int enableGcm = GoogleApiAvailability.getInstance().isGooglePlayServicesAvailable(context);
                if (enableGcm == ConnectionResult.SUCCESS) {
                    builder.scheduler(GcmJobSchedulerService.createSchedulerFor(context,
                            GcmJobService.class), false);
                } else {
                    throw new UnsupportedOperationException("Required Google Play version is not installed on the phone..");
                }
            }
            return new JobManager(builder.build());

Please note that I set batch=false because on some phones (Motorola Pure Edition), when Wi-Fi disconnects and comes back, the job scheduler didn't start immediately. Changing to batch=false solved that issue.

cipson avatar Mar 16 '17 21:03 cipson

You must use batch=true. Otherwise, JobManager will need to schedule a framework job for each request and you'll hit this problem, nothing much we can do about it. We could catch the exception but catching it will simply hide a problem.

yigit avatar Mar 18 '17 19:03 yigit

We are also seeing this crash in production, and trying to find a fix or a workaround.

@yigit the bug has been reproduced by @hussam789 (see above). Was that helpful? Will there be a fix?

franklinharper avatar Apr 02 '17 00:04 franklinharper

@franklinharper the problem is that if you set batching to false and create 100 jobs, we have to call the job scheduler 100 times :/. Not sure how we can work around this problem. Any suggestions besides catching the error?

yigit avatar Apr 02 '17 02:04 yigit

@yigit @franklinharper we could also reproduce the issue with batching set to true; it just takes much longer to accumulate 100 jobs (in JobScheduler).

It happens when you have jobs that are persistent, require network, and are recurring (that is, you create a new job in the onRun() method).

I think that, because there is no way to mark a job (a Jobqueue job) as recurring using the library, the library never cancels the JobScheduler jobs in this scenario: there is always at least one job (Jobqueue job) in the queue, as seen in this code snippet from JobManagerThread:onIdle():

 public void onIdle() {
      .....

      Log.d("onIdle", "*** should cancel all jobScheduler jobs");
      if (shouldCancelAllScheduledWhenEmpty && persistentJobQueue.count() == 0) {
           shouldCancelAllScheduledWhenEmpty = false;
           scheduler.cancelAll();
       }
}

One workaround that I used is adding an if-statement to FrameworkScheduler:request() that calls cancelAll() when the number of JobScheduler's pending jobs exceeds 80:

        if (jobScheduler.getAllPendingJobs().size() > 80) {
            jobScheduler.cancelAll();
        }

        int scheduled = jobScheduler.schedule(builder.build());
        JqLog.d("[FW Scheduler] scheduled a framework job. Success? %s id: %d" +
                " created id: %d", scheduled > 0, scheduled, id);
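The effect of that guard can be sketched with a small stand-in for JobScheduler. Everything here is hypothetical (the class `GuardedSchedulerModel` and its methods are not Android or library APIs); it only models a 100-job cap and the "cancel everything above 80" rule from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for JobScheduler with a 100-job cap; not an Android class.
class GuardedSchedulerModel {
    static final int MAX_FRAMEWORK_JOBS = 100; // limit from the exception message
    static final int CANCEL_THRESHOLD = 80;    // threshold used in the workaround

    private final List<Integer> pending = new ArrayList<>();
    private int nextId = 1;
    private int maxPendingSeen = 0;

    // Models jobScheduler.schedule(...): fails once the cap is reached.
    private void rawSchedule() {
        if (pending.size() >= MAX_FRAMEWORK_JOBS) {
            throw new IllegalStateException(
                    "Apps may not schedule more than 100 distinct jobs");
        }
        pending.add(nextId++);
        maxPendingSeen = Math.max(maxPendingSeen, pending.size());
    }

    // Models the patched request(): cancel all pending jobs above the threshold.
    void guardedSchedule() {
        if (pending.size() > CANCEL_THRESHOLD) {
            pending.clear(); // stands in for jobScheduler.cancelAll()
        }
        rawSchedule();
    }

    int pendingCount() {
        return pending.size();
    }

    int maxPendingSeen() {
        return maxPendingSeen;
    }

    public static void main(String[] args) {
        GuardedSchedulerModel model = new GuardedSchedulerModel();
        for (int i = 0; i < 1000; i++) {
            model.guardedSchedule(); // never reaches the cap
        }
        System.out.println(model.maxPendingSeen()); // 81
    }
}
```

The trade-off in this model is that clearing the list also drops wake-ups for jobs that are still waiting, so a real implementation would probably need to re-request a wake-up for whatever remains in the queue.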

Suggested long-term solution: currently the library doesn't support recurring jobs. Maybe add recurring jobs as a library feature instead of relying on the hack we use today (making each job add the next job to the queue). Then the library could cancel the JobScheduler jobs that belong to already finished JobQueue jobs.

hussam789 avatar Apr 03 '17 08:04 hussam789

Any new information regarding this issue? We also started seeing it in production users...

roeeg avatar Nov 08 '17 09:11 roeeg

I'm currently investigating this as well.

One thing I noticed is that if you have multiple JobManagers, you may want to make sure you are sharing a single BatchingScheduler instance between them.

bryanstern avatar May 22 '18 00:05 bryanstern

In prod for us as well. Is there any news on this?

rareserban avatar Jan 28 '22 10:01 rareserban