
How many scheduled tasks were used for the benchmark?

Open gianielsevier opened this issue 3 years ago • 23 comments

Hi there,

I'm looking for an alternative to Quartz and I think your solution could be the one. Today we use Quartz heavily and can have over 14 million triggers in our DB. Quartz is not behaving well at this scale, and adding more instances to the cluster doesn't bring any benefit; the triggers are being delayed a lot.

I would like to know what the limits of db-scheduler are, and whether we can add more instances to scale with the growing number of scheduled tasks.

gianielsevier avatar Jun 02 '21 15:06 gianielsevier

Hi!

Could you describe a bit more what type of tasks you have? 14 million recurring tasks? How often are they running?

For the benchmark I created synthetic executions scheduled to run now(), maybe 2 million each time. But I don't think the amount of executions in the table should affect the performance that much, as long as it is indexed properly. What kind of throughput do you require (executions/s) and what database are you using?

kagkarlsson avatar Jun 02 '21 17:06 kagkarlsson

Scaling depends a bit on the tasks as well. Up to the point where the database becomes the bottleneck you can increase throughput by adding instances. If the task does nothing database-related, tests indicate you should be able to reach 10k executions/s.

kagkarlsson avatar Jun 02 '21 18:06 kagkarlsson

Hi, @kagkarlsson sorry for my delayed reply.

Let me explain our use case.

We have different clients that can come to our application and create/update/delete a trigger to run at any time. Our clients are different websites with millions of users interested in receiving recurring information, and they use our system to store it. All of our triggers are created dynamically, and we can have thousands running at the same time every second.

The number of triggers is just growing and growing.

Please let me know if you have any other questions.

gianielsevier avatar Jun 07 '21 11:06 gianielsevier

The limiting factor will be the number of triggers running to completion per second. If these triggers/tasks take, say, 10s to run, and there are 1000 running in parallel, that is approximately 100 completions/second (also referred to as executions/s).

If you have long-running tasks like that, you will likely first be limited by the size of the thread-pool. That can be increased both per instance (configurable) and by adding more instances.

If you reach a point where you need to run more than, say, 10,000 completions/s, you might need to use multiple databases and split the triggers/executions between them (i.e. sharding).
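Such sharding sits outside db-scheduler itself; a minimal sketch of hash-based routing of trigger ids to one of N scheduler databases (the class and method names here are hypothetical, not part of the library):

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical shard router: maps a trigger id to one of N scheduler databases.
// CRC32 gives a stable, evenly distributed hash, so a given trigger id always
// lands on the same shard.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    public int shardFor(String triggerId) {
        CRC32 crc = new CRC32();
        crc.update(triggerId.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % shardCount);
    }
}
```

Each shard would then run its own scheduler instances against its own `scheduled_tasks` table.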

How long does a typical trigger / execution / task run?

create/update/delete a trigger to run any time

Are these one-time tasks or recurring on a schedule? If recurring, what is typically the schedule?

Our clients are different websites with millions of users interested in receiving recurrent information and for that, they use our system to save it

Is it one trigger created per user?

kagkarlsson avatar Jun 08 '21 07:06 kagkarlsson

How long does a typical trigger / execution / task run? It should take a maximum of 1 second

Are these one-time tasks or recurring on a schedule? If recurring, what is typically the schedule? They are always recurring tasks

Is it one trigger created per user? It can be one or more per user

gianielsevier avatar Jun 08 '21 20:06 gianielsevier

Are these one-time tasks or recurring on a schedule? If recurring, what is typically the schedule? They are always recurring tasks

What is the schedule? Are they evenly spread in time, or are there peaks?

I still feel that I don't have the complete picture here. Currently, at what threshold of executions/s are you starting to experience problems? And how far are you hoping to push that using db-scheduler? Keep in mind that the key-metric here is executions/s.

kagkarlsson avatar Jun 11 '21 08:06 kagkarlsson

Hi @kagkarlsson I've started the POC and I have a question. I'm trying to use the spring version with tasks created dynamically based on requests coming from a controller. The tasks are being persisted to the database, but the column task_data is always null. I'm also confused about how to handle the trigger when it's time to run it.

I've tried to follow the examples from here: https://github.com/kagkarlsson/db-scheduler/blob/master/examples/features/src/main/java/com/github/kagkarlsson/examples/PersistentDynamicScheduleMain.java

This is the code I'm using to create the task:

Note.: scheduler is


@Service
public class SchedulerService {

    private final ExecutionRunner executionRunner;

    private final CronTriggerBuilder cronTriggerBuilder;

    private final Scheduler scheduler;

    public SchedulerService(
                            final ExecutionRunner executionRunner,
                            final CronTriggerBuilder cronTriggerBuilder,
                            final Scheduler scheduler) {
        this.executionRunner = executionRunner;
        this.cronTriggerBuilder = cronTriggerBuilder;
        this.scheduler = scheduler;
    }

    public void create(final DummyPojo pojo) {

        String idOne = pojo.getIdOne();
        String idTwo = pojo.getIdTwo();

        SerializableSchedule serializableSchedule = new SerializableSchedule(idOne, idTwo, cronTriggerBuilder.build(pojo));

        RecurringTask<SerializableSchedule> task = Tasks.recurring(UUID.randomUUID().toString(), serializableSchedule, SerializableSchedule.class)
                .execute(executionRunner);

        Instant newNextExecutionTime = serializableSchedule.getNextExecutionTime(ExecutionComplete.simulatedSuccess(Instant.now()));
        
        TaskInstance<SerializableSchedule> instance = task.instance(idOne);

        scheduler.schedule(instance, newNextExecutionTime);

    }

}

This is the execution runner class:

@Component
public class ExecutionRunner implements VoidExecutionHandler<SerializableSchedule> {

    private final SQSService sqsService;

    public ExecutionRunner(final SQSService sqsService) {
        this.sqsService = sqsService;
    }

    @Override
    public void execute(final TaskInstance<SerializableSchedule> taskInstance, final ExecutionContext executionContext) {

        SerializableSchedule serializableSchedule = taskInstance.getData();

        if (serializableSchedule != null) {

            long scheduledTimeEpochSeconds = executionContext.getExecution().executionTime.getEpochSecond();

            SQSMessage message = new SQSMessage();
            message.setIdOne(serializableSchedule.getIdOne());
            message.setIdTwo(serializableSchedule.getIdTwo());
            message.setRandomId(UUID.randomUUID().toString());
            message.setScheduledTimeEpochSeconds(scheduledTimeEpochSeconds);

            sqsService.send(message);
        }

    }
}

This is the SerializableSchedule class:

public class SerializableSchedule implements Serializable, Schedule {

    private final String idOne;

    private final String idTwo;

    private final String cronPattern;

    public SerializableSchedule(final String idOne, final String idTwo, final String cronPattern) {
        this.idOne = idOne;
        this.idTwo = idTwo;
        this.cronPattern = cronPattern;
    }

    @Override
    public Instant getNextExecutionTime(ExecutionComplete executionComplete) {
        return new CronSchedule(cronPattern).getNextExecutionTime(executionComplete);
    }

    @Override
    public boolean isDeterministic() {
        return true;
    }

    public String getIdOne() {
        return idOne;
    }

    public String getIdTwo() {
        return idTwo;
    }

    public String getCronPattern() {
        return cronPattern;
    }

    @Override
    public String toString() {
        return "SerializableSchedule pattern=" + cronPattern;
    }
}

gianielsevier avatar Jun 23 '21 08:06 gianielsevier

RecurringTask<SerializableSchedule> task = Tasks.recurring(UUID.randomUUID().toString(), serializableSchedule, SerializableSchedule.class)
                .execute(executionRunner);

You only do this once, at scheduler construction and startup. Create the task once, inject a reference to it into SchedulerService, and create instances from it. You probably also want to use a CustomTask and skip scheduleOnStartup(...). A RecurringTask gets automatically scheduled when the scheduler starts if an instance does not already exist.
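A minimal sketch of this restructuring, reusing the class names from this thread (the spring-boot starter typically discovers Task beans automatically; verify the builder/handler signatures against your db-scheduler version):

```java
import java.time.Instant;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

import com.github.kagkarlsson.scheduler.Scheduler;
import com.github.kagkarlsson.scheduler.task.ExecutionComplete;
import com.github.kagkarlsson.scheduler.task.helper.CustomTask;
import com.github.kagkarlsson.scheduler.task.helper.Tasks;

// Define the task ONCE, as a bean, instead of once per incoming request.
@Configuration
class TaskConfiguration {
    @Bean
    CustomTask<SerializableSchedule> dynamicRecurringTask() {
        return Tasks.custom("dynamic-recurring-task", SerializableSchedule.class)
            // no scheduleOnStartup(..): instances are created at runtime
            .execute((taskInstance, ctx) -> {
                SerializableSchedule schedule = taskInstance.getData();
                // ... do the work, e.g. send a message to SQS ...
                return (executionComplete, executionOperations) ->
                    executionOperations.reschedule(
                        executionComplete,
                        schedule.getNextExecutionTime(executionComplete));
            });
    }
}

// Inject the single task definition and create instances from it per request.
@Service
class SchedulerService {
    private final CustomTask<SerializableSchedule> task;
    private final Scheduler scheduler;

    SchedulerService(CustomTask<SerializableSchedule> task, Scheduler scheduler) {
        this.task = task;
        this.scheduler = scheduler;
    }

    void create(String id, SerializableSchedule schedule) {
        scheduler.schedule(
            task.instance(id, schedule),
            schedule.getNextExecutionTime(ExecutionComplete.simulatedSuccess(Instant.now())));
    }
}
```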

kagkarlsson avatar Jun 24 '21 06:06 kagkarlsson

I have gotten a couple of other questions along these lines, which has made it clear I need a better Spring Boot example for tasks with dynamic schedules that are added at runtime.

kagkarlsson avatar Jun 24 '21 06:06 kagkarlsson

Also, for more robust serialization, you may want to consider setting a custom JsonSerializer (also something I need to add an example for).
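A hedged sketch of such a serializer using Jackson; the exact Serializer interface shape and the builder's serializer(..) hook should be verified against your db-scheduler version:

```java
import java.io.IOException;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.github.kagkarlsson.scheduler.serializer.Serializer;

// JSON-based task-data serialization instead of default Java serialization:
// more robust across class changes, and the task_data column becomes readable.
public class JacksonTaskDataSerializer implements Serializer {
    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public byte[] serialize(Object data) {
        try {
            return mapper.writeValueAsBytes(data);
        } catch (JsonProcessingException e) {
            throw new RuntimeException("Failed to serialize task data", e);
        }
    }

    @Override
    public <T> T deserialize(Class<T> clazz, byte[] serializedData) {
        try {
            return mapper.readValue(serializedData, clazz);
        } catch (IOException e) {
            throw new RuntimeException("Failed to deserialize task data", e);
        }
    }
}
```

Wiring would be along the lines of `Scheduler.create(dataSource, knownTasks).serializer(new JacksonTaskDataSerializer())` (an assumption; check whether your version already ships a ready-made JSON serializer).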

kagkarlsson avatar Jun 24 '21 06:06 kagkarlsson

This is just setting up the implementation. I see that execute(..) is not the best choice of method name; maybe it should be called onExecute(...):

        final CustomTask<SerializableCronSchedule> task = Tasks.custom("dynamic-recurring-task", SerializableCronSchedule.class)
            .scheduleOnStartup(RecurringTask.INSTANCE, initialSchedule, initialSchedule)
            .onFailure((executionComplete, executionOperations) -> {
                final SerializableCronSchedule persistedSchedule = (SerializableCronSchedule) (executionComplete.getExecution().taskInstance.getData());
                executionOperations.reschedule(executionComplete, persistedSchedule.getNextExecutionTime(executionComplete));
            })
            .execute((taskInstance, executionContext) -> {
                final SerializableCronSchedule persistentSchedule = taskInstance.getData();
                System.out.println("Ran using persistent schedule: " + persistentSchedule.getCronPattern());

                return (executionComplete, executionOperations) -> {
                    executionOperations.reschedule(
                        executionComplete,
                        persistentSchedule.getNextExecutionTime(executionComplete)
                    );
                };
            });

kagkarlsson avatar Jun 24 '21 06:06 kagkarlsson

Hey, @kagkarlsson many thanks for your help. 🙌 Now it is working as expected. We will prepare the tests and I'll give you an update.

gianielsevier avatar Jun 29 '21 09:06 gianielsevier

Np. Will be interesting to hear the results. Sounded like a very high-throughput use case.

kagkarlsson avatar Jun 29 '21 11:06 kagkarlsson

Hi @kagkarlsson,

Finally, I've managed to find the time to come back with results:

The POC numbers: we created 14 million custom recurring tasks, about 2 million per day of the week, distributed across the 24 hours of each day. We ran the application on K8S with 4 dedicated pods, each with 500MB of memory and 0.5 CPU cores. The database was an AWS db.m6g.large Postgres instance with 8GB of memory and 2 vCPUs; this instance also serves other applications, mainly with Quartz (this is our nonprod environment).

Application behaviour: Saving the tasks: we had an endpoint where the client can send a payload asking to save a scheduler (task), giving a day of the week and a time it should run (they are always recurring).

Running the tasks: once it was time to run a task, the app was triggered by the db-scheduler lib, collected the information about the task, and sent a message to AWS SQS.

The aim of this POC was to check whether db-scheduler would be able to handle millions of schedulers (tasks) without delaying their execution (the main issue we have with Quartz today). We also wanted to make sure that db-scheduler would be able to scale horizontally without the db becoming a bottleneck and causing delays. To check the delay we basically took the current time minus the task execution time and logged it. Our logs also print which pod did the job.
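The delay measurement described above can be sketched with plain java.time (the class name and log format here are illustrative, not from the POC code):

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the delay metric: how far behind the planned execution time a
// task actually started. Logged per execution (together with the pod name)
// to verify there is no lag.
public class DelayMetric {
    public static Duration delay(Instant plannedExecutionTime, Instant actualStart) {
        return Duration.between(plannedExecutionTime, actualStart);
    }

    public static void main(String[] args) {
        Instant planned = Instant.parse("2021-07-01T10:00:00Z");
        Instant started = Instant.parse("2021-07-01T10:00:00.250Z");
        System.out.println("delay-ms=" + delay(planned, started).toMillis()); // prints delay-ms=250
    }
}
```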

After making a few changes to the configs below: db-scheduler.threads, db-scheduler.polling-strategy-lower-limit-fraction-of-threads, db-scheduler.polling-strategy-upper-limit-fraction-of-threads
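With the spring-boot starter, these settings go in application.properties; the values below are purely illustrative and should be tuned against your own measurements:

```properties
# Illustrative values only; tune for your workload
db-scheduler.threads=20
db-scheduler.polling-strategy-lower-limit-fraction-of-threads=0.5
db-scheduler.polling-strategy-upper-limit-fraction-of-threads=3.0
```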

By also adjusting the number of pods handling the 14 million tasks saved in our DB, we managed to avoid delays.

We kept the POC running for a month, and from our logs it was clear that db-scheduler was able to run with multiple pods, distributing the load equally among them, with no delays.

We will soon start a new project to provide a scalable scheduler solution for our company, and db-scheduler is the way to go.

Many thanks for your support @kagkarlsson and also for building this incredible solution.

gianielsevier avatar Feb 01 '22 10:02 gianielsevier

Good to hear! And just to let you know, I'm working on an improvement for your use-case: many instances of the same recurring task with a variable schedule: #257

kagkarlsson avatar Feb 01 '22 14:02 kagkarlsson

@kagkarlsson that's great, thanks for the feedback. I was wondering if I could contribute to your repo by providing an example similar to the POC we did?

gianielsevier avatar Feb 01 '22 15:02 gianielsevier

Improved api released in 11.0.

I was wondering if I could contribute to your repo by providing an example similar to the POC we did?

I missed your comment here, sorry. If you have such code that you think might be valuable for people to see, how about pushing it to your own GitHub repo, and I can link to it from the README? I can also add a link to this issue where you describe your setup.

Also, if you are happy users, you are welcome to add your company to the list here: https://github.com/kagkarlsson/db-scheduler#who-uses-db-scheduler :)

kagkarlsson avatar Feb 23 '22 12:02 kagkarlsson

I followed this guide and also created the schedule this way, but I can't cancel the task in my Spring Boot project.

Who can help me?

@PostMapping(path = "stop", headers = {"Content-type=application/json"})
public void stop(@RequestBody StartRequest request) {
    final TaskInstanceId scheduledExecution = TaskInstanceId.of("dynamic-recurring-task", RecurringTask.INSTANCE);
    if (!Objects.isNull(scheduledExecution)) {
        System.out.println("TaskID:" + scheduledExecution.getId());
        schedulerClient.cancel(scheduledExecution);
    }
}

huynhnt avatar Apr 11 '23 08:04 huynhnt

(quoting @gianielsevier's POC results above)

@gianielsevier

Thanks for providing a detailed explanation of your POC. We also have a similar use case. Would it be possible for you to share the example code you used in your POC?

Thanks in advance !

nj2208 avatar Nov 18 '23 05:11 nj2208

@kagkarlsson Could you please share which example we can follow for a similar use case, to achieve very high throughput with short-running jobs that just post a message to a message broker?

nj2208 avatar Nov 21 '23 16:11 nj2208

I think you will get the best throughput using PostgreSQL and .pollUsingLockAndFetch(1.0, 4.0) (thresholds tunable). Possibly increase the number of threads using .threads(xx) (or their spring-boot starter counterparts)
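A sketch of wiring those settings with the builder API; the thread count is illustrative, and pollUsingLockAndFetch relies on SELECT ... FOR UPDATE SKIP LOCKED support (hence the PostgreSQL recommendation):

```java
import javax.sql.DataSource;

import com.github.kagkarlsson.scheduler.Scheduler;
import com.github.kagkarlsson.scheduler.task.Task;

// Sketch of the high-throughput settings suggested above.
// Tune the thresholds and thread count against your own measurements.
public class SchedulerFactory {
    public static Scheduler highThroughput(DataSource dataSource, Task<?>... knownTasks) {
        return Scheduler.create(dataSource, knownTasks)
            .pollUsingLockAndFetch(1.0, 4.0) // lower/upper limit as fraction of free threads
            .threads(50)
            .build();
    }
}
```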

kagkarlsson avatar Nov 22 '23 06:11 kagkarlsson

I think you will get the best throughput using PostgreSQL and .pollUsingLockAndFetch(1.0, 4.0) (thresholds tunable). Possibly increase the number of threads using .threads(xx) (or their spring-boot starter counterparts)

Thanks a lot. Will use these settings in our PoC.

nj2208 avatar Nov 23 '23 12:11 nj2208

Also make sure you have the necessary indices.
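For reference, the PostgreSQL DDL shipped with db-scheduler creates indexes along these lines (verify against the tables.sql bundled with your version):

```sql
-- Poll queries filter on execution_time; dead-execution detection uses last_heartbeat
CREATE INDEX execution_time_idx ON scheduled_tasks (execution_time);
CREATE INDEX last_heartbeat_idx ON scheduled_tasks (last_heartbeat);
```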

kagkarlsson avatar Nov 23 '23 15:11 kagkarlsson