Memory leak when using FT Fallback with dependent beans
Describe the bug
Not exactly sure if it's tied to dependent beans per se, but we saw this in our application using Quartz jobs. The reproducer uses something similar, also with Quartz.
When we create a job that uses either a fallback method or a FallbackHandler class, the Job remains as an outgoing reference in io.quarkus.arc.impl.EagerInstanceHandle
So by running the reproducer we create 1000 jobs that run instantly (they should not exist afterwards); here's the result:
WITH FALLBACK
WITHOUT FALLBACK (comment out line 18 of MyJob.java in the reproducer)
Expected behavior
Memory should not climb without coming back down each time a job runs. The GC should be able to clean it up.
Actual behavior
Memory remains taken forever
How to Reproduce?
Reproducer: https://github.com/manofthepeace/ft-leak
Steps to reproduce:
1- mvn quarkus:dev (wait a couple of seconds until the 1000 jobs run)
2- get a heap dump via jps + jmap -dump:live,file=/Users/alex/Desktop/dumps/dump.hprof <pid>
3- analyze EagerInstanceHandle and MyJob.*
4- comment out line 18 (the fallback annotation) from MyJob.java
5- repeat steps 1, 2, 3
Output of uname -a or ver
Darwin Kernel Version 21.6.0
Output of java -version
Temurin-21.0.2+13
Quarkus version or git rev
3.7.3
Build tool (ie. output of mvnw --version or gradlew --version)
apache-maven-3.9.5
Additional information
Got around the leak by changing the fallback to a CDI decorator
Got around the leak by changing the fallback to a CDI decorator
@manofthepeace what do you mean by this? How does a decorator work around this?
Also, did you look at who holds the reference to the io.quarkus.arc.impl.EagerInstanceHandle? The handle is likely just a means to perform programmatic lookup, which can, under certain circumstances, lead to a leak with dependent beans.
/cc @Ladicek (fault-tolerance), @machi1990 (quartz), @mkouba (quartz,scheduler)
I'll assign this to myself; unless someone beats me to it, I will take a closer look tomorrow.
@manofthepeace what do you mean by this? How does a decorator work around this?
In my particular case I could replace the FallbackHandler class with a decorator to achieve the behaviour I wanted for my application. With the decorator I no longer see the leak: I can run my reproducer with a schedule of about 1000 invocations every 10 seconds and the memory stays stable, contrary to doing the same with the fallback method (or FallbackHandler class), where it just climbs up.
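Roughly, the shape of that workaround looks like the sketch below (simplified; SomeService and the fallback value are just stand-ins, not the actual application code):

```java
import jakarta.annotation.Priority;
import jakarta.decorator.Decorator;
import jakarta.decorator.Delegate;
import jakarta.inject.Inject;

// Stand-in business interface; the real application decorates its own service.
interface SomeService {
    String doWork();
}

// The fallback logic that used to live in the FallbackHandler moves into the decorator.
@Decorator
@Priority(10)
abstract class SomeServiceFallbackDecorator implements SomeService {

    @Inject
    @Delegate
    SomeService delegate;

    @Override
    public String doWork() {
        try {
            return delegate.doWork();
        } catch (RuntimeException e) {
            return "fallback"; // stand-in for whatever the FallbackHandler returned
        }
    }
}
```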
Also, did you look who holds the reference to the io.quarkus.arc.impl.EagerInstanceHandle?
I did not, but if you want me to check anything I can, though likely not tomorrow.
Hm, I don't think this is related to SR FT Fallback but more likely to the Quartz integration in Quarkus, more specifically to the JobFactory and the way a new job is created: https://github.com/quarkusio/quarkus/blob/main/extensions/quartz/runtime/src/main/java/io/quarkus/quartz/runtime/QuartzSchedulerImpl.java#L1235-L1253. This factory is using programmatic lookup and so the @Dependent beans are kept in memory due to how jakarta.enterprise.inject.Instance works.
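To illustrate the pattern (a deliberate simplification, not the actual Quarkus factory code):

```java
import jakarta.enterprise.inject.Instance;
import jakarta.enterprise.inject.spi.CDI;
import org.quartz.Job;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.spi.JobFactory;
import org.quartz.spi.TriggerFiredBundle;

// Simplified sketch of a JobFactory that looks jobs up programmatically.
public class LeakyJobFactory implements JobFactory {

    @Override
    public Job newJob(TriggerFiredBundle bundle, Scheduler scheduler) throws SchedulerException {
        Class<? extends Job> jobClass = bundle.getJobDetail().getJobClass();
        Instance<? extends Job> instance = CDI.current().select(jobClass);
        // For a @Dependent bean, get() creates a new instance that the Instance
        // keeps a reference to so it can later call destroy(). If destroy() is
        // never called, the instance stays reachable; hence the leak.
        return instance.get();
    }
}
```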
I had to debug the bean discovery process to figure out why MyJob even becomes a bean -- it turns out we treat fault tolerance annotations as bean defining annotations.
@mkouba so I take it you're fixing this? :-)
I had to debug the bean discovery process to figure out why MyJob even becomes a bean -- it turns out we treat fault tolerance annotations as bean defining annotations.
It would be a bean simply because it implements Job, see QuartzProcessor.
Even better :-)
Actually it isn't simply because it implements Job -- it implements Job and requires container services (in this case, it declares an injection point). I had no idea we had this, nice!
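For reference, a hypothetical reconstruction of such a class (names and the injected service are made up; what matters is implementing Job, declaring an injection point, and carrying the @Fallback annotation from the reproducer):

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.inject.Inject;
import org.eclipse.microprofile.faulttolerance.Fallback;
import org.quartz.Job;
import org.quartz.JobExecutionContext;

// Made-up service so the snippet is self-contained; the injection point below
// is what makes the job "require container services".
@ApplicationScoped
class MyService {
    void doWork() {
    }
}

public class MyJob implements Job {

    @Inject
    MyService service;

    @Override
    @Fallback(fallbackMethod = "fallback") // the annotation commented out in step 4 of the reproducer
    public void execute(JobExecutionContext context) {
        service.doWork();
    }

    // The fallback method must match the guarded method's signature.
    void fallback(JobExecutionContext context) {
    }
}
```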
Hm, I don't think this is related to SR FT Fallback but more likely to the Quartz integration in Quarkus, more specifically to the JobFactory and the way a new job is created: https://github.com/quarkusio/quarkus/blob/main/extensions/quartz/runtime/src/main/java/io/quarkus/quartz/runtime/QuartzSchedulerImpl.java#L1235-L1253. This factory is using programmatic lookup and so the @Dependent beans are kept in memory due to how jakarta.enterprise.inject.Instance works.
This is indeed the case.
The first thing that comes to mind is using a CDI-aware wrapper working with Instance.Handle, something like this - https://github.com/quarkusio/quarkus/compare/main...manovotn:quarkus:issue38824?expand=1
But I don't know much about Quartz so @mkouba might have a smarter idea :)
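In spirit, the wrapper in that branch might look something like this sketch (hypothetical, not the linked diff verbatim):

```java
import jakarta.enterprise.inject.Instance;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

// Sketch of a CDI-aware wrapper: it owns the Instance.Handle and destroys
// the underlying bean instance once the job has executed.
class CdiAwareJob implements Job {

    private final Instance.Handle<? extends Job> handle;

    CdiAwareJob(Instance.Handle<? extends Job> handle) {
        this.handle = handle;
    }

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        try {
            handle.get().execute(context);
        } finally {
            // For @Dependent beans this releases the contextual instance,
            // breaking the reference chain that caused the leak.
            handle.destroy();
        }
    }
}
```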
Hm, I don't think this is related to SR FT Fallback but more likely to the Quartz integration in Quarkus, more specifically to the JobFactory and the way a new job is created: https://github.com/quarkusio/quarkus/blob/main/extensions/quartz/runtime/src/main/java/io/quarkus/quartz/runtime/QuartzSchedulerImpl.java#L1235-L1253. This factory is using programmatic lookup and so the @Dependent beans are kept in memory due to how jakarta.enterprise.inject.Instance works.

This is indeed the case. The first thing that comes to mind is using a CDI-aware wrapper working with Instance.Handle, something like this - https://github.com/quarkusio/quarkus/compare/main...manovotn:quarkus:issue38824?expand=1 But I don't know much about Quartz so @mkouba might have a smarter idea :)
This would not be a correct solution. A @Dependent bean that serves as a Quartz job should not be automatically destroyed after Job#execute() completes.
The problem in this particular setup is that the user creates a bunch of "one-off" jobs, i.e. jobs that are only executed once, not periodically based on interval/cron (which is IMO a more common use).
In order to fix this correctly, we would need to hook into Quartz and attempt to destroy the job instance once it's executed and (probably) removed from the scheduler. I don't know if such an API exists though. Maybe org.quartz.TriggerListener.triggerComplete() and check for org.quartz.Trigger.CompletedExecutionInstruction.DELETE_TRIGGER. I need to take a look.
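A rough sketch of what such a listener could look like (destroyJobInstance() is a placeholder for whatever mechanism would release the CDI handle):

```java
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.Trigger;
import org.quartz.Trigger.CompletedExecutionInstruction;
import org.quartz.listeners.TriggerListenerSupport;

// Hypothetical listener: destroy the job's dependent bean instance only when
// Quartz is about to drop the trigger, i.e. for "one-off" jobs.
class JobDestroyingTriggerListener extends TriggerListenerSupport {

    @Override
    public String getName() {
        return "job-destroying-trigger-listener";
    }

    @Override
    public void triggerComplete(Trigger trigger, JobExecutionContext context,
            CompletedExecutionInstruction instruction) {
        if (instruction == CompletedExecutionInstruction.DELETE_TRIGGER) {
            destroyJobInstance(context.getJobInstance());
        }
    }

    private void destroyJobInstance(Job jobInstance) {
        // placeholder: release the Instance.Handle associated with this instance
    }
}
```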
The problem in this particular setup is that the user creates a bunch of "one-off" jobs, i.e. jobs that are only executed once, not periodically based on interval/cron (which is IMO a more common use).
Ah, so for jobs that run periodically, you expect to reuse the @Dependent bean instance?
I was assuming you'd want a new one as well (as per nature of this scope).
Ah, so for jobs that run periodically, you expect to reuse the @Dependent bean instance?
Exactly.
I was assuming you'd want a new one as well (as per nature of this scope).
This is not how org.quartz.spi.JobFactory works. In other words, in Quartz a job is not created for each job execution. @Scheduled methods in Quarkus work a bit differently.
I was assuming you'd want a new one as well (as per nature of this scope).

This is not how org.quartz.spi.JobFactory works. In other words, in Quartz a job is not created for each job execution. @Scheduled methods in Quarkus work a bit differently.
Ok, I see, I misunderstood how it's supposed to behave.
One way to access the Trigger is from the JobExecutionContext, and from there you can call getTrigger().mayFireAgain() - that might be one way to check when to destroy the instance.
That being said, any repeating triggers will still hang in there, and the only way to properly destroy their dependent-scoped jobs would be during QuartzSchedulerImpl#destroy.
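Sketched (again with a placeholder for the actual destroy mechanism):

```java
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.listeners.JobListenerSupport;

// Hypothetical listener using mayFireAgain(): after each execution, destroy
// the dependent job instance only if its trigger can never fire again.
class OneOffJobCleanupListener extends JobListenerSupport {

    @Override
    public String getName() {
        return "one-off-job-cleanup";
    }

    @Override
    public void jobWasExecuted(JobExecutionContext context, JobExecutionException jobException) {
        if (!context.getTrigger().mayFireAgain()) {
            destroyJobInstance(context.getJobInstance());
        }
    }

    private void destroyJobInstance(Job jobInstance) {
        // placeholder: release the CDI handle created for this execution
    }
}
```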
I was assuming you'd want a new one as well (as per nature of this scope).

This is not how org.quartz.spi.JobFactory works. In other words, in Quartz a job is not created for each job execution. @Scheduled methods in Quarkus work a bit differently.

Ok, I see, I misunderstood how it's supposed to behave.
@manovotn Actually, it seems that it was me who misunderstood how it's supposed to work and the Job instance is created for a single execution only ;-). Which means that your wrapper should work, except that we should not call destroy() unless the bean is @Dependent.
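A minimal sketch of that guard, assuming the wrapper holds the Instance.Handle it obtained from the lookup:

```java
import jakarta.enterprise.context.Dependent;
import jakarta.enterprise.inject.Instance;

// Hypothetical helper capturing the fix: only @Dependent instances need
// (and tolerate) explicit destruction after execution.
final class Handles {

    static void destroyIfDependent(Instance.Handle<?> handle) {
        if (Dependent.class.equals(handle.getBean().getScope())) {
            handle.destroy();
        }
    }
}
```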