spring-framework icon indicating copy to clipboard operation
spring-framework copied to clipboard

Memory leak with CglibAopProxy$ProxyCallbackFilter

Open Pancor opened this issue 4 years ago • 26 comments

It seems that there is a memory leak with Spring Boot 2.4.1 on my Debian 10.5 with Java 11.0.1 when using @PreAuthorize("hasRole('ROLE')") annotation on controller's endpoint. As you can see in the first image, there is constant grow in old generation memory of G1 garbage collector when running tests. The grow stops at max heap size which is 512 MB. I prepared a sample project where you can reproduce the problem.

Memory usage 1

image

Memory usage 2

image

After importing the project just run this command:

./gradlew clean test

Tests will fail because they assume that HTTP response will be 200, but it is 401, but it shouldn't matter. If you remove @PreAuthorize from DummyController or prePostEnabled = true from SecurityConfig then memory usage seems to be ok.

Memory usage 3

image

Memory usage 4

image

The project uses dummy controller with one endpoint which has @PreAuthorize("hasRole('ROLE')") annotation. It has also basic SecurityConfig to enable @PreAuthorize annotation. Controller uses some dummy services to make sure that for each test Spring will recreate the context. I also set spring.test.context.cache.maxSize=1 in build.gradle to make sure that the grow of memory does not come from cache.

If you have any more questions about my configuration or anything else, let me know.

Pancor avatar Dec 14 '20 11:12 Pancor

Thanks for the sample. I've reproduced the problem. As far as I can tell, it's caused by your configuration of the test context cache size. Limiting it to one means that many different application contexts are created, and for each context a new CGLib proxy of DummyController is created. This proxy creation creates a new DummyController subclass that's then pinned in memory along with all its static fields by the app class loader. If I remove the configuration of the spring.test.context.cache.maxSize system property, the build completes in under 20 seconds with no sign of a memory leak.

You said above that configuring the cache size was to "make sure that the grow of memory does not come from cache". It sounds like you'd identified a leak before making this configuration change but, as far as I can tell, this configuration is the cause of the leak.

Perhaps you can share a sample that reproduces the problem without tuning the test framework's context cache, or perhaps even reproduces it when running your application's main method rather than its tests?

wilkinsona avatar Dec 14 '20 13:12 wilkinsona

Docs says that spring.test.context.cache.maxSize has a default value to 32 and it is also the max value for this property. The main project I'm working on is quite large and I guess that I shouldn't public the code from it. I have got there more than 1k tests that uses different context configuration, more than 32, so I suppose that Spring, after running some tests, needs to create new context.

For sample project I set spring.test.context.cache.maxSize to 1 to easily reproduce the problem. If I set it to 10 or higher then memory usage is fine, but when I set it to 9 or lower then memory problems show up. So I'm guessing that if I prepare code sample that uses more than 32 contexts in tests (the current one uses 10 different contexts) then the problem will come up again.

When I run tests in sample project with spring.test.context.cache.maxSize as 1 and I remove @PreAuthorize("hasRole('LEAK')") from DummyControler then tests are slow, it's ok, Spring needs to recreate context, but memory usage is constant, it does not grow up after time, but when @PreAuthorize("hasRole('LEAK')") is not removed then memory grows up. When spring.test.context.cache.maxSize is not set, then tests run quite fast so it's hard to analyze if there is something wrong with memory usage, but again, tests will probably slow down when there will be more than 32 context configurations for tests.

I didn't try to reproduce the problem with application's main method because the problem rather is connected somehow with Spring that fails to clean up old context when tests are running. It's only my guess so I'm not sure what causes it. Maybe it is somehow connected with @PreAuthorize.

Pancor avatar Dec 14 '20 13:12 Pancor

I didn't try to reproduce the problem with application's main method because the problem rather is connected somehow with Spring that fails to clean up old context when tests are running

This would have been useful information in the original description.

Maybe it is somehow connected with @PreAuthorize

Yes, I believe this is the case. The use of @PreAuthorize requires DummyController to be proxied and it's this proxy creation that is causing the memory leak. Each new context creates a new DummyController sub-class for the CGLib proxy. There's nothing we can do about this in Spring Boot as @PreAuthorize is a Spring Security feature and CGLib proxy creation is part of Spring Framework.

I think we should get someone from the Framework team to look at this. It may be that something isn't behaving as intended or that could be improved in the proxy creation that would enable some reuse.

wilkinsona avatar Dec 14 '20 13:12 wilkinsona

This sounds a bit like #14113 which was, once upon a time, solved (accidentally) with #21674.

mdeinum avatar Dec 14 '20 14:12 mdeinum

Any updates here? Some cglib generated proxy classes keep reference to BeanFactory, so BeanFactory can't be garbage collected, and memory footprint grows and grows. It is the most sensitive in SpringBootTests, where a lot of contexts could be created. For example, spring security proxies keep reference to BeanFactory in MethodSecurityMetadataSourceAdvisor.

Nazukin-Dmitry avatar Apr 24 '21 13:04 Nazukin-Dmitry

If the cause of the problem is due to the fact that Spring Security's MethodSecurityMetadataSourceAdvisor is retaining a reference to the BeanFactory (which seems plausible based on the current source code), then the solution might be to refactor MethodSecurityMetadataSourceAdvisor so that it releases that reference when the enclosing ApplicationContext is closed.

@rwinch & @jhoeller, thoughts?

sbrannen avatar Apr 27 '21 15:04 sbrannen

Thanks for bringing this to my attention @sbrannen.

It does appear that the class generated by CGLIB is (indirectly) holding a reference to MethodSecurityMetadataSourceAdvisor which in turn holds a reference to the BeanFactory.

For reference here is a list of objects retained by ProxyCallbackFilter

image

Here is the shortest GC Root of ProxyCalbackFilter

image

I don't think Spring Security is the correct place to solve this. Any cleanup that the Advisors perform would just be masking the actual leak which is in Spring Framework + CGLIB. Even if the Advisor cleans up all of its resources, the Advisor itself is leaked and so are lower level objects (i.e. AdvisedSupport).

I think it would be ideal if Spring Framework could remove the reference to the ProxyCallbackFilter on CALLBACK_FILTER of the generated class once we know that the CALLBACK_FILTER is no longer needed. It might also mark the generated class as closed so that if it is used again it reports an error.

Removing the refererence to ProxyCallbackFilter would ensure that references to AdvisedSupport are cleaned up along with its methodCache, all references to its advisors, and its remaining members. It's important to note that cleaning up references to advisors will clean up the reference to MethodSecurityMetadataSourceAdvisor and thus the BeanFactory it refers to. It will also cleanup references to other Advisors and any large objects they might refer to. For example, AbstractBeanFactoryPointcutAdvisor has it's own BeanFactory reference.

Obviously this still means the generated classes are still unnecessarily around, but I'm not sure we can drop the references to those given they are held by the jdk.

rwinch avatar Apr 28 '21 03:04 rwinch

I wouldn't say that problem is only in MethodSecurityMetadataSourceAdvisor.

In our application I also see reference from RetryConfiguration:

image

And such chain(related to RefreshScope) keeps reference to GenericWebApplicationContext .

image

Nazukin-Dmitry avatar Apr 28 '21 08:04 Nazukin-Dmitry

Obviously this still means the generated classes are still unnecessarily around, but I'm not sure we can drop the references to those given they are held by the jdk.

Would it be possible to generate cglib classes using a classloader that is tied to the ApplicationContext? So when the context gets dropped, the loader is no longer referenced from the GC root and the classes are unloaded.

If I'm not mistaken, the proxy factory already uses the bean factory's classloader, so it would only be a matter of creating a new classloader inside AbstractContextLoader#prepareContext(ConfigurableApplicationContext, MergedContextConfiguration) with ClassUtils.getDefaultClassLoader() as its parent.

grubeninspekteur avatar Aug 05 '21 10:08 grubeninspekteur

@grubeninspekteur Switching the ClassLoader used to create the cglib classes will cause ClassCastExceptions to be thrown unless the entire application uses the new ClassLoader which will likely lead to other problems.

rwinch avatar Aug 19 '21 18:08 rwinch

Related issues:

  • #27375
  • #27395

sbrannen avatar Sep 09 '21 11:09 sbrannen

Closing this one based on the assumption that #27375 covers all occurrences of this leak (as a regression in the 5.3.x line). Please give the 5.3.10 release a try, it's available in snapshots already and will be officially released tomorrow!

jhoeller avatar Sep 14 '21 12:09 jhoeller

@jhoeller Unfortunately, it does not appear to have fixed this specific issue. Updating the original sample to use Spring 5.3.10 does not resolve the issue. I've put together a branch named spring-5.3.10 that demonstrates it is not fixed. If you run ./gradlew check you will see lots of failed tests (expected) and then eventually you will start getting out of memory errors. I haven't looked into if anything has changed from the previous assessment, but the take away at this point is the tests still produce out of memory errors.

Can we please reopen the issue?

rwinch avatar Sep 15 '21 21:09 rwinch

Thanks for trying out the sample against 5.3.10 and providing that branch.

We'll look into it.

sbrannen avatar Sep 16 '21 12:09 sbrannen

It seems I confirm leak in branch named spring-5.3.10

MacBook-Pro-Nikita:~ nkonev$ uname -a
Darwin MacBook-Pro-Nikita.local 20.6.0 Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64 x86_64

I started App in IntelliJ IDEA (only changed repositories in build.gradle to mavenCentral()), then

while(true) do curl -i 'http://localhost:8080'; done

about after 20 minutes I saw Снимок экрана 2021-10-05 в 13 33 58

I also took heapdump and uploaded onto my hosting.

I think this can be useful.

Thoughts:

  • Are there Tomcat sessions, which can be created per each unauthorized request ?

Update 1: I opened heapdump image

Let's shrink session timeout: After setting

server.servlet.session.timeout=5s

in application.properties the picture after 20 minutes is better image

image

Update 2: Seems I tested the different case, sorry, hiding this. The original was about tests ./gradlew check but I tested like the 'production'.

nkonev avatar Oct 06 '21 22:10 nkonev

For the interest - I can't reproduce it against 5.3.10 neither my Mac with ./gradlew check Снимок экрана 2021-10-07 в 02 46 46

nor Linux Снимок экрана_2021-10-07_02-57-56

nkonev avatar Oct 06 '21 23:10 nkonev

Thanks for providing additional information @nkonev. I'm curious if you are able to reproduce the issue with the original application that was submitted. Perhaps you have GRADLE_OPTS environment variable specified which is providing a larger amount of memory to the application?

./gradlew check runs for me but slows down and eventually hangs at around 300 tests ran due to running out of memory.

image

I haven't looked into why it doesn't put a heap dump after executing, but it is reporting it runs out of memory in the terminal.

For reference my environment is

gw --version

------------------------------------------------------------
Gradle 7.2
------------------------------------------------------------

Build time:   2021-08-17 09:59:03 UTC
Revision:     a773786b58bb28710e3dc96c4d1a7063628952ad

Kotlin:       1.5.21
Groovy:       3.0.8
Ant:          Apache Ant(TM) version 1.10.9 compiled on September 27 2020
JVM:          11.0.11 (AdoptOpenJDK 11.0.11+9)
OS:           Linux 5.4.0-88-generic amd64

rwinch avatar Oct 11 '21 15:10 rwinch

I don't have either GRADLE_OPTS, or gradle.properties

MacBook-Pro-Nikita:spring-framework-gh-26266 nkonev$ ls -lah ~/.gradle/
total 0
drwxr-xr-x  11 nkonev  staff   352B  8 сен 01:35 .
drwxr-xr-x+ 78 nkonev  staff   2,4K  9 окт 05:34 ..
drwxr-xr-x   7 nkonev  staff   224B  7 окт 02:45 .tmp
drwxr-xr-x  18 nkonev  staff   576B  6 окт 12:41 caches
drwxr-xr-x  16 nkonev  staff   512B  6 окт 12:41 daemon
drwxr-xr-x   2 nkonev  staff    64B  4 мар  2021 jdks
drwxr-xr-x   2 nkonev  staff    64B  9 окт 05:08 kotlin-profile
drwxr-xr-x  13 nkonev  staff   416B  6 окт 12:32 native
drwxr-xr-x   9 nkonev  staff   288B  8 сен 01:36 notifications
drwxr-xr-x   3 nkonev  staff    96B 11 авг 13:29 workers
drwxr-xr-x   3 nkonev  staff    96B 18 фев  2021 wrapper
MacBook-Pro-Nikita:spring-framework-gh-26266 nkonev$ echo $GRADLE_OPTS

MacBook-Pro-Nikita:spring-framework-gh-26266 nkonev$

MacBook-Pro-Nikita:spring-framework-gh-26266 nkonev$ echo $JAVA_HOME

MacBook-Pro-Nikita:spring-framework-gh-26266 nkonev$ java -version
openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.8+10)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.8+10, mixed mode)

MacBook-Pro-Nikita:spring-framework-gh-26266 nkonev$ ./gradlew --version

------------------------------------------------------------
Gradle 7.2
------------------------------------------------------------

Build time:   2021-08-17 09:59:03 UTC
Revision:     a773786b58bb28710e3dc96c4d1a7063628952ad

Kotlin:       1.5.21
Groovy:       3.0.8
Ant:          Apache Ant(TM) version 1.10.9 compiled on September 27 2020
JVM:          11.0.8 (AdoptOpenJDK 11.0.8+10)
OS:           Mac OS X 10.16 x86_64

Yes, I reproduced leak in the original example. Also on Linux Снимок экрана_2021-10-11_20-50-26

Снимок экрана 2021-10-11 в 20 25 39

But your https://github.com/rwinch/spring-framework-gh-26266/tree/spring-5.3.10 works without leak.

Consider to force-kill gradle daemons, I did it before the test, something like

jps | grep -i gradle | awk '{print $1}' | xargs kill -9

or just

killall -9 java

nkonev avatar Oct 11 '21 17:10 nkonev

@nkonev Hmm..Not sure why you are getting a difference in behavior but good to know. Thanks!

rwinch avatar Oct 11 '21 19:10 rwinch

I reproduced it by just setting ext['spring-framework.version']='5.3.10' into build.gradle in the original example. But previously I didn't reproduce it in your. It seems there is a dependency on test order, which can be different. I know that it is unspecified by default, but it seems that gradle eventually orders tests by hash of directory content or modification time.

Снимок экрана_2021-10-12_06-55-57

So here is heap dumps here that launch after a minute

nkonev avatar Oct 12 '21 04:10 nkonev

May be related with https://github.com/mockito/mockito/issues/2503

nkonev avatar Dec 10 '21 07:12 nkonev

with org.springframework:spring-core:5.3.14

a little class like this:

  @RestController
  @PreAuthorize("hasAuthority('GLOBAL_ADMIN') or hasAuthority('MEMBER')")
  public class AccessControlTestController {
    @GetMapping("/v1/access_control_test")
    String test() {
      return "OK";
    }
  }

seems to cause a nasty leak like this:

Screenshot 2022-02-18 at 11 32 55 am

when running integration tests, that leads to OutOfMemoryError

sdyura avatar Feb 18 '22 11:02 sdyura

We are having the same issue. We start and close some contexts during the lifetime of our app. If that happens a lot, after a while we get OOM.

After manually nulling all CALLBACK_FILTER references in all proxy classes, the bean factory is finally garbage collected.

I think it would be ideal if Spring Framework could remove the reference to the ProxyCallbackFilter on CALLBACK_FILTERof the generated class once we know that the CALLBACK_FILTER is no longer needed. It might also mark the generated class as closed so that if it is used again it reports an error.

This would be great. I tried to find a way to do it myself, but didn't find any hook that would allow me that. One dirty way was listing all classes from ClassLoader, filtering those containing BySpring, making the field accessible and setting it to null by reflection.

This however won't work if there is more than one context and I don't know which proxy classes belong to context that I'm closing.

I also tried to collect proxy classes in a custom class loader which I set before the context startup but that didn't really work either. The classes were coming from different class loaders in the hierarchy (some are loaded from parent, cglib proxy classes are magically set to the child loader by reflection.. then they don't match or are not assignable from one to another, and he'll is lose, so this is not a way to go I guess).

redhead avatar Jun 02 '22 22:06 redhead

I tried to write a class loader to collect created proxy classes and clean the static field after the context is closed. However, our setup of contexts is not trivial and I was running into many class-to-class problems (classes not assignable to each other, etc.).

So I went for the BeanPostProcessor instead, which might not work at all times maybe, but works for us so far.

public class MemoryLeakFixBeanPostProcessor implements BeanPostProcessor, DisposableBean {

    /**
     * See {@link org.springframework.cglib.proxy.Enhancer}
     */
    private static final String CGLIB_PROXY_STATIC_FIELD_WITH_MEMORY_LEAK = "CGLIB$CALLBACK_FILTER";

    /**
     * See {@link org.springframework.cglib.core.SpringNamingPolicy}
     */
    private static final String SPRING_CGLIB_PROXY_CLASS_NAME_TAG = "BySpringCGLIB";

    private final Set<Class<?>> classesToCleanUp = new HashSet<>();

    @Override
    public Object postProcessAfterInitialization(Object bean, @Nonnull String beanName) throws BeansException {
        if (bean.getClass().getName().contains(SPRING_CGLIB_PROXY_CLASS_NAME_TAG)) {
            classesToCleanUp.add(cls);
        }
        return bean;
    }

    @Override
    public void destroy() throws Exception {
        classesToCleanUp.forEach(cls -> {
            try {
                var field = cls.getDeclaredField(CGLIB_PROXY_STATIC_FIELD_WITH_MEMORY_LEAK);
                field.setAccessible(true);
                field.set(null, null);
            } catch (Exception e) {
                // we can't do anything..
            }
        });
        classesToCleanUp.clear();
    }
}

It collects the bean classes if they are Spring proxy classes and clears the static field when the bean post processor is disposed (context is closed).

redhead avatar Jun 07 '22 10:06 redhead

Thanks for sharing the workaround @redhead!

rwinch avatar Jun 07 '22 18:06 rwinch

I have made a pull request for this.

vikeychen avatar Jul 30 '22 07:07 vikeychen

Will this issue also affect the new Spring Releases, e.g. Spring Boot 3.0.x and the then used Spring Framework 6.0.x ?

Evenprime avatar Nov 21 '22 07:11 Evenprime

Same here. with spring-boot 3.0.4 and spring-framework 6.0.6

Running a few integration test with @DirtiesContext and then I get an OOM after a few tests.

In my case it seems to be the @Validated annotation


@Controller
@Validated
@RequiredArgsConstructor
public class WettquoteLigamappingAnlegenServiceGQL {
     // ... injected services fields ...
    @Transactional
    @MutationMapping
    public @NotNull WettquoteLigamappingDTO wettquoteLigamappingAnlegen(
             @Argument @NotNull @Valid WettquoteLigamappingForm wettquoteLigamappingForm) {
            // ....
    }
}

After running three simple tests with @DirtiesContext I have3 applicationContexts on the heap. Setting a breakpoint in one test, doing forced GC and getting a heap with YourKit Profiler looks like this:

Bildschirmfoto vom 2023-03-28 10-41-34

After adding this workaround to my test context, everything works again: ((thanks to @redhead, code slightly modified)

package de.kicktipp.testutils;

import lombok.extern.slf4j.Slf4j;
import org.jetbrains.annotations.NotNull;
import org.springframework.beans.BeansException;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.beans.factory.config.BeanPostProcessor;
import org.springframework.stereotype.Component;
import java.util.HashSet;
import java.util.Set;

//https://github.com/spring-projects/spring-framework/issues/26266
@Component
@Slf4j
public class MemoryLeakFix implements BeanPostProcessor, DisposableBean {

    private static final String CGLIB_PROXY_STATIC_FIELD_WITH_MEMORY_LEAK = "CGLIB$CALLBACK_FILTER";
    private static final String SPRING_CGLIB_PROXY_CLASS_NAME_TAG = "SpringCGLIB";

    private final Set<Class<?>> classesToCleanUp = new HashSet<>();

    @Override
    public Object postProcessAfterInitialization(Object bean, @NotNull String beanName) throws BeansException {
        if (bean.getClass().getName().contains(SPRING_CGLIB_PROXY_CLASS_NAME_TAG)) {
            classesToCleanUp.add(bean.getClass());
        }
        return bean;
    }

    @Override
    public void destroy() {
        classesToCleanUp.forEach(cls -> {
            try {
                var field = cls.getDeclaredField(CGLIB_PROXY_STATIC_FIELD_WITH_MEMORY_LEAK);
                field.setAccessible(true);
                field.set(null, null);
            } catch (NoSuchFieldException e) {
                log.info("No Field", e);
            } catch (IllegalAccessException e) {
                log.error(e.getMessage(), e);
            }
        });
        classesToCleanUp.clear();
    }
}

kicktipp avatar Mar 28 '23 08:03 kicktipp

I've got a promising solution in the works where we reduce the ProxyCallbackFilter instance to a plain key role afters its initial use for class generation, stubbing out the Advisors with key stubs (containing advice class and pointcuts just for comparison purposes) once the class has been generated.

While the CGLIB$CALLBACK_FILTER field remains non-null, its content will not retain any leaky references anymore, independent from the concrete Advisor implementations in use there. This should hopefully address the common scenarios for long-lived ClassLoaders as outlined above.

jhoeller avatar Jun 07 '23 23:06 jhoeller

Can someone please let me know if the issue that is being discussed here is similar to the one I am observing here?

debraj-manna avatar Mar 05 '24 16:03 debraj-manna