[BUG] Playwright codegen performance issue with many elements on the page
System info
- Playwright Version: v1.31.2
- Operating System: Windows, but most likely in others
- Browser: Chromium, but most likely in others
Source code
- [ ] Here is a repo that reproduces the issue locally. Just run
npx playwright codegen <local-html-file-in-repo>
and then start moving the mouse around the page to see the lag. https://www.youtube.com and https://www.amazon.com also lag (though not as much as my example).
Link to the GitHub repository with the repro
https://github.com/danielmhair/playwright-recorder-lag-repro
Steps
- npm ci
- npx playwright codegen "file://location-to-local-version-of-my-repro-repo/open-15000-elements.html"
Expected
The HTML file in the repository linked above has 15000 elements with very little depth. I expected codegen to run quite fast and not slow down when highlighting elements.
Actual
Codegen runs very slowly. We have quite a few use cases with 15000 elements or more (I know, not the best for the performance of the site itself). I wish I could provide those use cases, but they require login credentials that I can't give out, so hopefully this repo is enough to work with. I understand that this is just one use case that deals only with divs. If needed, I can provide other examples that cover cases like text, many ids, placeholders and others. But hopefully this shows the issue well enough to improve performance.
Willing to contribute if needed
Also, I'm happy to help out if needed. If you have any tips on what might help improve the performance, I'd be happy to create a PR for a fix for the improvement.
Thank you
I also wanted to mention how awesome Playwright is. I have used Cypress for many years, but time and time again, Playwright proves to be better, so I'm planning to completely switch over at my company very soon.
Yeah I am facing the same issues. It would be awesome to see it get solved soon
Any progress on this issue? I noticed it's been assigned to v1.33, which is exciting. Let me know if I can help out in any way.
Facing the same issue on my website that lists many products. Hope this could be resolved soon
Investigation notes:
After looking into it I found two issues:
a) we clear most of our good caches after each generateSelector call
b) in two places we do not pass DOM Elements into our _cached function; we pass a newly constructed object instead, so the cache lookup always misses.
When applying fixes for these two issues, the performance is significantly better. Gist: https://gist.github.com/mxschmitt/98936e03196cc06c243cf618007743bc
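To illustrate point b), here is a minimal sketch of why a freshly constructed key object never produces a cache hit. This is not Playwright's actual code: `IdentityCache`, the plain object standing in for a DOM Element, and the `"selector"` value are all hypothetical, and the only assumption is that the cache is keyed by object identity, the way a `Map` is.

```typescript
// Hypothetical sketch, not Playwright's implementation.
// A cache keyed by object identity, like a Map keyed by DOM Elements.
class IdentityCache<V> {
  private store = new Map<object, V>();
  hits = 0;
  misses = 0;

  get(key: object, compute: () => V): V {
    if (this.store.has(key)) {
      this.hits++;
      return this.store.get(key) as V;
    }
    this.misses++;
    const value = compute();
    this.store.set(key, value);
    return value;
  }
}

// Plain object standing in for a DOM Element (assumption for the sketch).
const element = { tag: "div" };
const cache = new IdentityCache<string>();

// Wrapping the element in a new object on every call: each key is a
// different object, so every lookup is a miss.
for (let i = 0; i < 3; i++) cache.get({ element }, () => "selector");
const missesWithWrapper = cache.misses;
const hitsWithWrapper = cache.hits;

// Using the element itself as the key: one miss, then hits.
for (let i = 0; i < 3; i++) cache.get(element, () => "selector");
const hitsWithElement = cache.hits;
```

With the wrapper objects the three lookups are three misses; with the element itself as the key, the second and third lookups are hits.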
Exciting!
One question: Will the cache be enough?
What about the multiple querySelectorAll calls that are made in the recorder when a candidate selector is being ranked and the results of querySelectorAll have 10,000+ elements?
For context, I noticed that the recorder builds a list of "candidates" for a given element. For example, I clicked on an element with an id that was nested 20 divs deep (the innermost element). When I clicked on that element, the recorder built two candidates for potential selectors:
- div => This one returns 15000 elements (because of the root divs)
- #inner-child-id => This one returns 15000 elements (because of the innermost div in each root div)
In this example, we followed the bad practice of creating multiple elements with the same id. Obviously, this isn't preferable, but I wanted to point out the issue when you have multiple candidates that each match a large number of elements. The recorder will take the results of these and combine them into one array, for a total of 30,000 elements.
In short, the recorder will call document.querySelectorAll for every candidate that it "thinks" is a good selector. Based on this, it will determine which one is best. To me, the process of gathering each querySelectorAll result is what is also causing the performance hit.
I wonder if we need to somehow reduce the number of querySelectorAll calls, per selector generation. Thoughts?
My follow-up question is: When we cache these results, are we caching the querySelectorAll result as well? Over time, we will have multiple caches with arrays of 15,000 elements. I don't think this would be good for performance and memory.
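To make the cost described above concrete, here is a rough sketch that counts how many elements get gathered when every candidate triggers its own query. It is not the recorder's real code: `measureCandidates`, the `QueryRoot` interface, and `fakeRoot` are hypothetical stand-ins so the numbers from my repro (two candidates, 15000 matches each) can be reproduced outside a browser.

```typescript
// Hypothetical sketch, not the recorder's implementation.
// Minimal shape of anything that can answer querySelectorAll.
interface QueryRoot {
  querySelectorAll(selector: string): ArrayLike<unknown>;
}

// Runs one full query per candidate and tallies the work done.
function measureCandidates(root: QueryRoot, candidates: string[]) {
  let calls = 0;
  let gathered = 0;
  const perCandidate = candidates.map(selector => {
    const matches = root.querySelectorAll(selector);
    calls++;
    gathered += matches.length;
    return { selector, matchCount: matches.length };
  });
  return { perCandidate, calls, gathered };
}

// Fake root that returns 15000 matches for any selector, mimicking the
// repro page (assumption for the sketch).
const fakeRoot: QueryRoot = {
  querySelectorAll: (selector: string) => new Array(15000).fill(selector),
};

const result = measureCandidates(fakeRoot, ["div", "#inner-child-id"]);
console.log(result.calls, result.gathered);
```

In a real page, `document` could be passed as the root, since it has a compatible `querySelectorAll`. The point is only that the gathered total grows with candidates times matches on every selector generation.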
Code Context for previous comment:
These are the methods that are causing the issue:
- Root function: https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/server/injected/selectorGenerator.ts#L81
- Next function called: https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/server/injected/selectorGenerator.ts#L117
- This is the problem-child inner function that calls the querySelectorAll multiple times and if you put a break point on that method, you can see it gathering 30,000 elements:
- https://github.com/microsoft/playwright/blob/main/packages/playwright-core/src/server/injected/selectorEvaluator.ts#L332
There doesn't seem to be an easy fix in selectorGenerator that will work for dynamic pages changing over time. Currently, we cache as much as we can during synchronous execution, which is safe, but have to reset caches before next generation. Leaving the issue open, just in case we'd like to do some major reworks in this area.
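As a rough illustration of the trade-off described here, the sketch below shows a cache that is reused freely inside one synchronous generation but reset before the next one, so a page that changed in between is never served stale data. All names (`GenerationScopedCache`, `generateSelectorSketch`, the `text` field, the `text=` output) are hypothetical and only model the idea; this is not the actual selectorGenerator code.

```typescript
// Hypothetical sketch, not the actual selectorGenerator code.
class GenerationScopedCache<K extends object, V> {
  private store = new Map<K, V>();
  computeCount = 0;

  get(key: K, compute: () => V): V {
    if (this.store.has(key)) return this.store.get(key) as V;
    this.computeCount++;
    const value = compute();
    this.store.set(key, value);
    return value;
  }

  reset(): void {
    this.store.clear();
  }
}

// Object standing in for a DOM node whose text can change over time.
type FakeNode = { text: string };
const textCache = new GenerationScopedCache<FakeNode, string>();

function generateSelectorSketch(target: FakeNode): string {
  // The page may have changed since the previous call, so start clean.
  textCache.reset();
  // Within this synchronous run nothing else can mutate the page,
  // so the second lookup can safely reuse the first result.
  const first = textCache.get(target, () => target.text);
  const second = textCache.get(target, () => target.text);
  return first === second ? `text=${first}` : "";
}

const node: FakeNode = { text: "Buy" };
const before = generateSelectorSketch(node);
node.text = "Sold out"; // page changes between generations
const after = generateSelectorSketch(node);
```

Without the `reset()` call the second generation would still return the old text, which is the staleness problem that makes keeping caches across generations unsafe on dynamic pages.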
I can understand that. It would be nice if there was an elegant way to fix it, but I felt that way too when I looked through it.
I share the hope that this issue can be optimized.
It has more factors than element count, doesn’t it? I found this site with only ~1,000 elements, yet it is slower than the example with 15,000 elements that you provided:
https://forestinfo.forest.go.th/
It froze Codegen running on my old computer.
Hmm. I still wonder whether it's because we are calling querySelectorAll so many times. But with only 1000 elements, that is really interesting.
Any update on this? We have a few instances of AG-Grid in our app and the performance is so slow that we are unable to use codegen.
Any update on this one? Facing a similar issue. @mxschmitt
Bumping this to see if any update on a fix. Facing the same issue with codegen freezing on a certain module of our application that has >5000 elements.
Bumping this to see if any update on a fix. Facing the same issue with codegen freezing randomly