node-html-to-image
node-html-to-image copied to clipboard
Is it possible to reuse the puppeteer instance for better performance?
What is your recommended settings to maximize performance in the following case:
- Small HTMLs usually resulting jpeg images of 10K-50K in size
- HTML content is 100% trusted
- HTML is send via data uri
- The image is returned in memory as a buffer and not saved to disk.
Thanks!
Hi @tomerb15 👋
I don't understand what is the issue you encountered. Is it too slow? If it is the case can you share your code, please?
Hey, we use it to generate many small screenshots. Basically small HTMLs to JPEGs. The problem is that we have high volumes.
We found puppeteer-cluster and with the right configuration, we manage to crack it down and go from ~900ms per image (Which grow up to 60 seconds at peak) to a steady ~350ms per image which solved our problem.
Hope this will help anyone in the future.
@tomerb15 as you can see here node-html-to-image uses puppeteer-cluster under the hood. Maybe you could help optimize it with your configuration?
Here is our configuration for puppeteer-cluster which works very well:
{
concurrency: Cluster.CONCURRENCY_CONTEXT,
maxConcurrency: 10,
puppeteerOptions: {args: ['--no-sandbox', '--headless', '--disable-gpu']}
}
We don't see any memory leak from the browsers. We also taking screenshot from a specific element in the dom rather then the entire body.
@tomerb15 how did you configure it ? can you please provide an example ?
Perhaps the maxConcurrency
option can be specified as part of the options? I have a bot that produces images and spawning an instance every time to render images is a huge performance hit for me (about 500ms on average + network overhead). Different systems have different capabilities and the maxConcurrency
option should be dependent on users, perhaps with the default as 2
. Willing to PR this.
Just discovered another performance-related issue. You create a cluster for a batch of jobs, but the maxConcurrency option fails to use the full advantage of puppeteer-cluser
AND the fact that you terminated the whole cluster after the job is done is also a performance hit.
So issues currently present are:
- In case someone needs to render multiple images at the same time, they're limited to 2 instances at a time which doesn't utilise the full potential of
puppeteer-cluster
. - In my case, I need to re-use the same cluster/instance for on-demand requests. The library currently terminates clusters at the end of each
call
, which is useless for me and actually adds more overhead than before.
Solutions:
- Make
maxConcurrency
part of the options passed to the function. - Add an option to re-use the same cluster.
I'm ready to work on implementing these solutions.
Spawning a cluster on a clean start of the bot took 3,000ms
.
Subsequent spawns took an average of over 300ms
.
@frinyvonnick It would be nice to create some kind of instance object, this way puppeteer can be launched in the background as soon as the module is required or the object is instantiated. Because the instantiation of puppeteer occurs within the nodeHtmlToImage() call, we can't safely start chrome until that very moment.
In other words, my Dream Code:
// imageMaker contains the puppeteer instance
const imageMaker = require('node-html-to-image');
imageMaker.createImage({html: html}) // as many times as you want... reuses the same browser
@frinyvonnick The changes would be significant in terms of performance and not go beyond a few lines of code.
@KL13NT I look at all issues and new messages. Answering takes some time. Avoid pinging me without new information. I'll answer as soon as I can. I maintain this package in my free time. Please, keep that in mind 😄
I'm facing this issue... it takes about 2 seconds to render per image. I've already tried to change puppeter settings, but the time didn't change... can you please help me? I generate a buffer with the image in jpeg. Thanks in advance.
@BrnPer @KL13NT I'd love a tester for my change #95 . If you're willing to tweak the way you call the library, you can test it easily.
@joshk0 I would prefer to avoid breaking change. I created this package in the idea of having something simple and easy to use. Initially I thought people might directly use puppeteer for more advance use cases. I changed a bit my mind since people seems to enjoy this package.
Here are the existing use cases:
Single image
const nodeHtmlToImage = require('node-html-to-image');
nodeHtmlToImage({
html: '<html><body>Hello world!</body></html>',
})
Batch image creation
const nodeHtmlToImage = require('node-html-to-image');
nodeHtmlToImage({
html: '<html><body>Hello {{name}}!</body></html>',
content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
.then(() => console.log('The images were created successfully!'))
We might expose a name export that let you reuse the instance like it:
const { launch } = require('node-html-to-image');
const instance = launch({ puppeteerArgs })
instance.screenshot({
html: '<html><body>Hello {{name}}!</body></html>',
content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }]
})
.then(() => console.log('The images were created successfully!'))
instance.screenshot({
html: '<html><body>Hello {{name}}!</body></html>',
content: { name: 'Michel' }
})
.then(() => console.log('The image was created successfully!'))
@frinyvonnick I think it would be a good idea to expose the instance. In my testing I shave some milliseconds off by using the following puppeterArgs:
['--disable-gpu', '--no-sandbox', '--disable-setuid-sandbox', '--disable-dev-shm-usage', '--disable-accelerated-2d-canvas', '--no-first-run', '--no-zygote']
And changing the waitUntil to 'load'
.
It now takes about 1 second to generate the image. But if was possible to generate even faster, would be awesome 😎
And btw @frinyvonnick awesome job with this library, well documented and very useful 🎉
This would be of great use to me as well, I'm attempting to use this to create information cards to display in Discord embeds. Unfortunately it takes a rather long time for the images to be generated (around 4-5 seconds).
@joshk0 I would prefer to avoid breaking change. I created this package in the idea of having something simple and easy to use. Initially I thought people might directly use puppeteer for more advance use cases. I changed a bit my mind since people seems to enjoy this package.
Here are the existing use cases:
Single image
const nodeHtmlToImage = require('node-html-to-image'); nodeHtmlToImage({ html: '<html><body>Hello world!</body></html>', })
Batch image creation
const nodeHtmlToImage = require('node-html-to-image'); nodeHtmlToImage({ html: '<html><body>Hello {{name}}!</body></html>', content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }] }) .then(() => console.log('The images were created successfully!'))
We might expose a name export that let you reuse the instance like it:
const { launch } = require('node-html-to-image'); const instance = launch({ puppeteerArgs }) instance.screenshot({ html: '<html><body>Hello {{name}}!</body></html>', content: [{ name: 'Pierre' }, { name: 'Paul' }, { name: 'Jacques' }] }) .then(() => console.log('The images were created successfully!')) instance.screenshot({ html: '<html><body>Hello {{name}}!</body></html>', content: { name: 'Michel' } }) .then(() => console.log('The image was created successfully!'))
Could we get an update on the whole idea about reusing the instance?
The usecase i'm facing is the following:
12 different HTML templates 10 different sets of content
Right now, we have to go through them, one by one, because we have multiple different templates, which in turn means that we have to create and load a new puppeteer instance for every single image.
I could live with us being able to make html
and content
pairs
I discover that puppeteer-cluster can take a puppeteer object as parameter of the launch method (see https://github.com/thomasdondorf/puppeteer-cluster#clusterlaunchoptions). This could be an option to pass your own instance of puppeteer. Is someone interested in opening a pull request to add this option?
Another performance improvement could be to use an instance of puppeteer-core with headless shell (see https://github.com/puppeteer/puppeteer/blob/main/docs/api.md#puppeteer-vs-puppeteer-core). Could be useful in some contextes where disk space matters. Anyone interested in by this too?
I had performance issues for a scenario where a user request would start some good (~300) amount of screenshots, so if more than a small amount of users decided to do the same thing, the machine would run out of memory. I decided to create a singleton puppeteer cluster and enqueue manually the screenshots, another improvement was to render all the html in the same page and run query selectors (all the divs were small, height around 350px). I went from ~58 seconds to take the screenshots to ~15.
I'm very interested in this feature!
Any update on this?
What is the current workaround for performance issue