Improve documentation on performance optimization for Actors
Based on feedback and reported issues, we should enhance our documentation with additional information and best practices for performance optimization.
The documentation improvements could include the following:
- Docker Image Size
  - Provide information on where to find the Docker image size.
  - Share tips and techniques for using minimal Docker images and reducing the Docker image size.
- Faster Runtimes
  - Recommend using faster runtimes, such as Bun instead of Node.js, when possible, and provide guidance on how to set them up.
- Apify Docker Images
  - Encourage users to base their Docker images on the latest Apify images, as they are more likely to be cached on the servers, leading to faster startup times.
- Build Optimization
  - Provide tips on pre-saving build artifacts or using Docker image layer caching to speed up builds.
- Other Optimizations
  - Leave room for additional performance optimization techniques or suggestions.
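For the Docker image size item, the local size of an image can be checked with the Docker CLI: `docker image ls` shows a SIZE column, and `docker image inspect --format '{{.Size}}'` prints the size in bytes. The sketch below wraps the latter; the helper name `image_size_mb` and the tag `apify/actor-node:20` are illustrative assumptions, not part of any Apify tooling. Note that this is the uncompressed size on the local machine, which may differ from the compressed size stored in a registry.

```shell
# Hypothetical helper: print the size of a locally available Docker image
# in decimal megabytes (10^6 bytes). The image must already be pulled or built.
image_size_mb() {
  local bytes
  # `.Size` from `docker image inspect` is the image size in bytes.
  bytes=$(docker image inspect --format '{{.Size}}' "$1") || return 1
  echo $(( bytes / 1000000 ))
}

# Example usage (the tag is only an example):
#   docker image ls apify/actor-node
#   image_size_mb apify/actor-node:20
```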
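To nudge users toward Apify base images, a docs page or a CI step could check the `FROM` line of an Actor's Dockerfile. This is a hypothetical helper (`check_apify_base_image` is not an existing tool); it only looks at the first `FROM` line and assumes that line has no flags such as `--platform`.

```shell
# Hypothetical lint helper: succeed if the first FROM line of a Dockerfile
# uses an apify/actor-* base image, otherwise warn on stderr and return 1.
check_apify_base_image() {
  local dockerfile="${1:-Dockerfile}"
  local base
  # Take the image name (second field) from the first FROM line.
  base=$(grep -m1 -iE '^[[:space:]]*FROM[[:space:]]' "$dockerfile" | awk '{print $2}')
  case "$base" in
    apify/actor-*)
      echo "ok: based on $base"
      ;;
    *)
      echo "warning: '$base' is not an Apify base image and is less likely to be cached" >&2
      return 1
      ;;
  esac
}
```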
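For the build optimization item, Docker layer caching works best when dependency installation is separated from copying the source code, so code changes don't force a fresh `npm install`. A minimal sketch, assuming a Node.js Actor with `package.json` and `package-lock.json`; it writes to a separate file so an existing `Dockerfile` isn't overwritten. The file name, image tag, and npm flags are illustrative, so compare with the Dockerfile in Apify's current Node.js templates (which may need extra flags such as `--chown`, depending on the base image's user) before reusing it.

```shell
# Write an example Dockerfile to a separate, illustrative file name.
cat > Dockerfile.layer-caching <<'EOF'
FROM apify/actor-node:20

# Copy only the dependency manifests first. The `npm install` layer below is
# then reused from the build cache until package.json or package-lock.json change.
COPY package*.json ./
RUN npm install --omit=dev --omit=optional

# Copy the source code last, so editing it only invalidates the layers from here on.
COPY . ./

CMD npm start --silent
EOF

# Build with: docker build -f Dockerfile.layer-caching -t my-actor .
```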
The improvements should be incorporated into the existing Docker documentation, which is currently being rewritten. Alternatively, we could create a dedicated section or page focused specifically on performance optimization for Actors.
Tagging for discussion:
@B4nan @vladfrangu @metalwarrior665 @fnesveda
@drobnikj or @jirimoravcik may have ideas, and @metalwarrior665 could ask the right people in delivery engineering.
- Unless we have images with Bun and Crawlee running, I would skip mentioning it. People would just keep asking us about it.
I would also mention trying to write scrapers that don't need a browser.
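Part of the benefit of browserless scrapers is a smaller image: Apify's browser images bundle a full browser on top of Node.js, while a browserless crawler such as `CheerioCrawler` can run on the plain Node.js image. A quick way to see the difference locally is to compare the two base images; the helper name and both tags are examples only, and the images must be pulled first (e.g. `docker pull apify/actor-node:20`).

```shell
# Hypothetical helper: print the local size of a browserless and a
# browser-based Apify base image, in decimal megabytes.
compare_base_image_sizes() {
  local image bytes
  for image in apify/actor-node:20 apify/actor-node-playwright-chrome:20; do
    bytes=$(docker image inspect --format '{{.Size}}' "$image") || return 1
    printf '%s\t%d MB\n' "$image" $(( bytes / 1000000 ))
  done
}
```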
Bun is still not supported, and we can't do much about that; we need to wait for the Bun team to fix these issues on their side. Currently, the unsupported `got` library seems to be the last blocker. I just tried the latest version to see how it works, and it still fails:
```
➜ cheerio-error-test bun -b src/main.js
INFO System info {"apifyVersion":"3.2.0","apifyClientVersion":"2.9.3","crawleeVersion":"3.10.0","osType":"Darwin","nodeVersion":"v22.2.0"}
WARN ProxyConfiguration:
INFO CheerioCrawler: Starting the crawler.
113 | }
114 | };
115 |
116 | // This is basically inverted `closeCoveredSessions(...)`.
117 | const closeSessionIfCovered = (where, coveredSession) => {
118 | for (let index = 0; index < where.length; index++) {
^
TypeError: undefined is not an object (evaluating 'where.length')
at closeSessionIfCovered (/Users/adamek/htdocs/apify/cheerio-error-test/node_modules/http2-wrapper/source/agent.js:118:30)
at /Users/adamek/htdocs/apify/cheerio-error-test/node_modules/http2-wrapper/source/agent.js:524:7
at emit (node:events:161:95)
at #onConnect (node:http2:719:25)
at onConnect (node:http2:856:45)
at handshake (node:net:115:27)
WARN CheerioCrawler: Reclaiming failed request back to the list or queue. ERR_HTTP2_ERROR: h2 is not supported
{"id":"3FULW9cbbkMrV4R","url":"https://crawlee.dev","retryCount":1}
```
(probably https://github.com/oven-sh/bun/issues/8823)
So far in this PR, I have rewritten the whole doc and added admonitions about using Apify Docker images, which should at least make that information more visually distinct.
For now, since no other runtimes are tested or working, I would skip adding this information to the docs.
Is there anything else you would like to see there, or is this enough for now?
@vladfrangu, is there anything from your side?