chore: run e2e tests against self-hosted websites
Which package is the feature request for? If unsure which one to select, leave blank
None
Feature
The e2e tests in this project are fairly flaky, partly because the pages we test against are dynamic.
Both our own websites (crawlee.dev and apify.com) and third-party websites change often and are occasionally unavailable.
Motivation
To improve this, we could scrape a self-hosted website of our own in the e2e tests, similar to how https://github.com/apify/impit uses a self-hosted HTTPBin instance. This way, we fully control the test environment, which makes the test results reproducible.
Ideal solution or implementation, and any additional constraints
A set of example pages (including some that load their content dynamically) served from a Standby Actor.
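A minimal sketch of what such a fixture site could look like, assuming a plain Node.js HTTP server (`node:http`) running inside the Standby Actor. The routes (`/`, `/dynamic`, `/api/items`), the page contents, and the `renderPage`/`startServer` helpers are all hypothetical. The `/dynamic` page fills its list only through client-side JavaScript, so an HTTP-only crawler sees an empty `<ul>` while a browser-based crawler sees the items.

```typescript
import { createServer, Server } from "node:http";

// Hypothetical fixture pages; paths and content are illustrative only.
interface PageResponse {
  status: number;
  contentType: string;
  body: string;
}

const STATIC_PAGE = `<!doctype html>
<html><head><title>Static fixture</title></head>
<body><h1>Hello from the fixture site</h1><a href="/dynamic">Dynamic page</a></body></html>`;

// The list is empty in the served HTML and only populated by client-side JS,
// so HTTP crawlers and browser crawlers see different results.
const DYNAMIC_PAGE = `<!doctype html>
<html><head><title>Dynamic fixture</title></head>
<body><ul id="items"></ul>
<script>
  fetch("/api/items")
    .then((r) => r.json())
    .then((items) => {
      const list = document.getElementById("items");
      for (const item of items) {
        const li = document.createElement("li");
        li.textContent = item;
        list.appendChild(li);
      }
    });
</script></body></html>`;

const ITEMS = ["alpha", "beta", "gamma"];

// Pure routing function: the same path always yields the same response.
function renderPage(path: string): PageResponse {
  switch (path) {
    case "/":
      return { status: 200, contentType: "text/html; charset=utf-8", body: STATIC_PAGE };
    case "/dynamic":
      return { status: 200, contentType: "text/html; charset=utf-8", body: DYNAMIC_PAGE };
    case "/api/items":
      return { status: 200, contentType: "application/json", body: JSON.stringify(ITEMS) };
    default:
      return { status: 404, contentType: "text/plain; charset=utf-8", body: "Not found" };
  }
}

// Wraps renderPage in an HTTP server listening on the given port.
function startServer(port: number): Server {
  const server = createServer((req, res) => {
    // Match on the pathname only, so query strings do not affect routing.
    const path = new URL(req.url ?? "/", "http://localhost").pathname;
    const page = renderPage(path);
    res.writeHead(page.status, { "Content-Type": page.contentType });
    res.end(page.body);
  });
  server.listen(port);
  return server;
}
```

Inside the Actor, one would call something like `startServer(Number(process.env.ACTOR_STANDBY_PORT))`; that environment variable name is an assumption here and should be checked against the Apify Standby documentation. Keeping `renderPage` pure means the fixture content is deterministic and can be verified without opening a socket.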
Alternative solutions or implementations
No response
Other context
No response
Some of the E2E tests (cheerio-impit-ts, cheerio-curl-impersonate) already use httpbin. We can easily point these at httpbin.apify.actor instead to reduce flakiness.
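A minimal sketch of the swap, assuming the tests build their request URLs from a single base URL. The `HTTPBIN_BASE_URL` environment variable and the `httpbinUrl` helper are hypothetical, not existing names in the repo; keeping the host in one place makes switching between httpbin.org and httpbin.apify.actor a one-line change.

```typescript
// Hypothetical: the base URL can be overridden via an environment variable,
// defaulting to the Apify-hosted httpbin instance.
const HTTPBIN_BASE_URL: string =
  process.env.HTTPBIN_BASE_URL ?? "https://httpbin.apify.actor";

// Builds an absolute httpbin URL from a path such as "/get" or "/headers".
function httpbinUrl(path: string): string {
  return new URL(path, HTTPBIN_BASE_URL).toString();
}
```

For example, `httpbinUrl("/get")` yields `https://httpbin.apify.actor/get` when the variable is unset.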
I wasn't sure whether that one worked better; I recall it had some stability issues too, but maybe I remember it wrong.
Anyway, now it's working and the tests are passing.
It takes a while to start, but it's definitely better than httpbin.org.