
chore: run e2e tests against self-hosted websites

Open barjin opened this issue 8 months ago • 3 comments

Which package is the feature request for? If unsure which one to select, leave blank

None

Feature

The e2e tests in this project are relatively flaky, partly because the pages we use for testing are dynamic.

Both our own (crawlee.dev and apify.com) and third-party websites change relatively often and are unstable from time to time.

Motivation

To improve this, we could scrape our own self-hosted website in the e2e tests, similar to how we use the self-hosted HTTPBin instance in https://github.com/apify/impit. This way, the testing environment is fully controlled, ensuring reproducible test results.

Ideal solution or implementation, and any additional constraints

A bunch of example pages (even ones with dynamic content load) served from a Standby Actor.
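As a rough illustration of what such example pages could look like, here is a minimal sketch of a fixture server built on Node's built-in `http` module. The paths (`/static`, `/dynamic`), the markup, and the names `renderFixture` and `startFixtureServer` are all hypothetical; how the server would be wired into a Standby Actor (port, startup) is left out and would depend on the platform.

```typescript
import { createServer, type Server } from 'node:http';

// Hypothetical fixture pages; the paths and markup are illustrative only.
const PAGES: Record<string, string> = {
    '/static': '<html><body><h1 id="title">Static fixture</h1></body></html>',
    // The heading is injected by a script after a short delay,
    // so it only appears for crawlers that execute JavaScript.
    '/dynamic': `<html><body><div id="root"></div><script>
        setTimeout(() => {
            document.getElementById('root').innerHTML = '<h1 id="title">Dynamic fixture</h1>';
        }, 200);
    </script></body></html>`,
};

// Pure lookup, kept separate from the server so it can be tested without networking.
export function renderFixture(path: string): { status: number; body: string } {
    const body = PAGES[path];
    return body === undefined ? { status: 404, body: 'Not found' } : { status: 200, body };
}

// Starts an HTTP server that serves the fixtures on the given port.
export function startFixtureServer(port: number): Server {
    return createServer((req, res) => {
        const { pathname } = new URL(req.url ?? '/', 'http://localhost');
        const { status, body } = renderFixture(pathname);
        res.writeHead(status, { 'content-type': 'text/html; charset=utf-8' });
        res.end(body);
    }).listen(port);
}
```

Because the page content never changes between runs, a test can assert on exact selectors and text (for example `#title`) instead of tolerating whatever a live site happens to serve.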

Alternative solutions or implementations

No response

Other context

No response

barjin avatar Apr 01 '25 11:04 barjin

Some of the E2E tests actually use httpbin for testing (cheerio-impit-ts, cheerio-curl-impersonate). We can easily swap these for httpbin.apify.actor to reduce the flakiness.
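A minimal sketch of that swap, assuming the tests build their target URLs from strings: the helper name `toSelfHosted` is hypothetical and not part of the crawlee codebase; only the two hostnames come from this thread.

```typescript
// httpbin.org is the public instance; httpbin.apify.actor is the one named in this thread.
const PUBLIC_HOST = 'httpbin.org';
const SELF_HOSTED_HOST = 'httpbin.apify.actor';

// Rewrites a public httpbin URL to the self-hosted instance,
// keeping path and query intact. Other URLs are returned unchanged.
export function toSelfHosted(rawUrl: string): string {
    const url = new URL(rawUrl);
    if (url.hostname === PUBLIC_HOST) {
        url.hostname = SELF_HOSTED_HOST;
        url.protocol = 'https:';
    }
    return url.toString();
}
```

In practice it may be simpler to replace the hostname directly in each test's start URLs; a helper like this is mainly useful if the target host should be switchable in one place.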

barjin avatar May 14 '25 06:05 barjin

I wasn't sure whether that one worked better. I recall there were some stability issues too, but maybe I'm remembering it wrong.

Anyway, now it's working and the tests are passing.

B4nan avatar May 14 '25 07:05 B4nan

It takes a while to start, but it's definitely better than httpbin.org.

janbuchar avatar May 14 '25 08:05 janbuchar