[Self-Host] Neither the client JS SDK nor the Python SDK connects to the locally built instance at http://localhost:3002 on the same system.
- I followed this self-host instruction exactly.
- I also copied the .env.example from /api and made only the minimal change of setting PLAYWRIGHT_MICROSERVICE_URL to http://localhost:3000/scrape (see the sketch after this list).
- I have shared my full .env file below.
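The lines in question look roughly like this; the remaining keys are left at their .env.example defaults, and the default values shown here are assumptions rather than a copy of my actual file:
# .env copied from /api/.env.example, with only minimal changes
PORT=3002                                                  # assumed default; matches the port the API logs show below
PLAYWRIGHT_MICROSERVICE_URL=http://localhost:3000/scrape   # the one value I changed
USE_DB_AUTHENTICATION=false                                # matches the "Authentication is disabled" warning in the logs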
Run docker compose
I then ran docker compose build and docker compose up, as shown below.
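These are the exact commands, run from the repository root (the same invocation that produced the logs further down):
docker compose build
docker compose up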
Check the API
I then sent the following curl request:
curl -X POST http://localhost:3002/v1/crawl \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_API_KEY' \
-d '{
"url": "https://docs.firecrawl.dev",
"limit": 100,
"scrapeOptions": {
"formats": ["markdown", "html"]
}
}'
It ran successfully.
Connect it with the Python SDK and the Node.js SDK
In Node.js, I ran the following code:
import FirecrawlApp from '@mendable/firecrawl-js';

const firecrawl = new FirecrawlApp({
  apiKey: '',
  apiUrl: "http://localhost:3002",
});
but I always get an error like the one below.
Environment (please complete the following information):
- OS: Windows
Logs
PS C:\Users\user\Desktop\user\firecrawl_original> docker compose build; docker compose up
time="2025-02-13T02:30:43+05:45" level=warning msg="The \"OPENAI_BASE_URL\" variable is not set. Defaulting to a blank string."
time="2025-02-13T02:30:43+05:45" level=warning msg="The \"LOGTAIL_KEY\" variable is not set. Defaulting to a blank string."
time="2025-02-13T02:30:43+05:45" level=warning msg="The \"SERPER_API_KEY\" variable is not set. Defaulting to a blank string."
time="2025-02-13T02:30:43+05:45" level=warning msg="The \"LOGTAIL_KEY\" variable is not set. Defaulting to a blank string."
time="2025-02-13T02:30:43+05:45" level=warning msg="The \"OPENAI_BASE_URL\" variable is not set. Defaulting to a blank string."
[+] Building 4.7s (62/62) FINISHED docker:desktop-linux
=> [playwright-service internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.20kB 0.0s
=> [playwright-service internal] load metadata for docker.io/library/python:3.11-slim 1.9s
=> [playwright-service internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [playwright-service 1/6] FROM docker.io/library/python:3.11-slim@sha256:42420f737ba91d509fc60d5ed65ed0492678a90c561e1fa08786ae8ba8b52eda 0.0s
=> [playwright-service internal] load build context 0.0s
=> => transferring context: 242B 0.0s
=> CACHED [playwright-service 2/6] RUN apt-get update && apt-get install -y --no-install-recommends gcc libstdc++6 0.0s
=> CACHED [playwright-service 3/6] WORKDIR /app 0.0s
=> CACHED [playwright-service 4/6] COPY requirements.txt ./ 0.0s
=> CACHED [playwright-service 5/6] RUN pip install --no-cache-dir --upgrade -r requirements.txt && pip uninstall -y py && playwright install chromium && playwright install- 0.0s
=> CACHED [playwright-service 6/6] COPY . ./ 0.0s
=> [playwright-service] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:c21039976bbd3669abe47e8dc9d5b55c7517c64753aa954555e44ac4a481a382 0.0s
=> => naming to docker.io/library/firecrawl-playwright-service 0.0s
=> [playwright-service] resolving provenance for metadata file 0.0s
=> [api internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 2.17kB 0.0s
=> [worker internal] load metadata for docker.io/library/rust:1-bullseye 2.2s
=> [worker internal] load metadata for docker.io/library/golang:1.19 2.2s
=> [worker internal] load metadata for docker.io/library/node:20-slim 2.1s
=> [api internal] load .dockerignore 0.0s
=> => transferring context: 77B 0.0s
=> [api internal] load build context 0.0s
=> => transferring context: 14.74kB 0.0s
=> [worker go-base 1/3] FROM docker.io/library/golang:1.19@sha256:3025bf670b8363ec9f1b4c4f27348e6d9b7fec607c47e401e40df816853e743a 0.0s
=> [worker rust-base 1/3] FROM docker.io/library/rust:1-bullseye@sha256:1e3f7a9fd1f278cc4be02a830745f40fe4b22f0114b2464a452c50273cc1020d 0.0s
=> [worker base 1/5] FROM docker.io/library/node:20-slim@sha256:5da391c4b0398f37074b6370dd9e32cebd322c2a227053ca4ae2e7a257f0be21 0.0s
=> CACHED [worker base 2/5] RUN npm i -g corepack@latest 0.0s
=> CACHED [worker base 3/5] RUN corepack enable 0.0s
=> CACHED [api base 4/5] COPY . /app 0.0s
=> CACHED [api base 5/5] WORKDIR /app 0.0s
=> CACHED [api prod-deps 1/1] RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --prod --frozen-lockfile 0.0s
=> CACHED [api stage-5 1/5] COPY --from=prod-deps /app/node_modules /app/node_modules 0.0s
=> CACHED [api build 1/4] RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --frozen-lockfile 0.0s
=> CACHED [api build 2/4] RUN apt-get clean && apt-get update -qq && apt-get install -y ca-certificates && update-ca-certificates 0.0s
=> CACHED [api build 3/4] RUN pnpm install 0.0s
=> CACHED [api build 4/4] RUN --mount=type=secret,id=SENTRY_AUTH_TOKEN bash -c 'export SENTRY_AUTH_TOKEN="$(cat /run/secrets/SENTRY_AUTH_TOKEN)"; if [ -z $SENTRY_AUTH_TOKEN ]; 0.0s
=> CACHED [api stage-5 2/5] COPY --from=build /app /app 0.0s
=> CACHED [api go-base 2/3] COPY sharedLibs/go-html-to-md /app/sharedLibs/go-html-to-md 0.0s
=> CACHED [api go-base 3/3] RUN cd /app/sharedLibs/go-html-to-md && go mod tidy && go build -o html-to-markdown.so -buildmode=c-shared html-to-markdown.go && chmod +x h 0.0s
=> CACHED [api stage-5 3/5] COPY --from=go-base /app/sharedLibs/go-html-to-md/html-to-markdown.so /app/sharedLibs/go-html-to-md/html-to-markdown.so 0.0s
=> CACHED [api rust-base 2/3] COPY sharedLibs/html-transformer /app/sharedLibs/html-transformer 0.0s
=> CACHED [api rust-base 3/3] RUN cd /app/sharedLibs/html-transformer && cargo build --release && chmod +x target/release/libhtml_transformer.so 0.0s
=> CACHED [api stage-5 4/5] COPY --from=rust-base /app/sharedLibs/html-transformer/target/release/libhtml_transformer.so /app/sharedLibs/html-transformer/target/release/libhtml_tra 0.0s
=> CACHED [api stage-5 5/5] RUN sed -i 's/\r$//' /app/docker-entrypoint.sh 0.0s
=> [api] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:4df3365deda4dec2340b87f897233ca6f30072cd9bcae0cc46d6416c8a73b970 0.0s
=> => naming to docker.io/library/firecrawl-api 0.0s
=> [api] resolving provenance for metadata file 0.0s
=> [worker internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 2.17kB 0.0s
=> [worker internal] load .dockerignore 0.0s
=> => transferring context: 77B 0.0s
=> [worker internal] load build context 0.0s
=> => transferring context: 14.74kB 0.0s
=> CACHED [worker base 4/5] COPY . /app 0.0s
=> CACHED [worker base 5/5] WORKDIR /app 0.0s
=> CACHED [worker prod-deps 1/1] RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --prod --frozen-lockfile 0.0s
=> CACHED [worker stage-5 1/5] COPY --from=prod-deps /app/node_modules /app/node_modules 0.0s
=> CACHED [worker build 1/4] RUN --mount=type=cache,id=pnpm,target=/pnpm/store pnpm install --frozen-lockfile 0.0s
=> CACHED [worker build 2/4] RUN apt-get clean && apt-get update -qq && apt-get install -y ca-certificates && update-ca-certificates 0.0s
=> CACHED [worker build 3/4] RUN pnpm install 0.0s
=> CACHED [worker build 4/4] RUN --mount=type=secret,id=SENTRY_AUTH_TOKEN bash -c 'export SENTRY_AUTH_TOKEN="$(cat /run/secrets/SENTRY_AUTH_TOKEN)"; if [ -z $SENTRY_AUTH_TOKEN 0.0s
=> CACHED [worker stage-5 2/5] COPY --from=build /app /app 0.0s
=> CACHED [worker go-base 2/3] COPY sharedLibs/go-html-to-md /app/sharedLibs/go-html-to-md 0.0s
=> CACHED [worker go-base 3/3] RUN cd /app/sharedLibs/go-html-to-md && go mod tidy && go build -o html-to-markdown.so -buildmode=c-shared html-to-markdown.go && chmod + 0.0s
=> CACHED [worker stage-5 3/5] COPY --from=go-base /app/sharedLibs/go-html-to-md/html-to-markdown.so /app/sharedLibs/go-html-to-md/html-to-markdown.so 0.0s
=> CACHED [worker rust-base 2/3] COPY sharedLibs/html-transformer /app/sharedLibs/html-transformer 0.0s
=> CACHED [worker rust-base 3/3] RUN cd /app/sharedLibs/html-transformer && cargo build --release && chmod +x target/release/libhtml_transformer.so 0.0s
=> CACHED [worker stage-5 4/5] COPY --from=rust-base /app/sharedLibs/html-transformer/target/release/libhtml_transformer.so /app/sharedLibs/html-transformer/target/release/libhtml_ 0.0s
=> CACHED [worker stage-5 5/5] RUN sed -i 's/\r$//' /app/docker-entrypoint.sh 0.0s
=> [worker] exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:830a87784d98838b8d312ad0be37eb1e7d0e164846383ba51a6cabb0097082ea 0.0s
=> => naming to docker.io/library/firecrawl-worker 0.0s
=> [worker] resolving provenance for metadata file 0.0s
time="2025-02-13T02:30:48+05:45" level=warning msg="The \"SERPER_API_KEY\" variable is not set. Defaulting to a blank string."
time="2025-02-13T02:30:48+05:45" level=warning msg="The \"OPENAI_BASE_URL\" variable is not set. Defaulting to a blank string."
time="2025-02-13T02:30:48+05:45" level=warning msg="The \"LOGTAIL_KEY\" variable is not set. Defaulting to a blank string."
time="2025-02-13T02:30:48+05:45" level=warning msg="The \"OPENAI_BASE_URL\" variable is not set. Defaulting to a blank string."
time="2025-02-13T02:30:48+05:45" level=warning msg="The \"LOGTAIL_KEY\" variable is not set. Defaulting to a blank string."
[+] Running 4/4
✔ Container firecrawl-redis-1 Created 0.0s
✔ Container firecrawl-playwright-service-1 Created 0.0s
✔ Container firecrawl-worker-1 Recreated 0.1s
✔ Container firecrawl-api-1 Recreated 0.1s
Attaching to api-1, playwright-service-1, redis-1, worker-1
redis-1 | 1:C 12 Feb 2025 20:45:49.592 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis-1 | 1:C 12 Feb 2025 20:45:49.592 * Redis version=7.4.2, bits=64, commit=00000000, modified=0, pid=1, just started
redis-1 | 1:C 12 Feb 2025 20:45:49.592 * Configuration loaded
redis-1 | 1:M 12 Feb 2025 20:45:49.592 * monotonic clock: POSIX clock_gettime
redis-1 | 1:M 12 Feb 2025 20:45:49.593 * Running mode=standalone, port=6379.
redis-1 | 1:M 12 Feb 2025 20:45:49.593 * Server initialized
redis-1 | 1:M 12 Feb 2025 20:45:49.593 * Loading RDB produced by version 7.4.2
redis-1 | 1:M 12 Feb 2025 20:45:49.593 * RDB age 28 seconds
redis-1 | 1:M 12 Feb 2025 20:45:49.593 * RDB memory usage when created 1.51 Mb
redis-1 | 1:M 12 Feb 2025 20:45:49.593 * Done loading RDB, keys loaded: 3, keys expired: 0.
redis-1 | 1:M 12 Feb 2025 20:45:49.593 * DB loaded from disk: 0.000 seconds
redis-1 | 1:M 12 Feb 2025 20:45:49.593 * Ready to accept connections tcp
api-1 | NEW ULIMIT: 65535
api-1 | RUNNING app
worker-1 | NEW ULIMIT: 65535
worker-1 | RUNNING worker
api-1 | 2025-02-12 20:45:50 warn [:]: Authentication is disabled. Supabase client will not be initialized. {}
worker-1 | 2025-02-12 20:45:50 warn [:]: Authentication is disabled. Supabase client will not be initialized. {}
api-1 | 2025-02-12 20:45:51 warn [:]: POSTHOG_API_KEY is not provided - your events will not be logged. Using MockPostHog as a fallback. See posthog.ts for more. {}
worker-1 | 2025-02-12 20:45:51 warn [:]: POSTHOG_API_KEY is not provided - your events will not be logged. Using MockPostHog as a fallback. See posthog.ts for more. {}
api-1 | 2025-02-12 20:45:51 info [:]: Number of CPUs: 16 available
api-1 | 2025-02-12 20:45:51 info [:]: Web scraper queue created
api-1 | 2025-02-12 20:45:51 info [:]: Extraction queue created
api-1 | 2025-02-12 20:45:51 info [:]: Index queue created
api-1 | 2025-02-12 20:45:51 info [:]: Worker 10 started
api-1 | 2025-02-12 20:45:51 info [:]: Worker 10 listening on port 3002
api-1 | 2025-02-12 20:45:51 info [:]: For the Queue UI, open: http://0.0.0.0:3002/admin/@/queues
api-1 | 2025-02-12 20:45:51 info [:]: Connected to Redis Session Rate Limit Store!
playwright-service-1 | [2025-02-12 20:45:51 +0000] [9] [INFO] Running on http://[::]:3000 (CTRL + C to quit)
worker-1 | 2025-02-12 20:45:51 info [:]: Web scraper queue created
worker-1 | 2025-02-12 20:45:51 info [:]: Extraction queue created
worker-1 | 2025-02-12 20:45:51 info [:]: Connected to Redis Session Rate Limit Store!
My other question is this: if I set USE_DB_AUTHENTICATION=true, provide values for the three variables SUPABASE_ANON_TOKEN, SUPABASE_URL, and SUPABASE_SERVICE_TOKEN, and also set TEST_API_KEY=test, then when calling the API do I have to pass api_key="fc-test" or api_key="test"?
I tried both, but I kept getting a "not authenticated" error in Postman when hitting the crawl API at http://localhost:3002/v1/crawl, even after providing accurate values for the required environment variables.
I have shared the API-testing screenshot as well, where I am getting the unauthorized error.
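For reference, the two variants I tried correspond to these requests (shell form of the Postman calls, assuming the same Bearer header format as the working curl above):
# Variant 1: key exactly as set in TEST_API_KEY
curl -X POST http://localhost:3002/v1/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer test' \
  -d '{"url": "https://docs.firecrawl.dev", "limit": 100}'

# Variant 2: key with the fc- prefix
curl -X POST http://localhost:3002/v1/crawl \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer fc-test' \
  -d '{"url": "https://docs.firecrawl.dev", "limit": 100}'
Both return the unauthorized error described above.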