[Feature]: CLI server mode (pydoll serve)
It would be useful to expose Pydoll as an HTTP service, so external systems can trigger crawls without writing Python code. The idea is to add a CLI command:
```
pydoll serve --port 8000
```
This spins up a lightweight server (likely as a plugin, to avoid bloating the core) and exposes a simple API.
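For illustration only, here is a minimal sketch of what the `serve` subcommand could look like, assuming the HTTP layer is an ASGI app (like the `/crawl` sketch further down) served with uvicorn; the `pydoll_serve.api:app` import path and the subcommand wiring are hypothetical, not part of pydoll today.

```python
# Hypothetical CLI entry point for `pydoll serve`, using argparse + uvicorn.
import argparse

import uvicorn


def main() -> None:
    parser = argparse.ArgumentParser(prog="pydoll")
    subparsers = parser.add_subparsers(dest="command", required=True)

    serve = subparsers.add_parser("serve", help="run the Pydoll HTTP API")
    serve.add_argument("--host", default="127.0.0.1")
    serve.add_argument("--port", type=int, default=8000)

    args = parser.parse_args()
    if args.command == "serve":
        # "pydoll_serve.api:app" is a placeholder import path for the ASGI app.
        uvicorn.run("pydoll_serve.api:app", host=args.host, port=args.port)


if __name__ == "__main__":
    main()
```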
Proposed API
Initial endpoint:
- `POST /crawl` → body contains `{ "url": "https://example.com", "format": "html" | "markdown" }`
- Response returns the page content, either as HTML or Markdown (depending on the Markdown exporter feature).
This endpoint becomes a foundation for LLM integrations, where the returned HTML or Markdown can be fed into models for structured data extraction. By exposing crawling as a simple web API, Pydoll can be plugged directly into AI pipelines, data labeling flows, or automated extraction systems without extra glue code.
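To make the request/response shape concrete, here is a minimal sketch of the endpoint, assuming FastAPI for the HTTP layer; the pydoll calls (`Chrome`, `start`, `go_to`, `page_source`) follow the library's documented async usage but may need adjusting to the installed version, and `markdownify` stands in for the proposed Markdown exporter.

```python
# Minimal sketch of the proposed POST /crawl endpoint (assumptions noted inline).
from typing import Literal

from fastapi import FastAPI
from markdownify import markdownify
from pydantic import BaseModel

from pydoll.browser import Chrome  # import path assumed

app = FastAPI()


class CrawlRequest(BaseModel):
    url: str
    format: Literal["html", "markdown"] = "html"


@app.post("/crawl")
async def crawl(req: CrawlRequest) -> dict:
    # One-shot crawl: open a browser, load the page, return its content.
    async with Chrome() as browser:
        tab = await browser.start()
        await tab.go_to(req.url)
        html = await tab.page_source  # assumed accessor for the rendered HTML

    content = markdownify(html) if req.format == "markdown" else html
    return {"url": req.url, "format": req.format, "content": content}
```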
This could start as a separate repository (`pydoll-serve`) and evolve independently, but integrating a CLI hook into Pydoll keeps the DX simple.
As far as I can tell, this feature is a bit more elaborate than that.
The power of pydoll is not just in scraping a single web page, but in managing a full browsing context. If you just pull direct web URLs (by parsing them out of result HTML pages), you're still not behaving like a human.
I suspect that to make this worthwhile, one will need to interact with a page using the framework, i.e. click on links instead of parsing URLs from HTML. This means the API will need to maintain a session, etc...
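Purely as a sketch of that direction, a session-oriented API might keep one browser/tab per session id and expose interaction endpoints instead of one-shot fetches; the routes, the in-memory session store, and the element lookup/click/teardown calls below are all assumptions, not an agreed design.

```python
# Sketch of a stateful, session-based API; pydoll calls are assumed, see comments.
import uuid

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from pydoll.browser import Chrome  # import path assumed

app = FastAPI()
sessions: dict[str, tuple] = {}  # session id -> (browser, tab)


class Navigate(BaseModel):
    url: str


class Click(BaseModel):
    selector: str


@app.post("/sessions")
async def create_session() -> dict:
    # Keep the browser alive between requests so state (cookies, DOM) persists.
    browser = Chrome()
    tab = await browser.start()
    session_id = str(uuid.uuid4())
    sessions[session_id] = (browser, tab)
    return {"session_id": session_id}


@app.post("/sessions/{session_id}/navigate")
async def navigate(session_id: str, body: Navigate) -> dict:
    if session_id not in sessions:
        raise HTTPException(status_code=404, detail="unknown session")
    _, tab = sessions[session_id]
    await tab.go_to(body.url)
    return {"ok": True}


@app.post("/sessions/{session_id}/click")
async def click(session_id: str, body: Click) -> dict:
    if session_id not in sessions:
        raise HTTPException(status_code=404, detail="unknown session")
    _, tab = sessions[session_id]
    element = await tab.query(body.selector)  # assumed lookup by CSS selector
    await element.click()
    return {"ok": True}


@app.delete("/sessions/{session_id}")
async def close_session(session_id: str) -> dict:
    browser, _ = sessions.pop(session_id, (None, None))
    if browser is None:
        raise HTTPException(status_code=404, detail="unknown session")
    await browser.stop()  # assumed teardown method
    return {"ok": True}
```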
Yeah, this is just an initial idea, y'know. I need to think it through a bit more, hehe. But if you have suggestions, feel free to comment here; it would be really useful.
> Yeah, this is just an initial idea, y'know. I need to think it through a bit more, hehe.
Of course.
> But if you have suggestions, feel free to comment here; it would be really useful.
I forked and made a WIP branch; it can be seen at nirizr/pydoll/.
This is an untested and very incomplete initial attempt at tackling web-service API functionality. I made it to start getting comments before I put too much into it, so feel free to speak your mind :)
If preferred, I can split most of the functionality out in one of the following ways:
- Keep it as part of this project, behind an optional installation flag
- Move it to a different repository as a plugin
- Create a separate package that depends on pydoll and simply imports it (`pydoll-api` or something)
It's currently not tested at all, so if I were you I wouldn't bother trying it yet. I will hopefully test it in the coming days, since I'm planning to actually use it in the near future.
I've tested it a bit and it's currently running (and includes a docker compose file for easy setup).
The API structure is simple and doesn't support more complex logic, but I think it's a good start.
I've neglected this a bit because I couldn't get pydoll to go undetected on the website I'm interested in scraping...