Add option to dump response content to a folder

Open alexander-schranz opened this issue 5 years ago • 6 comments

I want to use fink not only to check for wrong response codes. I also want to dump the response content (HTML in my case) to files, because I then want to use the w3c html5validator to validate these files.

Before looking into implementing this, I would like to check whether you are open to adding this option.

alexander-schranz avatar Feb 19 '19 14:02 alexander-schranz

Maybe. My main concern is that, while dumping the HTML is easy enough, it takes you halfway to creating an offline version of a given site (dumping assets etc., then relativising URLs), and that's a new problem.

I think Fink could also be used as a library, and you could, for example, do:

$dispatcher = DispatcherBuilder::create('http://www.example.com')->callback(function (Response $response) {
    // your stuff here
})->build();

but you probably want to use this via the command, so that would require more refactoring (and otherwise you would currently have to bootstrap the event loop stuff).

dantleech avatar Feb 19 '19 21:02 dantleech

Note that there is also https://github.com/spatie/crawler, which might be more suitable for your use case. Looking at the Validator library, I guess it makes sense to have a tool which just dumps the HTML.

Not against the idea necessarily; it would be convenient, but I'm actually not 100% sure it belongs here (it could fit, though).

dantleech avatar Feb 19 '19 21:02 dantleech

If we did, I guess the Crawler should be refactored to extract the DOM parsing into an observer. The code to dump the HTML can then also be an observer, and the Crawler will only send notifications when it gets a Response and has read the $body.

dantleech avatar Feb 19 '19 21:02 dantleech

Throwing an event sounds like a good idea to me and would make it very flexible. Would you use symfony/event-dispatcher for this, or which library do you prefer?

alexander-schranz avatar Feb 19 '19 22:02 alexander-schranz

No, no libraries for this :) We can simply create an interface for the observer (e.g. CrawlerObserver) and pass a collection of these to the crawler (e.g. CrawlerObservers).
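
A minimal sketch of what that could look like, not Fink's actual code. Only the names CrawlerObserver and CrawlerObservers come from the suggestion above; the onResponse() signature, the notify() method and the DumpBodyObserver class are assumptions made up for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical observer contract: the method name and its parameters
// are assumptions, not an existing Fink API.
interface CrawlerObserver
{
    public function onResponse(string $url, int $statusCode, string $body): void;
}

// Hypothetical collection that forwards one notification to every observer.
final class CrawlerObservers
{
    /** @var CrawlerObserver[] */
    private $observers;

    public function __construct(CrawlerObserver ...$observers)
    {
        $this->observers = $observers;
    }

    public function notify(string $url, int $statusCode, string $body): void
    {
        foreach ($this->observers as $observer) {
            $observer->onResponse($url, $statusCode, $body);
        }
    }
}

// Illustrative observer that writes each response body into a folder.
final class DumpBodyObserver implements CrawlerObserver
{
    /** @var string */
    private $directory;

    public function __construct(string $directory)
    {
        $this->directory = rtrim($directory, '/');
    }

    public function onResponse(string $url, int $statusCode, string $body): void
    {
        // Create the target folder on first use.
        if (!is_dir($this->directory) && !mkdir($this->directory, 0777, true) && !is_dir($this->directory)) {
            throw new RuntimeException(sprintf('Could not create directory "%s"', $this->directory));
        }

        // Hash the URL so the file name is always filesystem-safe.
        file_put_contents($this->directory . '/' . sha1($url) . '.html', $body);
    }
}
```

The crawler would then hold one CrawlerObservers instance and call notify() once it has read the body, e.g. `(new CrawlerObservers(new DumpBodyObserver('/tmp/dump')))->notify($url, 200, $body);`. The DOM parsing could become a second observer in the same collection.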

dantleech avatar Feb 20 '19 09:02 dantleech

I think this will require some refactoring, still thinking about how to do it...

dantleech avatar Feb 23 '19 14:02 dantleech