oltarasenko issues

Results: 14 issues of oltarasenko

Add lightweight UI for Crawly Management

As discussed here: https://github.com/oltarasenko/crawly/pull/97#issuecomment-626565242, we want to build a lightweight (probably HTTP-based) UI for single-node Crawly operations. For people who don't want (or don't...

help wanted

Move TestSpider definition from manager_test module

As discussed in https://github.com/oltarasenko/crawly/pull/165#discussion_r548901091, it's better to keep mocks such as the test spider in a separate module.

General purpose links extractors

One of the problems I am constantly seeing is a need to extract new URLs. And I am looking for a way to simplify it for me and other people...

help wanted

Improve user agents database

A lot of my crawls depend on proper user-agent strings. It's a bit hard to supply user agents using a config as we're doing now. It would be good to...

help wanted

yooli.com: Loginform is not working for a given website

Looks like the problem is that this website does not have a form element. Do you think there is a way to handle such cases? Traceback (most recent call last): File...

A new fetcher for Puppeteer based JS rendering

Creates a Splash replacement. I have tested it with just one target so far, so it's hard to say it's perfect, but it might be an alternative to Splash that is...

How is remarketing supposed to work with this proposal?

Hey, I don't see how remarketing can work if 3rd-party cookies are deprecated. Could someone explain?

New release

Hey @tsloughter, I just wonder if you could make the release? I think I need your latest changes for OTP23. Ironically, my current project requires epmdless :)

Display correct spider stop reasons

Spiders should send a stop message to the UI, in order to avoid displaying 'node_down', which does not reflect reality

Count estimate seems to produce inaccurate results for small tables

An example is here: http://crawlyui.com/logs/267/list The count estimate for this query shows 320, while the actual number of rows in the table is 4. As a result, we have long pagination...