oltarasenko
I have stopped active development of CrawlyUI and don't have enough money for a standalone server, so I will not work on this.
Oh, indeed. So basically it is not possible to override middleware because of it :(
I need to think more about the issue. On the one hand, for now, you should just use https://github.com/oltarasenko/crawly/blob/8c8b3651559529bcb81ec1477ade18386f794f14/lib/crawly/request.ex#L72 to create new requests. I am not quite sure how to address...
Sorry, @jfmlima, I've managed to catch the flu. Regarding manual creation of requests: the spider's parse_item/1 function is supposed to return new items and new requests; for example, this process is shown...
@jfmlima I am preparing the 0.12.0 rollout right now.
@jfmlima Just done. I have tested it a bit, so it should work fine. Please let me know if something goes wrong, so I can prepare a bugfix.
@spectator Ok, I see. Yes, as was mentioned before, we do not override middleware at all at the spider level; that's why your per-spider config is not taken into...
@spectator I think it should, as long as you're defining them as requests, for example as I did here: https://oltarasenko.medium.com/web-scraping-with-elixir-and-crawly-extracting-data-behind-authentication-a52584e9cf13?sk=fa66930ce187204285fb43741a414979 See the part where I am doing the login on...
@sreecodeslayer Looks very promising. Please try to sketch a PR. I will be able to help if needed.
From practice: Sometimes it's a pain to find an appropriate set of user agents for a given website :( Just as you have stated, they would render something completely different...