http-crawler
Allow user to choose whether to follow redirects
We currently rely on the default behaviour of requests, which is to follow redirects.
A user might not always want this. For example, they may want to use the library to find unnecessary redirects on a site.
We should find a way to allow the user to configure the behaviour here.
I am a beginner when it comes to contributing to open source. I am interested in taking this up.
I was thinking crawl() could take a keyword argument (say, follow_redirects) alongside base_url, through which the user can specify whether to follow redirects. We can then set allow_redirects in the get() call accordingly.
Please let me know if my approach is a proper one.
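For illustration, the approach proposed above might look roughly like the sketch below. This is not http-crawler's actual code: the body of crawl() here is a simplified stand-in that fetches only base_url, and the session parameter is a hypothetical addition so the sketch can be exercised without network access. The only real API relied on is the allow_redirects argument to get() in requests.

```python
def crawl(base_url, follow_redirects=True, session=None):
    """Simplified stand-in for crawl(): fetches base_url only, no link following.

    follow_redirects is the keyword argument proposed in this issue.
    session is a hypothetical extra parameter used to inject a test double.
    """
    if session is None:
        # Imported lazily so the sketch can be tried with a stub session.
        import requests
        session = requests.Session()

    # allow_redirects is the requests parameter that controls redirect following.
    response = session.get(base_url, allow_redirects=follow_redirects)
    yield response
```

With follow_redirects=False, a redirecting URL would yield the 3xx response itself rather than the page it points to, which is what someone hunting for unnecessary redirects would want to see.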
Hi @rkrp. That looks like a sensible approach.
One question: what should the default behaviour be? I think that following redirects should be the default, as this is what people who use requests will expect. What do you think?
Do you fancy trying to implement this?
@inglesp I agree. The default behaviour must be to follow redirects. Anything else would be very counter-intuitive.
Also, I presume I will need to write the corresponding unit tests for this new feature.
I would love to implement this. However, I am a student in the middle of my exams, so it will take some time before I can send a pull request. I hope that is fine.
Yes, when you come to implement this, please include tests.
Good luck with your exams!
Thanks.
I was initially planning to create a new method test_redirect() in test_http_crawler.py to hold the tests for this. However, the code for starting the local server (_serve()) is defined inside test_crawl(). So, should I write the redirect tests in test_crawl(), or should I move _serve() out of test_crawl()?
Hi @rkrp!
I'm one step ahead! Last night, I added a commit that adds a new option to crawl(), and in the process, refactored the test code to move _serve() out of test_crawl(). Does this solve the problem for you?
Hi @rkrp -- how's this going? Anything I can help with?
@inglesp I am looking for the best way to set up a local httpd for the redirection tests. I have looked at the existing tests, and I am also reading the documentation for http.server. Hopefully, I will be able to send a PR by this weekend.
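As a rough idea of what a local server for redirect tests could look like, here is a minimal sketch using only http.server from the standard library. The handler name, the /old and /new paths, and the start_server() helper are all made up for illustration. They are not taken from the project's test suite or its _serve() helper.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /old redirects to /new, everything else returns 200."""

    def do_GET(self):
        if self.path == "/old":
            # Permanent redirect pointing at /new.
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep test output quiet.
        pass


def start_server():
    """Start the server on a free local port in a daemon thread and return it."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A test could then request /old with redirects disabled and check for a 301 status, or with redirects enabled and check that the final response is the 200 from /new. Call server.shutdown() and server.server_close() when the test is done.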
@rkrp -- good stuff!
@rkrp, any progress? If there's anything I can help with, let me know. If not, I'd like to pass this issue on to somebody else to tackle.
@inglesp I am sorry for the delay. I have sent a PR implementing this: #8