wptools
Offer alternative to pycurl
Installing pycurl on Windows is a real hassle.
Thanks for trying wptools @alexzabbey. That's a good idea! Will try to implement, or you're welcome to submit a PR... 😉
I found this in request.py:
# consistently faster than requests by 3x
#
# r = requests.get(url,
# headers={'User-Agent': self.user_agent})
# return r.text
I guess that means you've tried to use requests instead of pycurl but it was slower. Is the difference that significant? I think helping Windows users is more important, and we can probably speed things up in requests with sessions and so on, or alternatively try doing the requests asynchronously. What do you think?
Yes, I started with requests, but couldn't figure out why it was so much more slow than just using curl. Turns out, all of that scaffolding around urllib3 is costly. I found, up to 3x more costly.
I agree that offering requests as an alternative for folks having trouble with pycurl would be good. Worse performance is better than no performance, heh.
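One way the fallback discussed above could look is sketched here. This is not code from wptools: the names `pick_backend` and `fetch`, the default user agent string, and the extra fallback to the standard library's urllib are all assumptions made for illustration.

```python
import importlib


def pick_backend(preferred=("pycurl", "requests")):
    """Return the name of the first backend that imports cleanly.

    Hypothetical helper: falls back to the standard library's urllib
    when none of the preferred packages can be imported.
    """
    for name in preferred:
        try:
            importlib.import_module(name)
        except ImportError:
            continue
        return name
    return "urllib"


def fetch(url, user_agent="wptools-sketch", timeout=10):
    """Fetch url and return the body as text, using whatever is installed."""
    backend = pick_backend()
    if backend == "pycurl":
        from io import BytesIO
        import pycurl

        buf = BytesIO()
        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, url)
        curl.setopt(pycurl.USERAGENT, user_agent)
        curl.setopt(pycurl.TIMEOUT, timeout)
        curl.setopt(pycurl.WRITEDATA, buf)
        try:
            curl.perform()
        finally:
            curl.close()
        return buf.getvalue().decode("utf-8")
    if backend == "requests":
        import requests

        r = requests.get(url, headers={"User-Agent": user_agent}, timeout=timeout)
        r.raise_for_status()
        return r.text
    # Last resort: standard library only, no extra install needed.
    from urllib.request import Request, urlopen

    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With this shape, pycurl stays the default where it installs cleanly, and Windows or Docker users who cannot build it still get a working, if possibly slower, client.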
I found, up to 3x more costly.
That's a really noticeable difference! I found a benchmark project which documents the performance of some of the Python request clients: https://github.com/svanoort/python-client-benchmarks
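To check how large the gap is on a particular machine, a small timing harness along these lines may help. It is only a sketch: `time_fetcher` and `compare` are hypothetical names, and the fetch callables (for example one built on pycurl and one on requests) are assumed to be supplied by the caller.

```python
import statistics
import time


def time_fetcher(fetch, urls, repeat=3):
    """Return the median wall-clock seconds needed to fetch every URL once."""
    runs = []
    for _ in range(repeat):
        start = time.perf_counter()
        for url in urls:
            fetch(url)
        runs.append(time.perf_counter() - start)
    return statistics.median(runs)


def compare(fetchers, urls, repeat=3):
    """Map each fetcher name to its median time, fastest first."""
    timings = {
        name: time_fetcher(fn, urls, repeat) for name, fn in fetchers.items()
    }
    return dict(sorted(timings.items(), key=lambda item: item[1]))
```

Calling `compare({"pycurl": curl_fetch, "requests": requests_fetch}, urls)` with two caller-defined functions would give a rough side-by-side figure. Network jitter can easily dominate a short run, so a larger `repeat` and a fixed URL list make the numbers more trustworthy.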
See #44 which isn't linked to a pull request, but likely the initial impetus for changing the library.
+1, this is causing a real headache with Dockerfiles (Jupyter, Kubernetes, etc.).
Planning to add urllib3 as an alternative/replacement for pycurl.
Yes, I started with requests, but couldn't figure out why it was so much more slow than just using curl. Turns out, all of that scaffolding around urllib3 is costly. I found, up to 3x more costly.
Maybe it is just my machine, but I am finding that wptools as it currently is (using PyCurl) is far slower than manually using requests. Naturally I will use wptools, because all the parsing of wikitext has been a pain in the neck, but my script went from ~20 min to an overnight run.
I had the same issue. I'm on a Windows machine but using conda. You can install libcurl and pycurl with conda and then use pip to install wptools. Maybe adding this would help to guide others.
Any update on this milestone?
Hey - did something initial that seems to work: https://github.com/uriva/wptools