wptools

Offer alternative to pycurl

Open alexzabbey opened this issue 4 years ago • 11 comments

Installing pycurl on Windows is a real hassle.

alexzabbey avatar Dec 29 '19 07:12 alexzabbey

Thanks for trying wptools @alexzabbey. That's a good idea! Will try to implement, or you're welcome to submit a PR... 😉

siznax avatar Dec 31 '19 18:12 siznax

I found this in request.py:

        # consistently faster than requests by 3x
        #
        # r = requests.get(url,
        #                  headers={'User-Agent': self.user_agent})
        # return r.text

I guess that means you tried requests instead of pycurl but found it slower. Is the difference that significant? I think helping Windows users is more important, and we can probably speed things up in requests with sessions and so on, or alternatively try making the requests asynchronously. What do you think?
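A minimal sketch of the session idea, not wptools code: reusing one `requests.Session` keeps the connection open across calls instead of setting it up again for every URL. The helper names `make_session` and `fetch_all` and the `timeout` default are assumptions made for illustration.

```python
def make_session(user_agent):
    """Build one reusable session (hypothetical helper, not part of wptools)."""
    # Imported lazily so this module still loads when requests is missing.
    import requests

    session = requests.Session()
    # Headers set here are sent with every request made through the session.
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_all(urls, session, timeout=10):
    """Fetch each URL through the same session so connections are reused."""
    pages = {}
    for url in urls:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()  # surface HTTP errors instead of bad text
        pages[url] = response.text
    return pages
```

Usage would look like `fetch_all(list_of_urls, make_session("my-agent/0.1"))`. Whether this closes the gap with pycurl would need to be measured.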

alexzabbey avatar Jan 01 '20 12:01 alexzabbey

Yes, I started with requests, but couldn't figure out why it was so much slower than just using curl. Turns out, all of that scaffolding around urllib3 is costly. I found, up to 3x more costly.

I agree that offering requests as an alternative for folks having trouble with pycurl would be good. Worse performance is better than no performance, heh.
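One way such a fallback could look, as a sketch only and not the wptools implementation: try pycurl first and fall back to requests when pycurl cannot be imported. The function names `pick_backend` and `fetch`, the 10 second timeout, and the UTF-8 decoding are assumptions.

```python
import importlib
from io import BytesIO


def pick_backend(candidates=("pycurl", "requests")):
    """Return the first module in `candidates` that imports cleanly, else None."""
    for name in candidates:
        try:
            # A real import, because pycurl can be installed yet still fail
            # to load when libcurl is missing or mismatched.
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


def fetch(url, user_agent):
    """Fetch `url` with whichever backend is available (hypothetical helper)."""
    backend = pick_backend()
    if backend is None:
        raise RuntimeError("no HTTP backend found; install pycurl or requests")

    if backend.__name__ == "pycurl":
        buffer = BytesIO()
        curl = backend.Curl()
        curl.setopt(backend.URL, url)
        curl.setopt(backend.USERAGENT, user_agent)
        curl.setopt(backend.WRITEDATA, buffer)  # response body goes here
        try:
            curl.perform()
        finally:
            curl.close()
        # Assumes a UTF-8 response body.
        return buffer.getvalue().decode("utf-8")

    # Slower path, but it works where pycurl is hard to install.
    response = backend.get(url, headers={"User-Agent": user_agent}, timeout=10)
    response.raise_for_status()
    return response.text
```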

siznax avatar Jan 07 '20 22:01 siznax

> I found, up to 3x more costly.

That's a really noticeable difference! I found a benchmark project that documents some of the Python HTTP clients: https://github.com/svanoort/python-client-benchmarks
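To check a ratio like that on one's own machine, a small timing harness may be enough. This is a rough sketch with hypothetical names (`time_fetcher`, `slowdown`), not code from the linked benchmark project; it takes any two callables that accept a URL.

```python
import time


def time_fetcher(fetch, urls, repeats=3):
    """Return the best wall-clock time (seconds) for fetching all `urls` once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for url in urls:
            fetch(url)
        # Keep the fastest run to reduce noise from other processes.
        best = min(best, time.perf_counter() - start)
    return best


def slowdown(candidate, baseline, urls, repeats=3):
    """How many times slower `candidate` is than `baseline` on the same URLs."""
    return time_fetcher(candidate, urls, repeats) / time_fetcher(baseline, urls, repeats)
```

For example, `slowdown(fetch_with_requests, fetch_with_pycurl, urls)` would return roughly 3 if the "3x" observation holds for those URLs; both fetch functions are placeholders to be supplied by the reader. Network latency varies, so several repeats against the same URLs give a fairer comparison.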

lisongx avatar Jan 08 '20 10:01 lisongx

See #44, which isn't linked to a pull request but was likely the initial impetus for changing the library.

ukanuk avatar Apr 12 '20 13:04 ukanuk

+1, this is causing a real headache with Docker files (Jupyter, Kubernetes, etc.).

uriva avatar May 08 '20 09:05 uriva

Planning to add urllib3 as an alternative/replacement for pycurl.

siznax avatar Nov 16 '20 23:11 siznax

> Yes, I started with requests, but couldn't figure out why it was so much slower than just using curl. Turns out, all of that scaffolding around urllib3 is costly. I found, up to 3x more costly.

Maybe it is just my machine, but I am finding that wptools as it currently is (using PyCurl) is far slower than manually using requests. Naturally I will still use wptools, because all the wikitext parsing has been a pain in the neck, but my script went from ~20 minutes to an overnight run.

Nathan1123 avatar Jan 11 '21 04:01 Nathan1123

I had the same issue. I'm on a Windows machine but using conda. You can install libcurl and pycurl with conda and then use pip to install wptools. Maybe adding this would help guide others.

Simonsoto avatar Jan 14 '21 09:01 Simonsoto

Any update on this milestone?

applieddesign avatar Feb 13 '21 18:02 applieddesign

Hey, I put together an initial version that seems to work: https://github.com/uriva/wptools

uriva avatar Mar 16 '21 17:03 uriva