Performance optimizations

Open bsolomon1124 opened this issue 5 years ago • 3 comments

Hi @codelucas. I've gotten great use out of newspaper and want to give back with some suggested improvements and touchups.

This PR has commits that are mainly focused on performance optimization, flexibility, and readability, with the performance optimizations centered on calls that usually happen many times in tight loops.

There is a decent amount here in terms of diffs, so feel free to offer pushback or cherry-pick from these. All tests are currently passing and I've added some new ones to boot.

On another note, I want to bump PR #400, which may have gotten lost in the weeds. It is a more scalable, faster, and comprehensive way to incorporate stopwords and would cut down on the number of "please add this language" pull requests. It would take just one call to requests.get('https://raw.githubusercontent.com/stopwords-iso/stopwords-iso/master/stopwords-iso.json').json() to refresh, and PRs to add languages or specific terms could be directed there.
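For anyone skimming, here is a rough sketch of what that refresh step could look like. It is not the code from PR #400. It assumes the JSON payload is a mapping of language codes to lists of words, and the names `fetch_stopwords_json`, `build_stopword_sets`, and `refresh_stopwords` are hypothetical, made up for this illustration.

```python
from typing import Callable, Dict, FrozenSet, List

STOPWORDS_URL = (
    "https://raw.githubusercontent.com/stopwords-iso/"
    "stopwords-iso/master/stopwords-iso.json"
)


def fetch_stopwords_json(url: str = STOPWORDS_URL) -> Dict[str, List[str]]:
    """Download the stopword payload (hypothetical helper, needs network)."""
    import requests  # imported lazily so the pure functions below work offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


def build_stopword_sets(raw: Dict[str, List[str]]) -> Dict[str, FrozenSet[str]]:
    """Normalize the payload into one frozenset per language code.

    Assumes `raw` looks like {"en": ["a", "the", ...], "de": [...], ...}.
    Sets give O(1) membership checks, which matters when the lookup
    runs once per token.
    """
    return {
        lang.lower(): frozenset(w.strip().lower() for w in words if w.strip())
        for lang, words in raw.items()
    }


def refresh_stopwords(
    fetch: Callable[[], Dict[str, List[str]]] = fetch_stopwords_json,
) -> Dict[str, FrozenSet[str]]:
    """Fetch and normalize in one call; `fetch` is injectable for testing."""
    return build_stopword_sets(fetch())
```

Calling `refresh_stopwords()` would hit the network once and return a dict you can cache; passing a stub for `fetch` keeps tests offline.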

bsolomon1124 · Jan 06 '19 22:01

Hey @bsolomon1124, thanks so much for this amazing effort! As this is a larger change I'll need time to review but it's awesome you are giving back to the lib 👍 💯

codelucas · Jan 18 '19 02:01

Has anyone tested this PR? @bsolomon1124 have you been using this pulled into the latest master with success?

banagale · Aug 20 '21 20:08

@banagale admittedly I haven't performance-tested this extensively, nor do I use this library day-to-day as I did on a previous project. If this PR does get actual consideration, it is probably worth a discerning review. (And looking back on it, it could probably be broken into several PRs of more manageable size.)

It looks like the only commits over the past year have been to add various donation links, so there's a fair chance this will sit as a PR indefinitely.

bsolomon1124 · Aug 22 '21 15:08