Guttenberg
A bot that searches for plagiarism on Stack Overflow.
Fixed a few typos
Fixed a typo on line 16
Looks like we have another null pointer issue. See https://chat.stackoverflow.com/transcript/message/50823685#50823685 for context and a timestamp to correlate with the logs.
The regex in the `feedback` command contains the URL entered in `copypastor_url` in the login properties. This can lead to problems: if it's set to HTTPS, the command won't accept HTTP links....
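One way around the scheme mismatch is to strip the scheme from the configured URL and let the pattern accept either one. This is a minimal sketch, not the bot's actual code: the helper name `build_feedback_pattern`, the host `copypastor.example`, and the `/posts/<id>` link shape are all assumptions made for illustration.

```python
import re


def build_feedback_pattern(copypastor_url: str) -> re.Pattern:
    """Build a regex that matches report links under http or https."""
    # Drop whatever scheme the configured URL carries, plus any trailing slash.
    host_and_path = re.sub(
        r"^https?://", "", copypastor_url.strip(), flags=re.IGNORECASE
    ).rstrip("/")
    # Assumed link shape: <base>/posts/<numeric id>. Adjust to the real one.
    return re.compile(
        r"https?://" + re.escape(host_and_path) + r"/posts/(\d+)",
        re.IGNORECASE,
    )


# Hypothetical configuration value; the real one lives in the login properties.
pattern = build_feedback_pattern("https://copypastor.example")
```

With this, the value of `copypastor_url` only decides the host and path, so links pasted with either scheme are accepted.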
In issue https://github.com/SOBotics/Guttenberg/issues/28, I also pointed to our [implementation](https://github.com/sotorrent/string-similarity) of the [Winnowing algorithm](https://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf), which serves a similar purpose. We already [evaluated](http://empirical-software.engineering/assets/pdf/msr18-sotorrent.pdf) how suitable it is for comparing Stack Overflow posts....
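For readers unfamiliar with Winnowing, the core idea from the linked paper can be sketched as follows: hash every k-gram, slide a window over the hashes, and keep the minimum of each window as a fingerprint. This is an illustrative sketch only, not the sotorrent implementation. The normalisation step, the use of `zlib.crc32` in place of a rolling hash, the default values of `k` and `w`, and the Jaccard score in `similarity` are all assumptions.

```python
import zlib


def winnow(text: str, k: int = 5, w: int = 4) -> set:
    """Return the (hash, position) fingerprints selected by winnowing."""
    # Assumed normalisation: keep alphanumerics only, lower-cased.
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    # Hash every k-gram; crc32 stands in for the paper's rolling hash.
    hashes = [
        zlib.crc32(cleaned[i:i + k].encode("utf-8"))
        for i in range(len(cleaned) - k + 1)
    ]
    fingerprints = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # On ties, select the rightmost minimum in the window.
        offset = w - 1 - window[::-1].index(smallest)
        fingerprints.add((smallest, start + offset))
    return fingerprints


def similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard overlap of fingerprint hashes (an assumed scoring choice)."""
    fa = {h for h, _ in winnow(a, k, w)}
    fb = {h for h, _ in winnow(b, k, w)}
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Any shared run of at least `w + k - 1` normalised characters is guaranteed to contribute a common fingerprint, which is what makes the scheme useful for spotting copied passages.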
We have a limit of 100 questions/day imposed by Google's free search API. Scraping external search engines is not allowed, but we could ask SE for permission to "scrape" SO search results....
From [Cody Gray](https://stackoverflow.com/users/366904):
> I still feel like there's a way to do it with Statistically Improbable Phrase matching, which is exactly what I do manually. But maybe it's naïve...
> Moss (for a Measure Of Software Similarity) is an automatic system for determining the similarity of programs. To date, the main application of Moss has been in detecting plagiarism...
`getCodeParagraphs` in `PostUtils` is a bit faulty in edge cases. In Markdown, four leading spaces create a code block only if there is a new line before the starting...
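The edge case can be illustrated with a small sketch, assuming the rule is that an indented line opens a code block only when a blank line (or the start of the post) precedes it. The function `indented_code_blocks` is hypothetical and is not a port of `getCodeParagraphs`. It also simplifies real Markdown: it ignores tabs and fenced blocks, and it splits a block at every blank line.

```python
def indented_code_blocks(markdown: str) -> list:
    """Collect indented code blocks, skipping indented paragraph continuations."""
    blocks, current = [], []
    prev_blank = True  # the start of the document counts as a blank line
    for line in markdown.splitlines():
        is_blank = not line.strip()
        is_indented = line.startswith("    ") and not is_blank
        if is_indented and (current or prev_blank):
            # Either continuing a block or starting one after a blank line.
            current.append(line[4:])
        elif current:
            # Any other line closes the block that was open.
            blocks.append("\n".join(current))
            current = []
        prev_blank = is_blank
    if current:
        blocks.append("\n".join(current))
    return blocks


sample = "Some text\n    not code\n\n    real code\n    more code\n\nafter"
print(indented_code_blocks(sample))
```

Here `    not code` directly follows a paragraph line, so it is treated as a continuation of that paragraph rather than as code, while the two lines after the blank line form one block.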
https://chat.stackoverflow.com/transcript/message/40779108#40779108 Quota was depleted after 7 requests (according to the logs)