Ben Kurtovic
It would be cool to do something like http://bitshift.it/?q=lang:python+func:def:parse_anything, or http://bitshift.it/#q=lang:python+func:def:parse_anything, or whatever works best.
```
14-07-05 19:37:56 DEBUG bitshift.crawler.indexer.GitIndexer Indexing file: kennethreitz/requests: requests/packages/urllib3/packages/six.py
14-07-05 19:37:56 ERROR bitshift.crawler.indexer.GitIndexer Exception raised while parsing:
Traceback (most recent call last):
  File "bitshift/crawler/indexer.py", line 169, in _insert_repository_codelets
    parse(codelet)
  File...
```
Currently we store one full `(BIGINT, BIGINT, VARCHAR(128), VARCHAR(256))` row per author per codelet. It would be better to store authors' expensive fields once, in their own table decoupled from codelets, and...
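A minimal sketch of what that normalization could look like, using an in-memory SQLite database. The table and column names (`authors`, `codelet_authors`) are illustrative assumptions, not the actual bitshift schema:

```python
import sqlite3

# Hypothetical normalized layout: authors live in their own table, and a
# narrow join table links them to codelets, so an author indexed in many
# codelets costs one wide row instead of one per codelet.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        author_id INTEGER PRIMARY KEY,
        name      VARCHAR(128),
        url       VARCHAR(256)
    );
    CREATE TABLE codelet_authors (
        codelet_id INTEGER,
        author_id  INTEGER REFERENCES authors(author_id)
    );
""")

# The same author appearing in two codelets still costs one authors row.
conn.execute("INSERT INTO authors VALUES (1, 'Kenneth Reitz', "
             "'https://github.com/kennethreitz')")
conn.execute("INSERT INTO codelet_authors VALUES (10, 1)")
conn.execute("INSERT INTO codelet_authors VALUES (11, 1)")

author_rows = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
print(author_rows)  # → 1
```

The join table rows are two integers each, so the duplication cost per codelet shrinks from the full wide row to a few bytes.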
## Labor division - Ben K. - Database, semantic analysis - Severyn - Frontend, crawler - Ben A. - Frontend, semantic analysis ## Search functionality and parsing ### Checklist -...
I should try to optimize database searching. I'll profile it with cProfile. I don't know yet how much work will be needed to make it reasonably fast.
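A rough sketch of how that profiling could be wired up with the standard-library `cProfile` and `pstats` modules. The `search` function here is a stand-in for the real database search, not bitshift's actual API:

```python
import cProfile
import io
import pstats

def search(query):
    # Placeholder for the real database search; just does a bit of CPU
    # work so the profiler has something to report.
    return sorted(query * 1000)

profiler = cProfile.Profile()
profiler.enable()
search("lang:python func:parse")
profiler.disable()

# Dump the top five entries by cumulative time into a string.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

Sorting by cumulative time should make it obvious whether the time goes into query construction, the database round-trip, or result post-processing.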
This can be used as a fallback when Google's quota gets exceeded. [Suggested here](https://en.wikipedia.org/wiki/Special:Permalink/937287712#Copyvio_detector).
Should be done by passing a general 'engines' parameter that is comma- or pipe-separated. This query param change would also be nice to have in the regular tool itself, but...
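A tiny sketch of parsing such a parameter, accepting either separator. The helper name `parse_engines` and the exact normalization rules are assumptions for illustration, not the tool's actual API:

```python
def parse_engines(param):
    """Split a comma- or pipe-separated 'engines' query parameter into a
    list of lowercase engine names, ignoring empty entries."""
    names = param.replace("|", ",").split(",")
    return [name.strip().lower() for name in names if name.strip()]

print(parse_engines("Google|Bing"))     # → ['google', 'bing']
print(parse_engines("google, yandex"))  # → ['google', 'yandex']
```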
Waiting out the 12-hour period (or however long it is) is neither ideal nor necessary when edit feeds exist.
Would be useful when investigating extensive violations by a user where we have a large number of diffs to check. [Link](https://en.wikipedia.org/w/index.php?oldid=782317651#CCI_bot)
- Better presentation of URLs.
- Integrate results with the checker; it should just be _done_ without fussing. This keeps things simpler and avoids false positives like [this one](https://en.wikipedia.org/wiki/Special:Permalink/706047971#Turnitin).