Olicorne
Signed-off-by: thiswillbeyourgithub. Because the image is not (yet?) in the Docker Hub repository, I [took the liberty of making it build itself](https://github.com/mwmbl/crawler-script/issues/13).
Signed-off-by: thiswillbeyourgithub. Hopefully this fixes the [documentation not building](https://app.readthedocs.org/projects/sh/builds/), as reported in #755. Here's the [link to the raw build error log](https://app.readthedocs.org/api/v2/build/28114011.txt).
enh: crawler
- **fix: import HTTPException from ninja to resolve undefined name errors** - **refactor: move crawler configuration to env_vars with environment variable support** - **docs: add initial README for mwmbl crawler**...
I was curious whether using [pybloomfilter3mmap](https://pypi.org/project/pybloomfilter3/) during crawling was a bad design choice for performance compared to [fastbloom-rs](https://pypi.org/project/fastbloom-rs/). (For good measure, I also added [pyprobables](https://pypi.org/project/pyprobables/).) This was done on...
Hi, I've noticed that startup often fails when I restart my crawler, and I think the cause is corruption of the `/root/.crawl-index.tinysearch` file. The exact error message...
There are performance improvements to reap, but this needs a bit of fiddling with the dependency tree. See #269.
I fairly often see a 500 error from the mwmbl API. For example: ``` mwmbl-crawler | 20:INFO:mwmbl.crawl:Indexed, top terms to sync: [('ycombinator com', 143), ('hacker', 138), ('news ycombinator', 138), ('hacker...
- **feat: replace mmap with lmdb without breaking the db** This is exactly the same as #269, but with targeted changes to `mwmbl/tinysearchengine/indexer.py` to use lmdb instead of mmap....
In my view, Python's [black](https://pypi.org/project/black/) package really improves code readability. It would be nice if the maintainer could run `black **/*py` and add a commit hook for subsequent commits...
Trying to understand TinyIndex raised this question: the pipeline that creates Documents in the TinyIndex does not take into account the popularity of a token, so the token could...