Search
Blue Brain text mining toolbox for semantic search and structured information extraction
## Feature A workflow to update: 1. minimal and maximal versions in `setup.py`, 2. pinned versions in `requirements.txt`, 3. listed packages in `setup.py`, 4. listed packages in `requirements.txt`, ##...
## Context A `config.cfg` has been created in #274 with the recommended settings (`spacy init config`). The new version of Prodigy will automatically create the `config.cfg` with `prodigy data-to-spacy`. At...
Currently our evaluation step in the `dvc` pipeline dedicated to NER models relies entirely on the following script, which calls functions from `bluesearch.mining.eval`. https://github.com/BlueBrain/Search/blob/fa1331c98c8823ec85c5b3d92d58e99ab6010574/data_and_models/pipelines/ner/eval.py#L1 It could be convenient to use...
## Feature The `mining_cache` should only mine `good` sentences. ## Motivation Currently: - Every sentence parsed from `json` is kept in the database. There is no quality check. -...
- [ ] Move all the function definitions from `data_and_models/` to `src/`, excluding any script taken from an external repo _as is_ (like [this one](https://github.com/BlueBrain/Search/blob/f0384001c0d6dca164a159187b8d3bc4ebb839bd/data_and_models/pipelines/sentence_embedding/scripts/fine_tune.py#L1)). - [...
## Feature Package the NER models we trained. ## Motivation Make the NER models `pip` installable and easily distributable. ## Pitch As we track the models with DVC, we...
Currently, our CI never tests the contents of `data_and_models/`, so code changes in `src/` could break `data_and_models/` without us noticing. It...
In #356 we started seeing that tuning hyperparameters can reduce the runtime while keeping accuracy high. Once #321 is resolved, we can start looking into hyperparameter optimization:...
## Feature There are two parts in the BBS repository. * First part. * processing the source data (i.e. CORD-19), * training models (sentence embeddings, NERs), * pre-computing inference...
Currently, we populate our database of publications all at once. But since in many cases the publications may not all be available at the same time (e.g. we...