
Blue Brain text mining toolbox for semantic search and structured information extraction

Results 97 Search issues

## 🚀 Feature A workflow to update: 1. minimal and maximal versions in `setup.py`, 2. pinned versions in `requirements.txt`, 3. listed packages in `setup.py`, 4. listed packages in `requirements.txt`, ##...

documentation
dependencies

## Context A `config.cfg` has been created in #274 with the recommended settings (`spacy init config`). The new version of Prodigy will automatically create the `config.cfg` with `prodigy data-to-spacy`. At...

wontfix
🔤 named-entity-recognition

Currently our evaluation step in the `dvc` pipeline dedicated to NER models relies entirely on the following script, which calls functions from `bluesearch.mining.eval`. https://github.com/BlueBrain/Search/blob/fa1331c98c8823ec85c5b3d92d58e99ab6010574/data_and_models/pipelines/ner/eval.py#L1 It could be convenient to use...

🔤 named-entity-recognition

## 🚀 Feature The `mining_cache` should only mine `good` sentences. ## Motivation Currently: - Every sentence parsed from `json` is kept in the database. There is no quality check. -...

- [ ] Move all the function definitions from `data_and_models/` to `src/`, excluding any script taken from an external repo _as is_ (like [this one](https://github.com/BlueBrain/Search/blob/f0384001c0d6dca164a159187b8d3bc4ebb839bd/data_and_models/pipelines/sentence_embedding/scripts/fine_tune.py#L1)). - [...

🗄️ database

## 🚀 Feature Package the NER models we trained. ## Motivation Make the NER models `pip` installable and easily distributable. ## Pitch As we track the models with DVC, we...

new feature
🔤 named-entity-recognition

Currently, our CI never tests the content of `data_and_models/`, so code changes in `src/` could break `data_and_models/` without us noticing. It...

🧪 testing

In #356 we started seeing that we can tune hyperparameters to reduce the runtime while keeping high accuracy. Once #321 is resolved, we can start looking into hyperparameter optimization:...

🔤 named-entity-recognition

## 🚀 Feature There are two parts in the BBS repository. * First part. * processing the source data (i.e. CORD-19), * training models (sentence embeddings, NERs), * pre-computing inference...

Currently we populate our database of publications all at once. But since in many cases the publications may not all be available at the same time (e.g. we...

🗄️ database