Search issues

Workflow to update setup.py and requirements.txt

2

## 🚀 Feature A workflow to update: 1. minimal and maximal versions in `setup.py`, 2. pinned versions in `requirements.txt`, 3. listed packages in `setup.py`, 4. listed packages in `requirements.txt`, ##...

pafonta

documentation

dependencies

Compare config.cfg from Prodigy for spaCy 3 with spacy init config

9

## Context A `config.cfg` has been created in #274 with the recommended settings (`spacy init config`). The new version of Prodigy will automatically create the `config.cfg` with `prodigy data-to-spacy`. At...

pafonta

wontfix

🔤 named-entity-recognition

Combine "spacy evaluate" with "bluesearch.mining.eval"

1

Currently our evaluation step in the `dvc` pipeline dedicated to NER models relies entirely on the following script, which calls functions from `bluesearch.mining.eval`. https://github.com/BlueBrain/Search/blob/fa1331c98c8823ec85c5b3d92d58e99ab6010574/data_and_models/pipelines/ner/eval.py#L1 It could be convenient to use...

FrancescoCasalegno

🔤 named-entity-recognition

Include filtering of `bad` sentences in creation of the `mining cache`

## 🚀 Feature The `mining_cache` should only mine `good` sentences. ## Motivation Currently: - Every sentence parsed from `json` is kept in the database. There is no quality check. -...

EmilieDel

Move function definitions from "data_and_models/" to "src/"

2

- [ ] Move all the function definitions from `data_and_models/` to `src/` — with the exclusion of any script taken from external repo _as is_ (like [this one](https://github.com/BlueBrain/Search/blob/f0384001c0d6dca164a159187b8d3bc4ebb839bd/data_and_models/pipelines/sentence_embedding/scripts/fine_tune.py#L1)). - [...

FrancescoCasalegno

🗄️ database

Package the NER models

3

## 🚀 Feature Package the NER models we trained. ## Motivation Make the NER models `pip` installable and easily distributable. ## Pitch As we track the models with DVC, we...

pafonta

new feature

🔤 named-entity-recognition

Test DVC pipelines of "data_and_models/" with CI

2

Currently, our CI is never testing the content of `data_and_models/`, so it is possible that e.g. some code changes in `src/` will break `data_and_models/` and we don't realize it. It...

FrancescoCasalegno

🧪 testing

Hyperparameter optimization for NER

1

In #356 we started seeing that we can play with hyperparameters to reduce the runtime while having high accuracy. Once #321 is resolved, we can start looking into hyperparameter optimization:...

FrancescoCasalegno

🔤 named-entity-recognition

Separate source code from code for data & models

1

## 🚀 Feature There are two parts in the BBS repository. * First part. * processing the source data (i.e. CORD-19), * training models (sentence embeddings, NERs), * pre-computing inference...

FrancescoCasalegno

Support incremental population of literature database

1

Currently we are populating our database of publications all at once. But since in many cases the publications may not all be available at the same time (e.g. we...

FrancescoCasalegno

🗄️ database
