Search
Blue Brain text mining toolbox for semantic search and structured information extraction
## Scope
If we need to implement a NER system that supports mining `N` different entity types, we can do so in different ways. In particular, if `N>>1` different strategies...
All the features built so far support parameter extraction from unstructured natural language. This allows us to efficiently extract structured information from the text paragraphs of a paper, but in...
Until now, the runtimes of both the "Search" and "Mine" functionalities have been acceptable. However, the code has only been tested on up to ~100,000 full-text papers (the size of CORD-19 v65). As we...
The goal of this ticket is to create capabilities to download large numbers of neuroscientific papers. Ideally these papers should be in a machine-readable format such as text, JSON, HTML,...
## Scope
When training a NER model, we can choose among various pre-trained base models to fine-tune on the NER task. For instance, we can choose any of `scispacy`'s...
- [ ] We could make `benchmarks/` more useful if the server URLs were specified via the `.env` file.
- [ ] Additionally, we should check if those tests are...
When pre-training/fine-tuning a model on a Masked Language Model task, by default the `huggingface/transformers` library:
- uses the loss on the evaluation dataset to select the best model,
- considers...
## Context
* Currently (see BBS-125) we are just assuming that all `dvc run` etc. commands are run from within a container that is running the image built from `Dockerfile-dvc`...
## 🐛 Bug description
The entry point `compute_embeddings` handles more models than the search server. This means that some embeddings could be computed but aren't usable. Indeed, the search server...
## 🚀 Feature
**Note:** This issue is related to #281, which would benefit from this issue being addressed. Before being able to use the database server, one needs to...