Blue Brain text mining toolbox for semantic search and structured information extraction

Results: 97

## 🚀 Feature It would be very convenient to be able to "fetch" any article from the database based on its `article_id`. In the background the fetching would 1. Query...

new feature
🗄️ database

See the comment from #437 and the corresponding discussion for details: I think `sc` and `styled-content` should be fine. But for `disp-quote` it's usually long-ish block quotes from patients etc....

## 🚀 Feature Currently, when we call `bbs_database add` on a directory, only `*.pkl` files are considered, to avoid errors in loading other kinds of files, such as auto-generated ones...

## 🐛 Bug description Some of the files recently pushed to `master` are missing the header, so we should find all files missing it and add the header there. ###...

Currently, empty paragraphs/fields are kept by the parser PubmedXMLParser. (see discussion and comments in #406) It could be nice to: [ ] Analyse if some papers have empty paragraphs/fields when...

## 🚀 Feature Currently, we strip every text field we extract during Pubmed XML parsing. See @Stannislav's comment from [#406 (comment)](https://github.com/BlueBrain/Search/pull/406#pullrequestreview-737356877): Dealing with significant spaces. `strip()` might already do a...
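To illustrate why spaces can be significant here, the following is a minimal sketch using only the standard library's `xml.etree.ElementTree`. The paragraph and its `<italic>` tag are a made-up example, not taken from a real PubMed record, and the code is not the parser's actual implementation. It compares stripping every text fragment with stripping the joined text once.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with inline markup (illustrative only).
p = ET.fromstring("<p>Effect of <italic>in vivo</italic> stimulation</p>")

# Collect the raw text fragments in document order:
# the text before <italic>, the text inside it, and its tail.
fragments = list(p.itertext())

# Stripping each fragment before joining drops the spaces that
# separate the inline element from the surrounding words.
stripped_each = "".join(fragment.strip() for fragment in fragments)

# Joining first and stripping once only removes leading and
# trailing whitespace, so the inner spaces survive.
stripped_once = "".join(fragments).strip()

print(repr(stripped_each))
print(repr(stripped_once))
```

In this sketch the per-fragment variant glues words together across the inline element, which is the kind of loss the issue appears to be concerned with.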

## 🚀 Feature Currently, we use the `element.find(".//some/path")` syntax in the `PubmedXMLParser`; [the double-slash is a glob for all elements at all sub-levels](https://docs.python.org/3/library/xml.etree.elementtree.html#xpath-support). If we know the exact (fixed) structure of...
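As a rough illustration of the difference between a recursive `.//` lookup and a fixed path, here is a small sketch with the standard library's `xml.etree.ElementTree`. The XML document and its element names are assumptions made for this example and do not reproduce the real PubMed schema or the parser's code.

```python
import xml.etree.ElementTree as ET

# Hypothetical document (illustrative only, not a real PubMed record).
xml = """
<article>
  <front>
    <title>Top-level title</title>
  </front>
  <body>
    <sec>
      <title>Nested section title</title>
    </sec>
  </body>
</article>
"""
root = ET.fromstring(xml)

# ".//title" searches every descendant level; find() returns the
# first match in document order.
any_depth = root.find(".//title")

# "./front/title" follows one fixed path from the root element.
exact = root.find("./front/title")

# findall() shows how many elements each pattern can match.
all_titles = root.findall(".//title")
exact_titles = root.findall("./front/title")

print(any_depth.text, "|", exact.text)
print(len(all_titles), "vs", len(exact_titles))
```

Both lookups return the same element in this document, but the recursive pattern also matches the nested section title, so a fixed path states more precisely which element is meant when the structure is known.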

As originally found out in https://github.com/BlueBrain/Search/issues/343#issuecomment-830338910, `spaCy` training of models — regardless of the choice of a `transformer` or `tok2vec` backbone — is not reproducible. We also opened an issue...

🚫 blocked

## 🚀 Feature `CORD19ArticleParser` and `PubmedXMLParser` are classes inheriting from the abstract class `ArticleParser`. It could be nice to: - Harmonise the constructors - Create a constructor in the `ArticleParser` class...

> I tried with `pip==19.0.3` and got this issue: > > ``` > Requirement 'en-core-sci-lg @ https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_lg-0.4.0.tar.gz' looks like a filename, but the file does not exist > Processing ./en-core-sci-lg...

🐛 bug fix