Search
Blue Brain text mining toolbox for semantic search and structured information extraction
## 🚀 Feature It would be very convenient to be able to "fetch" any article from the database based on its `article_id`. In the background the fetching would 1. Query...
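Here is a minimal sketch of what such a fetch could look like. It assumes a hypothetical SQLite schema with an `articles` table and a `paragraphs` table, both keyed by `article_id`. The real database layout, and the remaining steps elided above, are not reproduced here. `fetch_article` and `demo_connection` are illustrative names and are not part of the toolbox.

```python
import sqlite3
from typing import Optional

# Hypothetical schema for illustration only; the real Blue Brain Search
# database layout is not described in this issue.
SCHEMA = """
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE paragraphs (article_id INTEGER, paragraph_pos INTEGER, text TEXT);
"""


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database holding one toy article (article_id=1)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO articles VALUES (1, 'A toy article')")
    # Paragraphs are inserted out of order on purpose.
    conn.executemany(
        "INSERT INTO paragraphs VALUES (?, ?, ?)",
        [(1, 1, "Second paragraph."), (1, 0, "First paragraph.")],
    )
    return conn


def fetch_article(conn: sqlite3.Connection, article_id: int) -> Optional[dict]:
    """Return the title and ordered paragraphs of one article, or None if absent."""
    row = conn.execute(
        "SELECT title FROM articles WHERE article_id = ?", (article_id,)
    ).fetchone()
    if row is None:
        return None
    paragraphs = [
        text
        for (text,) in conn.execute(
            "SELECT text FROM paragraphs WHERE article_id = ? ORDER BY paragraph_pos",
            (article_id,),
        )
    ]
    return {"article_id": article_id, "title": row[0], "paragraphs": paragraphs}
```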
See the comment from #437 and the corresponding discussion for details: I think `sc` and `styled-content` should be fine. But for `disp-quote` it's usually long-ish block quotes from patients etc....
## 🚀 Feature Currently, when we call `bbs_database add` on a directory, only `*.pkl` files are considered, to avoid errors in loading other kinds of files, such as auto-generated ones...
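The following sketch reproduces the current filtering rule and exposes the glob pattern as a parameter. `collect_input_files` and its `pattern` argument are hypothetical and are not existing `bbs_database add` options.

```python
from pathlib import Path
from typing import List, Union


def collect_input_files(
    directory: Union[str, Path], pattern: str = "*.pkl"
) -> List[Path]:
    """List the regular files directly inside ``directory`` matching ``pattern``.

    Anything else (e.g. auto-generated files such as ``.DS_Store``) is ignored.
    """
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())
```

With the default pattern, the behaviour matches what the issue describes. Passing a different pattern would be one possible way to accept other file types.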
## 🐛 Bug description Some of the files recently pushed to `master` are missing the header, so we should find all files that lack it and add the header to them. ###...
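One way to find the offending files is to scan the first few lines of every Python file. This sketch relies on two assumptions. First, `HEADER_MARKER` is a hypothetical placeholder for the actual header text, which the issue does not quote. Second, only `*.py` files are checked.

```python
from pathlib import Path
from typing import List, Union

# Hypothetical header marker: the actual header text used in the repository
# is not quoted in the issue, so replace it with the real first line.
HEADER_MARKER = "# Blue Brain Search is a text mining toolbox"


def files_missing_header(
    root: Union[str, Path], marker: str = HEADER_MARKER, max_lines: int = 5
) -> List[Path]:
    """Return the ``*.py`` files under ``root`` whose first lines lack ``marker``."""
    missing = []
    for path in sorted(Path(root).rglob("*.py")):
        head = path.read_text(encoding="utf-8").splitlines()[:max_lines]
        if not any(line.startswith(marker) for line in head):
            missing.append(path)
    return missing
```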
Currently, empty paragraphs/fields are kept by the `PubmedXMLParser` parser (see the discussion and comments in #406). It would be nice to: [ ] Analyse whether some papers have empty paragraphs/fields when...
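For the analysis step, counting and filtering could be as simple as the following sketch. It assumes that paragraphs are represented as `(section_title, text)` pairs and that "empty" means whitespace-only. Both assumptions and the helper names are illustrative; they are not the parser's actual API.

```python
from typing import Iterable, List, Tuple

# Assumption: paragraphs are represented as (section_title, text) pairs.
Paragraph = Tuple[str, str]


def is_empty(paragraph: Paragraph) -> bool:
    """A paragraph is empty if its text contains only whitespace (or nothing)."""
    return not paragraph[1].strip()


def count_empty(paragraphs: Iterable[Paragraph]) -> int:
    """Count empty paragraphs, e.g. to analyse how often they occur."""
    return sum(1 for p in paragraphs if is_empty(p))


def drop_empty(paragraphs: Iterable[Paragraph]) -> List[Paragraph]:
    """Keep only the paragraphs that have some non-whitespace text."""
    return [p for p in paragraphs if not is_empty(p)]
```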
## 🚀 Feature Currently, we strip every text field we extract during Pubmed XML parsing. See @Stannislav's comment from [#406 (comment)](https://github.com/BlueBrain/Search/pull/406#pullrequestreview-737356877): Dealing with significant spaces. `strip()` might already do a...
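To illustrate why significant spaces matter, the following sketch compares two approaches: stripping each text piece of an element, and stripping the joined text once. It uses only the standard library and is not the parser's code. Inline markup such as `<italic>` splits a paragraph into pieces whose boundary spaces carry meaning. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy paragraph: inline markup splits the text into several pieces.
PARAGRAPH = ET.fromstring("<p>Text with <italic>inline</italic> markup </p>")


def join_stripped_pieces(element: ET.Element) -> str:
    """Strip every text piece before joining: spaces at the markup borders are lost."""
    return "".join(piece.strip() for piece in element.itertext())


def strip_joined_text(element: ET.Element) -> str:
    """Join all pieces first and strip only the outer whitespace."""
    return "".join(element.itertext()).strip()
```

On this paragraph, the first helper yields `"Text withinlinemarkup"`, while the second keeps the words separated.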
## 🚀 Feature Currently, we use `element.find(".//some/path")` syntax in the `PubmedXMLParser`, [the double-slash is a glob for all elements at all sub-levels](https://docs.python.org/3/library/xml.etree.elementtree.html#xpath-support). If we know the exact (fixed) structure of...
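The difference shows up on a toy JATS-like document in which `article-title` appears both in the front matter and in the reference list. The document and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Toy JATS-like document: "article-title" appears both for the article itself
# and inside the reference list.
ROOT = ET.fromstring(
    "<article>"
    "<front><article-meta><title-group>"
    "<article-title>Main title</article-title>"
    "</title-group></article-meta></front>"
    "<back><ref-list><ref><element-citation>"
    "<article-title>Cited title</article-title>"
    "</element-citation></ref></ref-list></back>"
    "</article>"
)


def titles_anywhere(root: ET.Element) -> list:
    """'.//' searches all sub-levels, so cited titles are matched too."""
    return [e.text for e in root.findall(".//article-title")]


def titles_exact_path(root: ET.Element) -> list:
    """A fixed path only matches the element at that exact position."""
    return [e.text for e in root.findall("./front/article-meta/title-group/article-title")]
```

The exact path is stricter. It cannot accidentally match a nested element elsewhere, and it avoids traversing the whole tree. The trade-off is that it returns nothing if an article deviates from the expected structure.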
As originally found out in https://github.com/BlueBrain/Search/issues/343#issuecomment-830338910, `spaCy` training of models — regardless of the choice of a `transformer` or `tok2vec` backbone — is not reproducible. We also opened an issue...
## 🚀 Feature `CORD19ArticleParser` and `PubmedXMLParser` are classes inheriting from the abstract class `ArticleParser`. It would be nice to: - Harmonise the constructors - Create a constructor in the `ArticleParser` class...
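Here is a hypothetical sketch of one possible harmonisation. The shared constructor takes the raw content, and a `from_file` class method handles I/O once for all subclasses. Subclasses then only implement the abstract properties. `HarmonisedParser` and `ToyJSONParser` are illustrative names, and this is not the existing `ArticleParser` API.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Union


class HarmonisedParser(ABC):
    """Hypothetical base class: one constructor signature shared by all parsers."""

    def __init__(self, content: str) -> None:
        self.content = content

    @classmethod
    def from_file(cls, path: Union[str, Path]) -> "HarmonisedParser":
        # Reading the file lives here once; cls(...) builds the calling subclass.
        return cls(Path(path).read_text(encoding="utf-8"))

    @property
    @abstractmethod
    def title(self) -> str:
        """Title of the article."""


class ToyJSONParser(HarmonisedParser):
    """Toy subclass parsing a JSON string such as '{"title": "..."}'."""

    def __init__(self, content: str) -> None:
        super().__init__(content)
        self._data = json.loads(content)

    @property
    def title(self) -> str:
        return self._data["title"]
```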
> I tried with `pip==19.0.3` and got this issue: > > ``` > Requirement 'en-core-sci-lg @ https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_lg-0.4.0.tar.gz' looks like a filename, but the file does not exist > Processing ./en-core-sci-lg...