
`bbs_database fetch` (~ inverse of `bbs_database add`)

## 🚀 Feature It would be very convenient to be able to "fetch" any article from the database based on its `article_id`. In the background the fetching would 1. Query...

jankrepl

new feature

🗄️ database

Handle <disp-quote> in JATS XMLs correctly

See the comment from #437 and the corresponding discussion for details: I think `sc` and `styled-content` should be fine. But for `disp-quote` it's usually long-ish block quotes from patients etc....

Stannislav

`bbs_database parse`: consider only files of interest

## 🚀 Feature Currently, when we call `bbs_database add` on a directory, only `*.pkl` files are considered, to avoid errors in loading other kinds of files, such as auto-generated ones...

FrancescoCasalegno
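The filtering described above for `bbs_database add` (only `*.pkl`) could carry over to `parse` along these lines. This is a minimal sketch, not the CLI's actual code: the helper name `files_of_interest` and the choice of `.xml` as the suffix of interest are assumptions for illustration.

```python
from pathlib import Path
import tempfile
from typing import Iterable, List


def files_of_interest(directory: Path, suffixes: Iterable[str]) -> List[Path]:
    """Return the regular files directly inside `directory` whose suffix is in `suffixes`.

    Hypothetical helper: files such as `.DS_Store` or stray notes are skipped.
    """
    wanted = {suffix.lower() for suffix in suffixes}
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in wanted
    )


# Usage: a directory mixing articles with auto-generated or unrelated files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["a.xml", "b.pkl", ".DS_Store", "notes.txt"]:
        (root / name).write_text("")
    selected = [path.name for path in files_of_interest(root, [".xml"])]

print(selected)  # ['a.xml']
```

Filtering on an explicit allow-list of suffixes avoids having to enumerate every kind of file that should be ignored.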

Missing Blue Brain header in some files pushed in `master`


## 🐛 Bug description Some of the files recently pushed to `master` are missing the header, so we should find all files missing it and add the header there. ###...

FrancescoCasalegno
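Finding the affected files can be scripted. A minimal sketch, assuming the header can be recognised by a marker string in the first few lines of each `*.py` file; the marker text, the 5-line window and the function name are hypothetical placeholders, not the project's actual header or tooling.

```python
from pathlib import Path
import tempfile
from typing import List


def files_missing_header(
    root: Path, marker: str, n_lines: int = 5, pattern: str = "*.py"
) -> List[Path]:
    """List files matching `pattern` under `root` whose first `n_lines` lines lack `marker`."""
    missing = []
    for path in sorted(root.rglob(pattern)):
        head = path.read_text(encoding="utf-8").splitlines()[:n_lines]
        if not any(marker in line for line in head):
            missing.append(path)
    return missing


# Usage on a throwaway tree: one file with the (placeholder) header, one without.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ok.py").write_text("# HEADER PLACEHOLDER\nimport os\n", encoding="utf-8")
    (root / "pkg").mkdir()
    (root / "pkg" / "bad.py").write_text("import os\n", encoding="utf-8")
    missing = [
        path.relative_to(root).as_posix()
        for path in files_missing_header(root, "HEADER PLACEHOLDER")
    ]

print(missing)  # ['pkg/bad.py']
```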

Handle empty paragraphs/fields in PubmedXMLParser

Currently, empty paragraphs/fields are kept by the parser PubmedXMLParser (see the discussion and comments in #406). It would be nice to: [ ] Analyse whether some papers have empty paragraphs/fields when...

EmilieDel
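Both steps listed above (measuring how often empty paragraphs occur, then dropping them) could look roughly like this. A minimal sketch: it assumes, for illustration only, that parsed paragraphs are available as `(section, text)` pairs; `drop_empty` is a hypothetical helper, not part of `PubmedXMLParser`.

```python
from typing import Iterable, Iterator, Tuple


def drop_empty(paragraphs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield only the (section, text) pairs whose text is not empty or whitespace-only."""
    for section, text in paragraphs:
        if text.strip():
            yield section, text


paragraphs = [("Introduction", "Some text."), ("Methods", ""), ("Results", "   \n")]

# Analysis step: how many paragraphs are empty (or whitespace-only)?
n_empty = sum(1 for _, text in paragraphs if not text.strip())
# Cleaning step: keep only paragraphs with actual content.
kept = list(drop_empty(paragraphs))

print(n_empty)  # 2
print(kept)     # [('Introduction', 'Some text.')]
```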

Deal with significant spaces in PubmedXMLParser

## 🚀 Feature Currently, we strip every text field we extract during Pubmed XML parsing. See @Stannislav's comment from [#406 (comment)](https://github.com/BlueBrain/Search/pull/406#pullrequestreview-737356877): Dealing with significant spaces. `strip()` might already do a...

EmilieDel
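Why stripping every fragment can hurt: spaces that separate inline markup from the surrounding words live at the edges of the text fragments. The sketch below uses a made-up JATS-like paragraph and contrasts per-fragment `strip()` with one possible alternative (join first, then collapse whitespace runs); it is an illustration, not the parser's current or planned implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph with inline markup and a line break inside the text.
paragraph = ET.fromstring("<p>The <italic>in vivo</italic>\n    study.</p>")

# Stripping each fragment separately removes the word-separating spaces.
naive = "".join(chunk.strip() for chunk in paragraph.itertext())

# Joining first and only collapsing whitespace runs keeps word boundaries.
collapsed = " ".join("".join(paragraph.itertext()).split())

print(naive)      # Thein vivostudy.
print(collapsed)  # The in vivo study.
```

Collapsing whitespace would also flatten content where layout matters (e.g. preformatted blocks), so such elements might need separate handling.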

Replace double-slash syntax of PubmedXMLParser

## 🚀 Feature Currently, we use `element.find(".//some/path")` syntax in the `PubmedXMLParser`; [the double-slash is a glob for all elements at all sub-levels](https://docs.python.org/3/library/xml.etree.elementtree.html#xpath-support). If we know the exact (fixed) structure of...

EmilieDel
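The difference between `.//` and an explicit path in ElementTree's XPath subset can be seen on a small, made-up JATS-like document (not taken from the repository): the descendant search also picks up an `<article-title>` nested inside a cited reference, while the explicit path only matches at the known location.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified JATS-like snippet.
XML = """
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Main title</article-title>
      </title-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref><element-citation><article-title>Cited title</article-title></element-citation></ref>
    </ref-list>
  </back>
</article>
""".strip()

root = ET.fromstring(XML)

# ".//" searches all descendants at any depth, including the reference list.
glob_matches = [el.text for el in root.findall(".//article-title")]

# An explicit path matches only at the exact, fixed location.
exact_matches = [
    el.text for el in root.findall("./front/article-meta/title-group/article-title")
]

print(glob_matches)   # ['Main title', 'Cited title']
print(exact_matches)  # ['Main title']
```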

Training NER models on GPU is non-reproducible


As originally found out in https://github.com/BlueBrain/Search/issues/343#issuecomment-830338910, `spaCy` training of models — regardless of the choice of a `transformer` or `tok2vec` backbone — is not reproducible. We also opened an issue...

FrancescoCasalegno

🚫 blocked

Harmonize parsers' constructor


## 🚀 Feature `CORD19ArticleParser` and `PubmedXMLParser` are classes inheriting from the abstract class `ArticleParser`. It would be nice to: - harmonize the constructors - create a constructor in the `ArticleParser` class...

EmilieDel

pip<21.0 interprets "@ http://" as local paths

> I tried with `pip==19.0.3` and got this issue: > > ``` > Requirement 'en-core-sci-lg @ https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_lg-0.4.0.tar.gz' looks like a filename, but the file does not exist > Processing ./en-core-sci-lg...

FrancescoCasalegno

🐛 bug fix
