Phil Gooch comments




Footnotes

I'd echo @a-fent's comments. This is as much a PDF parsing problem as it is a reference parsing problem. The way I've approached this in the past was...

arXiv identifiers not extracted

For well-defined identifiers such as arXiv IDs and DOIs, I'd use regex extraction rather than training a model to detect them; I find it works well. For modern arXiv IDs, something like `Pattern.compile("(?i)arXiv:\\d{4}[.]\\d{4,5}(v\\d+)?")`...
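A minimal sketch of the regex approach in Java, built around the arXiv pattern above. The class name `IdentifierExtractor`, the `findAll` helper and the `DOI` pattern are illustrative assumptions, not from the original comment. The DOI pattern is a loose approximation (it requires the match to end in a letter or digit, so trailing sentence punctuation is dropped, but DOIs ending in `)` would be truncated). The arXiv pattern only covers post-2007 identifiers such as `arXiv:1706.03762v5`, not old-style ones such as `hep-th/9901001`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierExtractor {
    // Modern arXiv ids: "arXiv:" prefix (case-insensitive), YYMM, a dot,
    // a 4- or 5-digit sequence number, and an optional version suffix like "v5".
    static final Pattern ARXIV =
            Pattern.compile("(?i)arXiv:\\d{4}[.]\\d{4,5}(v\\d+)?");

    // Assumed, simplified DOI pattern: "10." + 4-9 digit registrant code + "/" + suffix.
    // The final [a-z0-9] stops the match before trailing punctuation such as "." or ",".
    static final Pattern DOI =
            Pattern.compile("(?i)\\b10[.]\\d{4,9}/[-._;()/:a-z0-9]*[a-z0-9]");

    // Hypothetical helper: returns every non-overlapping match of p in text, in order.
    static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String ref = "Smith J. Deep refs. arXiv:1706.03762v5, doi:10.1000/xyz123.";
        System.out.println(findAll(ARXIV, ref)); // prints [arXiv:1706.03762v5]
        System.out.println(findAll(DOI, ref));   // prints [10.1000/xyz123]
    }
}
```

Because the identifier formats are fixed, a couple of compiled patterns run over the raw reference string are cheap and predictable; a trained tagger can still handle the fuzzier fields (authors, titles) around them.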

Add medical corpora + pretrained models

There are a number of word2vec models trained on PubMed data, and these work well in gensim:

- http://evexdb.org/pmresources/vec-space-models/
- https://github.com/cambridgeltl/BioNLP-2016
- http://bioasq.org/news/bioasq-releases-continuous-space-word-vectors-obtained-applying-word2vec-pubmed-abstracts

These are all unigram models, though, IIRC.

Add medical corpora + pretrained models

@menshikh-iv The first set of models at http://evexdb.org/pmresources/vec-space-models/ are CC-BY (see http://bio.nlplab.org/#license). I'm waiting to hear back from the authors about the license for the other ones; I'll let you know...

Add medical corpora + pretrained models

@menshikh-iv I just heard back from Billy Chiu, who developed the models at https://github.com/cambridgeltl/BioNLP-2016. He has just updated the README there to confirm that the models at https://drive.google.com/open?id=0BzMCqpcgEJgiUWs0ZnU0NlFTam8 are also made...

Issue with pip install -r requirements.txt

I installed it from here:

`pip install https://opensource.apple.com/source/python_modules/python_modules-21/bonjour-py/bonjour-py-0.3.tar.gz`

You may need to install `swig` first:

- OS X: `brew install swig`
- Linux: `sudo apt-get install swig`

If you can a...

Papers_With_Section_Titles 404

Ability to customize short-form/long-form detection of S-H algorithm

a few false positives

Update schwartz_hearst.py