Phil Gooch

10 comments by Phil Gooch

I'd echo @a-fent's comments. This is as much a PDF parsing problem as it is a reference parsing problem. The way I've approached this in the past was...

I find that regex extraction works well for well-defined identifiers such as arXiv IDs and DOIs, rather than training a model to detect them. For modern arXiv IDs, something like `Pattern.compile("(?i)arXiv:\\d{4}[.]\\d{4,5}(v\\d+)?")`...
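As a rough illustration of the regex approach, here is a minimal Java sketch that applies the pattern quoted above to a reference string. The class name `ArxivIdExtractor`, the method `extract`, and the sample reference are hypothetical, chosen only for this example. The pattern covers only the modern `YYMM.NNNNN` scheme with the `arXiv:` prefix; old-style ids such as `hep-th/9901001` would need a separate pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class ArxivIdExtractor {
    // Same pattern as in the comment: case-insensitive "arXiv:" prefix,
    // four digits, a dot, four or five digits, and an optional version suffix.
    private static final Pattern ARXIV_ID =
        Pattern.compile("(?i)arXiv:\\d{4}[.]\\d{4,5}(v\\d+)?");

    // Returns every substring of the text that matches the pattern, in order.
    static List<String> extract(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = ARXIV_ID.matcher(text);
        while (m.find()) {
            ids.add(m.group());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up reference string used only to exercise the pattern.
        String reference = "Smith J. Some title. arXiv:1206.4522v2 [cs.CL], 2012.";
        System.out.println(extract(reference)); // prints [arXiv:1206.4522v2]
    }
}
```

Because the pattern has no trailing word boundary, a longer digit run after the dot would be matched only up to its first five digits; adding `\\b` at the end is one way to tighten that if it matters for your data.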

There's a bunch of word2vec models trained on PubMed data here, and these work well in gensim:

- http://evexdb.org/pmresources/vec-space-models/
- https://github.com/cambridgeltl/BioNLP-2016
- http://bioasq.org/news/bioasq-releases-continuous-space-word-vectors-obtained-applying-word2vec-pubmed-abstracts

These are all unigram models though, iirc.

@menshikh-iv The first set of models at http://evexdb.org/pmresources/vec-space-models/ are CC-BY (see http://bio.nlplab.org/#license). I'm waiting to hear back from the authors about the license for the other ones, I'll let you know...

@menshikh-iv I just heard back from Billy Chiu, who developed the models at https://github.com/cambridgeltl/BioNLP-2016. He's just updated the README there to confirm that the models at https://drive.google.com/open?id=0BzMCqpcgEJgiUWs0ZnU0NlFTam8 are also made...

I installed it from here: `pip install https://opensource.apple.com/source/python_modules/python_modules-21/bonjour-py/bonjour-py-0.3.tar.gz`. You may need to install `swig` first. On OS X: `brew install swig`, or on Linux: `sudo apt-get install swig`. If you can a...

I have the same issue. I think that with the Elsevier API you only get titles and abstracts, not the full text, unless you have a ScienceDirect subscription. This...

Hi @gopalkalpande @sid-sundrani, thanks for this. This is an open-source project, so if you can submit a pull request with a fix and an accompanying unit test, that would be very welcome.

Hi Fred, thanks for these examples! This is a limitation of the Schwartz-Hearst algorithm itself. This and other limitations were resolved in this paper: https://arxiv.org/pdf/1206.4522.pdf but there is no implementation...

@renaud Thanks for your pull requests. Would you mind combining these into a single PR with the suggested changes? Cheers!