..
    This file is part of hepcrawl.
    Copyright (C) 2015, 2016, 2017 CERN.

    hepcrawl is free software; you can redistribute it and/or modify it
    under the terms of the Revised BSD License; see the LICENSE file for
    more details.

==========
 HEPcrawl
==========

.. image:: https://img.shields.io/travis/inspirehep/hepcrawl.svg
    :target: https://travis-ci.org/inspirehep/hepcrawl

.. image:: https://img.shields.io/coveralls/inspirehep/hepcrawl.svg
    :target: https://coveralls.io/r/inspirehep/hepcrawl

.. image:: https://img.shields.io/github/tag/inspirehep/hepcrawl.svg
    :target: https://github.com/inspirehep/hepcrawl/releases

.. image:: https://img.shields.io/pypi/dm/hepcrawl.svg
    :target: https://pypi.python.org/pypi/hepcrawl

.. image:: https://img.shields.io/github/license/inspirehep/hepcrawl.svg
    :target: https://github.com/inspirehep/hepcrawl/blob/master/LICENSE

HEPcrawl is a harvesting library based on Scrapy (http://scrapy.org) for
INSPIRE-HEP (http://inspirehep.net). It focuses on the automatic and
semi-automatic retrieval of new content from all the sources the site
aggregates, in particular content from major and minor publishers in the
field of High-Energy Physics.

The project is currently in an early stage of development.

See the full documentation at http://pythonhosted.org/hepcrawl