
feat: SSRN SearchSource

Open · geritwagner opened this issue 2 years ago • 0 comments

Description

Create colrev.ssrn as a new CoLRev package that interfaces with SSRN (Social Science Research Network). SSRN provides a web-based search interface with access to working papers. The package should implement a new SearchSource and a BeautifulSoup-based scraper that retrieves the search results for a given query.
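A minimal sketch of the parsing step, assuming the beautifulsoup4 package is installed. This issue does not describe SSRN's result-page markup, so the sample HTML, the selectors (div.result-item, a.title, span.authors, span.year), the function name parse_results, the ssrn_id field, and the mapping of working papers to the techreport entry type are all hypothetical placeholders. The field names only loosely follow the BibTeX-style keys of CEP002 and would need to be checked against it.

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

# Hypothetical markup: replace the CSS classes after inspecting the live SSRN page.
SAMPLE_HTML = """
<div class="result-item">
  <a class="title" href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=123">A Working Paper</a>
  <span class="authors">Doe, Jane; Roe, Richard</span>
  <span class="year">2023</span>
</div>
"""


def parse_results(html: str) -> list[dict]:
    """Turn one (hypothetical) SSRN result page into BibTeX-style record dicts."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select("div.result-item"):
        link = item.select_one("a.title")
        if link is None:
            continue  # skip entries without a title link
        url = link.get("href", "")
        # SSRN abstract URLs carry the paper ID in the abstract_id query parameter.
        abstract_id = parse_qs(urlparse(url).query).get("abstract_id", [""])[0]
        authors = item.select_one("span.authors")
        year = item.select_one("span.year")
        records.append(
            {
                "ENTRYTYPE": "techreport",  # assumption: working papers -> techreport
                "title": link.get_text(strip=True),
                # Assumption: authors are separated by "; " on the result page.
                "author": authors.get_text(strip=True).replace("; ", " and ") if authors else "",
                "year": year.get_text(strip=True) if year else "",
                "url": url,
                "ssrn_id": abstract_id,  # hypothetical source-specific field
            }
        )
    return records


if __name__ == "__main__":
    for record in parse_results(SAMPLE_HTML):
        print(record["title"], record["ssrn_id"])  # prints: A Working Paper 123
```

Keeping parsing separate from fetching means the parser can be unit-tested against saved HTML without contacting SSRN.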

Implementation Notes

  • Use the BeautifulSoup library to develop a scraper that navigates the SSRN search interface and fetches search results along with full-text links (if available). The scraper should be able to run a search for a given query and export the results, which may require paginating through multiple result pages.
  • Documentation should guide users on how to perform searches and load search results.
  • Implement unit tests that simulate the retrieval process with a simple example.
  • The packages documentation page explains the steps for developing CoLRev packages.
  • The search-feed offers functionality for storing records.
  • CEP003 describes principles for SearchSources.
  • CEP002 describes the standard data schema for records.
  • The colrev.crossref package implements similar unit tests.
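Pagination and the simulated retrieval for unit tests can be sketched without network access by injecting the page fetcher. This is a minimal sketch under assumptions: iterate_pages, fetch_page, and fake_fetch are hypothetical names, the scraper is assumed to request result pages by number, and an empty page is assumed to mark the end of the results (how SSRN actually signals the last page would need to be checked). max_pages caps the number of requests as a safeguard.

```python
from typing import Callable, Iterator


def iterate_pages(
    fetch_page: Callable[[int], list[dict]],
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield records page by page until a page comes back empty.

    fetch_page(page_number) is a hypothetical hook: in the package it would
    request and parse one SSRN result page; in tests it can be a stub.
    """
    for page in range(1, max_pages + 1):
        records = fetch_page(page)
        if not records:
            break  # assumption: an empty page signals the end of the results
        yield from records


# Simulated retrieval for a unit test: two pages of fake results, then an empty page.
FAKE_PAGES = {
    1: [{"ssrn_id": "1"}, {"ssrn_id": "2"}],
    2: [{"ssrn_id": "3"}],
}


def fake_fetch(page: int) -> list[dict]:
    return FAKE_PAGES.get(page, [])


results = list(iterate_pages(fake_fetch))
```

In a real test, the stub would typically return records parsed from saved HTML fixtures, so the test never contacts SSRN.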

User Story

  1. The user initializes a CoLRev project using colrev init.
  2. The user runs colrev search -a colrev.ssrn, specifying search parameters relevant to the topic.
  3. The package retrieves records based on the parameters and saves them in the data/search directory of the project.
  4. When users run colrev load, the records from the search directory are added to the data/records.bib file, which integrates all search results in the project.
  5. When users run colrev search again, the records are retrieved from SSRN again and the existing records are updated.
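Step 5 implies that repeated searches should update records rather than duplicate them. A minimal sketch of that merge, assuming records are keyed by a hypothetical ssrn_id field; update_feed is an illustrative name and not part of CoLRev, whose search-feed is the intended place for storing records.

```python
def update_feed(existing: dict[str, dict], retrieved: list[dict]) -> tuple[int, int]:
    """Merge retrieved records into a feed keyed by the (hypothetical) ssrn_id field.

    Returns the number of (added, updated) records; unchanged records are skipped.
    """
    added = updated = 0
    for record in retrieved:
        key = record["ssrn_id"]
        stored = existing.get(key)
        if stored is None:
            existing[key] = dict(record)  # first time this paper is seen
            added += 1
        elif any(stored.get(field) != value for field, value in record.items()):
            stored.update(record)  # newer values from SSRN overwrite stored ones
            updated += 1
    return added, updated


feed = {"1": {"ssrn_id": "1", "title": "Old title"}}
print(update_feed(feed, [{"ssrn_id": "1", "title": "New title"}, {"ssrn_id": "2", "title": "B"}]))
# prints: (1, 1)
```

Overwriting with the newest values is only one possible policy. Whether fields that a user corrected manually should be preserved is a design decision this sketch leaves open.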

Useful Links

Expected Effort Required

2 months, 3-4 people.

geritwagner · Sep 20 '23 19:09