Andy Jackson
It’s probably too large to POST to Solr in one big chunk. If you have any problems I should be able to split it with a Python script (and a...
Ah hang on, it’s broken into batches of 100,000. That might work.
In case it helps, if you can run [jq](https://stedolan.github.io/jq/), you can split the single JSON file into JSONLines format so each line is one element of the original array: jq...
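If jq isn't available, roughly the same conversion can be sketched in Python. This is a minimal sketch, assuming the file is a single top-level JSON array that fits in memory; the function name and the paths are hypothetical.

```python
import json


def json_array_to_jsonl(src_path, dest_path):
    """Write each element of a top-level JSON array as one line of JSON.

    Assumption: the whole array fits in memory, since json.load reads
    the entire file before anything is written.
    """
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(dest_path, "w", encoding="utf-8") as dest:
        for record in records:
            # One compact JSON document per line (JSON Lines).
            dest.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)
```

For example, `json_array_to_jsonl("records.json", "records.jsonl")` (hypothetical file names) returns the number of lines written. For a file too big to load in one go, a streaming parser would be needed instead.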
On naming (if I'm not too late/irrelevant), FWIW, this is the way I'd describe the usual flow:
- From a search, we get multiple _SearchResultsPages_.
- Combining these gets a...
~They have a 3.6GB download of the article level metadata, including URLs: https://doaj.org/public-data-dump~ Sorry misread your text.
I think they're set up for you to use the API, e.g. this searches abstracts for 'n95':
```
curl -X GET --header "Accept: application/json" "https://doaj.org/api/v1/search/articles/bibjson.abstract%3A%22n95%22"
```
See https://doaj.org/api/v1/docs#!/Search/get_api_v1_search_articles_search_query
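In case it's handy for scripting, here is a rough Python equivalent of that curl call, using only the standard library. The helper names are hypothetical, and I'm assuming the response body is JSON (as the `Accept` header requests); the shape of the response isn't assumed beyond that.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Base path taken from the curl example above.
DOAJ_SEARCH_BASE = "https://doaj.org/api/v1/search/articles/"


def doaj_article_search_url(field, phrase):
    """Build a search URL for an exact phrase in one field.

    The query (e.g. bibjson.abstract:"n95") is percent-encoded so that
    the colon and double quotes are safe inside the URL path.
    """
    query = f'{field}:"{phrase}"'
    return DOAJ_SEARCH_BASE + quote(query, safe="")


def fetch_doaj_articles(field, phrase):
    """Fetch and decode the search response (needs network access)."""
    request = Request(
        doaj_article_search_url(field, phrase),
        headers={"Accept": "application/json"},
    )
    with urlopen(request) as response:
        return json.load(response)
```

`doaj_article_search_url("bibjson.abstract", "n95")` gives the same URL as the curl example, so swapping in another term or field should just be a matter of changing the arguments.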
The DOAJ dump is abstracts only, I think. The CORE dump is much larger and includes full text. I'm downloading it, but it'll take days (it's 300GB!).
@petermr The results from the EThOS sample are at #36
I'm also thinking of that request that came in looking for evidence of mask effectiveness for typical medical procedures. E.g. should we try to build up the co-occurrence matrix for...
I think the link should be to https://pages.semanticscholar.org/coronavirus-research ?