ExtractRefs.who_iris (and others, probably) produce references with titles that are actually long sentences/paragraphs
While working on, it became apparent that the ExtractRefs.who_iris task is producing many references where the reference title is actually just free text from somewhere in the document. A few examples:
[2019-09-13 13:08:31,860] {} INFO - FuzzyMatchRefsOperator: references=203000
[2019-09-13 13:08:31,945] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=998 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1m1 John Grundy 2 AbstrAct modern health-care system and by the 1990s had developed an extensive network of health-care facilities, These developments, in tandem with wider social and economic progress encapsulated in the Gross National Happiness concept, have resulted in major gains in child survival and life expectancy in the past 50 years, In order to sustain these gains, the country has identified a constitutional and health- policy mandate for universal access to health, health data, this\x1b[0m'
[2019-09-13 13:08:32,225] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=1096 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1mMeasuring universal health coverage: a three-dimensional composite approach from Bhutan, WHO South-East Asia J Public Health 2014; 3(3-4): 226–237, Royal Government of Bhutan and the World Health Organization Regional Office for South-East Asia, The views presented here are those of the authors and do not, in any way, reflect the official positions of the organizations concept, analysed the data, developed and revised the final manuscript; KZ reviewed and revised the paper; JG undertook an initial\x1b[0m'
[2019-09-13 13:08:32,238] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=586 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1m2010–2011, the paper looks at the inequity of this burden and its changes over time; across ecological zones or belts, development regions, places of residence, or consumption expenditure quintiles; and according to the gender of the head of the household, in nominal terms between 1995–1996 and 2010–2011, The share of OOPS in household consumption expenditure also increased during the same period, primarily as a result of higher health spending by poorer households, Thirteen per cent\x1b[0m'
[2019-09-13 13:08:32,605] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=1140 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1mnutritional status, mental health, control of resources/autonomy, workload/time constraints and social support as important caregiver resources for childcare, The aim of this paper is to examine the role of mothers’ caregiving resources in child-care practices in slums, appraise the caregiving practices and health status of children under 5 years, Data were collected from 506 households, selected through multistage stratified random sampling, and data relating to 451 children aged 6–59\x1b[0m'
[2019-09-13 13:08:32,742] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=634 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1mtraining and supervision, and information technology needed improvement (>60% but ≤70%), of an optimized and standardized national laboratory network for the detection and reporting of infectious disease that would be compliant with IHR (2005), The participatory strategy employed to adapt an international tool to the Thai context can also serve as a model for use by other countries in the Region, The participatory approach probably ensured better quality and ownership of the\x1b[0m'
[2019-09-13 13:08:32,803] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=522 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1mAN, AP participated in the study design and coordination; AT, LFP participated in the study design and coordination and participated in the drafting of the manuscript; XL participated in the conception of the study, and in its design and coordination and drafting of the manuscript; MAR participated in the drafting of the manuscript, The views expressed in this article are the authors’ own and not an official position of the MOPH-Thailand, US Centers for Disease Control\x1b[0m'
[2019-09-13 13:08:33,121] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=1464 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1m3 Phone Myint 3 AbstrAct and Immunization’s Health System Strengthening programme, the Government of Myanmar established a scheme to improve coverage of maternal and child health (MCH) services, Employing qualitative approaches, this article reviews the processes through which this scheme was devised, focusing on evidence generation and the use of such evidence to inform policy formulation, To address the problem of high mortality rates among mothers and infants,\x1b[0m'
[2019-09-13 13:08:33,254] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=1129 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1mPrimary field-level data were obtained from 112 public health-care facilities using multistage random sampling, National Sample Survey Organization data and health system data were also analysed, The per capita health expenditure during the pre-reform period was estimated to be `5, 7 and is now close to `50, Availability of essential medicines was encouraging and utilization of public facilities had increased, With additional per capita annual investment of `43, the scheme has\x1b[0m'
[2019-09-13 13:08:33,393] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=1162 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1mWHO South-East Asia Journal of Public Health | July–December 2014 | 3 (3–4) World Health Report 2010: Health systems financing - the path to universal coverage World Health Organization Available from: who, int/whr/2010 Monitoring Progress towards Universal Health Coverage at Country and Global Levels: Framework, Measures and Targets Joint WHO/World Bank Group paper, May 2014 World Health Organization and International Bank for Reconstruction and Development/ The World Bank 2014 Available from: who,\x1b[0m'
[2019-09-13 13:08:33,409] {} INFO - ElasticsearchFuzzyMatcher.match: orig-length=562 doc-id=b939df6285870515ec09961a677b20c3 truncated-title='\x1b[1mcity or area or its authorities, or concerning the delimitation of irs’ products does not imply that they are endorsed or recommended by the World Health Organization in preference to others of a similar nature that are not mentioned, All reasonable precautions have been taken by the World Health Organization to\x1b[0m'
[2019-09-13 13:08:33,447] {} INFO - FuzzyMatchRefsOperator: matches=13800
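To get a rough sense of how widespread this is, the log output above can be summarised per document. This is only a sketch: the regex is inferred from the log lines pasted here rather than from the logging code, the function names are made up for illustration, and the 300-character cutoff for a "suspiciously long" title is an arbitrary assumption.

```python
import re
from collections import Counter

# Pattern inferred from the pasted log lines (an assumption, not taken
# from the actual ElasticsearchFuzzyMatcher logging code).
LOG_PATTERN = re.compile(
    r"ElasticsearchFuzzyMatcher\.match: "
    r"orig-length=(?P<length>\d+) "
    r"doc-id=(?P<doc_id>[0-9a-f]+) "
    r"truncated-title='(?P<title>.*)'"
)


def parse_match_lines(lines):
    """Yield (orig_length, doc_id, truncated_title) for each matching log line."""
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            yield int(match["length"]), match["doc_id"], match["title"]


def long_titles_per_doc(lines, max_chars=300):
    """Count, per doc-id, the titles whose original length exceeds max_chars.

    max_chars is a hypothetical threshold chosen for illustration only.
    """
    counts = Counter()
    for length, doc_id, _title in parse_match_lines(lines):
        if length > max_chars:
            counts[doc_id] += 1
    return counts
```

Running `long_titles_per_doc(open("task.log"))` on a saved task log (the file name is hypothetical) would show which documents contribute the most over-long titles.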
Relevant S3 URLs for debugging include:
- Actual extracted references: s3://datalabs-staging/reach-airflow/output/policy/extracted-refs/who_iris/extracted-refs-who_iris.json.gz
- Parsed PDF text is under: s3://datalabs-staging/reach-airflow/output/policy/parsed-pdfs/who_iris/parsed-pdfs-who_iris.json.gz
- Raw PDFs are under: s3://datalabs-staging/reach-airflow/output/policy/spider/who_iris/spider-who_iris
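For anyone debugging from a local copy of the extracted refs file, something like the following could surface the suspect records. It is a sketch under several assumptions that should be checked against the real output: that the `.json.gz` file is newline-delimited JSON, that each record stores its title under a key called `Title`, and that 300 characters is a reasonable cutoff. Both function names are hypothetical.

```python
import gzip
import json


def iter_json_lines(path):
    """Yield one parsed record per non-empty line of a gzipped JSON-lines file.

    Assumes newline-delimited JSON; the real file layout may differ.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def flag_long_titles(records, title_key="Title", max_chars=300):
    """Yield records whose title is longer than max_chars.

    title_key and max_chars are illustrative assumptions, not values
    taken from the reach codebase.
    """
    for record in records:
        title = record.get(title_key) or ""
        if len(title) > max_chars:
            yield record
```

For example, `list(flag_long_titles(iter_json_lines("extracted-refs-who_iris.json.gz")))` would return the references whose titles look like free text rather than real titles, assuming the file has been downloaded to the working directory.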
Assigning to @nsorros for triage and creating a ticket in Trello that references this.
Am I right in thinking that we've done some work to address this @ivyleavedtoadflax and therefore this can be closed?
Good question. Yes, this will be superseded by the deep_reference_parser once it gets merged back into master, which is blocked by infrastructure issues at present.
So the issue still exists, but it's probably not worth fixing...