academic-observatory-workflows
Inf 595/update oa schemas
Schema descriptions updated for the aggregate and DOI JSON files.
New files were created because new fields have been added to both schemas.
The aggregate schema is based on the current author table, as it had the most fields (please let me know if this should be changed!).
Formatting improvements will come in future updates. If the descriptions mostly make sense, it would be good to get them into the BigQuery tables sooner rather than later; we can continue to improve them over time.
The schemas have been uploaded to coki-scratch-space.Kathryn.test_agg_schema and coki-scratch-space.Kathryn.test_doi_schema as a test and for ease of viewing.
And my sincere apologies for the mess I made by creating the branch off a VERY old version of develop!
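For reviewers who want to check description coverage before the schemas go into BigQuery, here is a minimal sketch that walks a BigQuery-style JSON schema and lists the fields that still lack a description. The helper name `fields_missing_description` and the schema fragment are hypothetical and are not taken from this PR; the only assumption about the format is the usual `name` / `type` / `mode` / `description` / `fields` layout of BigQuery schema files.

```python
import json
from typing import Iterator, List


def fields_missing_description(fields: List[dict], prefix: str = "") -> Iterator[str]:
    """Yield dotted paths of fields whose description is absent or blank."""
    for field in fields:
        path = f"{prefix}{field['name']}"
        if not (field.get("description") or "").strip():
            yield path
        # RECORD fields carry their children under the "fields" key.
        yield from fields_missing_description(field.get("fields", []), prefix=f"{path}.")


# Hypothetical schema fragment for illustration only (not the real DOI schema).
example_schema = json.loads("""
[
  {"name": "doi", "type": "STRING", "mode": "REQUIRED",
   "description": "Digital Object Identifier."},
  {"name": "crossref", "type": "RECORD", "mode": "NULLABLE", "description": "",
   "fields": [
     {"name": "title", "type": "STRING", "mode": "NULLABLE"},
     {"name": "published_year", "type": "INTEGER", "mode": "NULLABLE",
      "description": "Year of publication."}
   ]}
]
""")

missing = list(fields_missing_description(example_schema))
print(missing)
```

Running the same helper over the real aggregate and DOI schema files would give a quick list of anything still undocumented.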
Codecov Report
All modified lines are covered by tests :white_check_mark:
Comparison is base (ffa9d4d) 95.18% compared to head (0059e67) 95.22%.
Additional details and impacted files
@@            Coverage Diff             @@
##           develop     #164     +/-   ##
===========================================
+ Coverage    95.18%   95.22%   +0.04%
===========================================
  Files           20       20
  Lines         5209     5238     +29
  Branches       720      727      +7
===========================================
+ Hits          4958     4988     +30
  Misses         161      161
+ Partials        90       89      -1
Files | Coverage Δ | |
---|---|---|
...ic_observatory_workflows/workflows/doi_workflow.py | 94.35% <100.00%> (+0.60%) | :arrow_up: |
@jdddog do the new schemas necessarily need to have updated dates?
Yeah, the schemas don't need dates; dated schemas are only used when backfilling older versions of a dataset.
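To illustrate the point about dated schemas, here is a rough sketch of one plausible selection rule. It assumes that a dated schema applies from its date onward and that the undated file is the fallback; this is not necessarily how the platform resolves schemas. The function `pick_schema` and the file names are hypothetical.

```python
from datetime import date
from typing import Dict, Optional


def pick_schema(schemas: Dict[Optional[date], str], release_date: date) -> str:
    """Return the schema file assumed to be in effect for release_date.

    Keys are the dates from which a schema applies; the None key marks the
    undated schema, used when no dated schema covers the release.
    """
    applicable = [d for d in schemas if d is not None and d <= release_date]
    if applicable:
        # Assumption: the most recent dated schema not after the release wins.
        return schemas[max(applicable)]
    return schemas[None]


# Hypothetical file names, for illustration only.
schemas = {
    None: "doi.json",
    date(2020, 1, 1): "doi_2020-01-01.json",
    date(2022, 6, 1): "doi_2022-06-01.json",
}

print(pick_schema(schemas, date(2021, 3, 1)))
print(pick_schema(schemas, date(2019, 1, 1)))
```

With only an undated file present, as in this PR, every release resolves to that single schema, which is why no dates are needed here.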
As requested, I have added a function to create the DOI schema based off definitions instead of pulling it from the doi_
For tables such as Unpaywall, PubMed and OpenAlex, all of the fields from their respective source tables are brought into the DOI table. However, the Crossref Events (events), open_citations, coki and affiliation parts of the DOI table have been separated out into their own schemas and placed in the "intermediate" folder, as they all contain calculated fields produced in either the "intermediate_