metadata-qa-api

Map Avram to MQA Schema

Open nichtich opened this issue 1 year ago • 4 comments

I think we can map an Avram schema of format family flat (e.g. CSV files) to a corresponding MQA Schema. Avram schemas for MARC and PICA would first require addition of these formats and their locator languages (path) to metadata-qa-api.

nichtich, Dec 12 '24 15:12

@nichtich Sorry, I do not really understand the idea. Could you add an example for the CSV file?

pkiraly, Dec 12 '24 16:12

Sample flat record from Avram specification:

{
  "fields": [
    { "tag": "given", "value": "Henriette" },
    { "tag": "given", "value": "Davidson" },
    { "tag": "surname", "value": "Avram" },
    { "tag": "birth", "value": "1919-10-07" }
  ]
}

Same in CSV (with non-standard internal separator '|'):

given,surname,birth
Henriette|Davidson,Avram,1919-10-07
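
To make the correspondence between the two samples concrete, here is a minimal Python sketch that reads such a CSV and produces the flat record shown above. The function name `csv_to_flat_records` and the default `|` separator are assumptions for illustration only; neither Avram nor metadata-qa-api prescribes them.

```python
import csv
import io


def csv_to_flat_records(text, separator="|"):
    """Convert CSV text into Avram-style flat records (hypothetical helper).

    Each cell is split on the internal separator, so a repeated field
    becomes several {"tag", "value"} entries with the same tag.
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        fields = []
        for tag, cell in row.items():
            # An empty or missing cell is treated as "field not present".
            if not cell:
                continue
            for value in cell.split(separator):
                fields.append({"tag": tag, "value": value})
        records.append({"fields": fields})
    return records


sample = "given,surname,birth\nHenriette|Davidson,Avram,1919-10-07\n"
records = csv_to_flat_records(sample)
```

Here `records[0]` holds two `given` entries followed by one `surname` and one `birth` entry, in column order.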

Sample Avram schema:

{
  "family": "flat",
  "fields": {
    "given": {
      "label": "given name",
      "required": false,
      "repeatable": true
    },
    "surname": {
      "required": true,
      "repeatable": true
    },
    "birth": {
      "description": "date of birth in YYYY-MM-DD format",
      "required": false,
      "repeatable": false,
      "pattern": "^[0-9-]+$"
    }
  }
}

This could be expressed as an MQA Schema as well.

However, flat CSV files are rarely validated at all in practice. Either you get dirty CSV, or you get more formal data, but then it's not CSV but JSON, XML, or some other format.
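
For illustration, the three constraints used in the sample schema (`required`, `repeatable`, `pattern`) could be checked against a flat record roughly as follows. This is only a sketch: `validate_flat_record` is a hypothetical helper, not an existing Avram validator, and it ignores everything else a schema may contain (for example fields that are not defined in the schema).

```python
import re
from collections import defaultdict

# The sample Avram schema from this comment.
SCHEMA = {
    "family": "flat",
    "fields": {
        "given": {"label": "given name", "required": False, "repeatable": True},
        "surname": {"required": True, "repeatable": True},
        "birth": {
            "description": "date of birth in YYYY-MM-DD format",
            "required": False,
            "repeatable": False,
            "pattern": "^[0-9-]+$",
        },
    },
}


def validate_flat_record(record, schema):
    """Return a list of error messages (hypothetical helper)."""
    # Group the record's values by tag.
    values = defaultdict(list)
    for field in record["fields"]:
        values[field["tag"]].append(field["value"])

    errors = []
    for tag, definition in schema["fields"].items():
        found = values.get(tag, [])
        if definition.get("required", False) and not found:
            errors.append(f"{tag}: required field is missing")
        if not definition.get("repeatable", False) and len(found) > 1:
            errors.append(f"{tag}: field is not repeatable")
        pattern = definition.get("pattern")
        if pattern is not None:
            for value in found:
                if re.search(pattern, value) is None:
                    errors.append(f"{tag}: value {value!r} does not match {pattern}")
    return errors


record = {
    "fields": [
        {"tag": "given", "value": "Henriette"},
        {"tag": "given", "value": "Davidson"},
        {"tag": "surname", "value": "Avram"},
        {"tag": "birth", "value": "1919-10-07"},
    ]
}
errors = validate_flat_record(record, SCHEMA)
```

The sample record passes without errors. A record with no `surname` and two `birth` values would be reported for both problems.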

nichtich, Dec 13 '24 11:12

Thanks for the example, now it is clear. The sample Avram schema could be transformed to MQA schema as:

format: CSV
fields:
  - path: given
    name: given name
    rules:
    - minCount: 0
  - path: surname
    name: surname
    rules:
    - minCount: 1
  - path: birth
    name: birth
    description: date of birth in YYYY-MM-DD format
    rules:
    - minCount: 0
    - pattern: ^\\d{4}-\\d{2}-\\d{2}$

In that simple case the translation is straightforward. Do you have a repository containing full examples of such flat schemas that I could use as base inputs for an Avram2MQA "translation" class?
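
A rough Python sketch of what such a translation might look like for the simple case above. It is not the Avram2MQA class itself: `avram_to_mqa` is a hypothetical name, the result is a plain dict mirroring the YAML layout shown above (serialization is left out), and mapping a non-repeatable field to `maxCount: 1` is an assumption that goes beyond the hand-written sample. The Avram `pattern` is copied unchanged, so `birth` keeps `^[0-9-]+$` rather than the stricter date pattern used in the sample.

```python
def avram_to_mqa(schema, fmt="CSV"):
    """Translate a flat Avram schema into an MQA-schema-like dict (sketch)."""
    if schema.get("family") != "flat":
        raise ValueError("only Avram schemas of family 'flat' are handled here")

    fields = []
    for tag, definition in schema["fields"].items():
        # required -> minCount 1, otherwise minCount 0 (as in the sample above)
        rules = [{"minCount": 1 if definition.get("required", False) else 0}]
        # Assumption: a non-repeatable field maps to maxCount 1.
        if not definition.get("repeatable", False):
            rules.append({"maxCount": 1})
        if "pattern" in definition:
            rules.append({"pattern": definition["pattern"]})

        # Use the Avram label as name, falling back to the tag.
        field = {"path": tag, "name": definition.get("label", tag)}
        if "description" in definition:
            field["description"] = definition["description"]
        field["rules"] = rules
        fields.append(field)

    return {"format": fmt, "fields": fields}


avram_schema = {
    "family": "flat",
    "fields": {
        "given": {"label": "given name", "required": False, "repeatable": True},
        "surname": {"required": True, "repeatable": True},
        "birth": {
            "description": "date of birth in YYYY-MM-DD format",
            "required": False,
            "repeatable": False,
            "pattern": "^[0-9-]+$",
        },
    },
}
mqa_schema = avram_to_mqa(avram_schema)
```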

pkiraly, Dec 19 '24 16:12

I'm not sure whether this repository is the right code base and it's not urgent, so let's keep Avram2MQA (and possibly other transformations such as from/to Data Package Table Schema) for 2025.

nichtich, Dec 20 '24 09:12