data-models
Ability to classify tables/fields within a model
Sometimes I wish I could distinguish between the PEDSnet vocabulary tables and the core tables. Maybe I want a tags column for tables (and fields?), or maybe I want the vocabulary to be a separate data model that can be composed into the PEDSnet model .... Thoughts?
In what context is it difficult to distinguish? Other than knowing which tables are the vocabulary tables (which not everyone does), where would this be useful?
Yes, knowing which tables are the vocab tables. The use case at hand is wanting to automatically denormalize references from the main tables to concept.concept_name via concept.concept_id. This is a hack but involves finding all columns named *_concept_id that are not in vocabulary tables and creating new *_concept_name columns. There probably aren't enough use cases to justify this, but I thought I'd mention it.
> This is a hack but involves finding all columns named *_concept_id that are not in vocabulary tables and creating new *_concept_name columns.
In practice, reliable conventions appear to be just as good as constraints :wink: Sarcasm aside, the references file could be used to determine which foreign keys are associated with the vocab tables.
import csv

# Vocabulary table names (the full list is elided here).
vocab_tables = {'concept', ...}
matches = []

with open('references.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row['field'].endswith('_concept_id') and row['ref_table'] not in vocab_tables:
            matches.append(row)

for m in matches:
    # create the corresponding `_concept_name` column
    pass
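The renaming step that the loop leaves as a comment could be sketched roughly as follows. This is only an illustration: the helper name `concept_name_field` is hypothetical and is not part of dmsa or the data-models service.

```python
SUFFIX = '_concept_id'


def concept_name_field(field):
    """Return the denormalized column name for a *_concept_id field."""
    if not field.endswith(SUFFIX):
        raise ValueError('not a concept reference: %r' % field)
    # Swap the trailing '_concept_id' for '_concept_name'.
    return field[:-len(SUFFIX)] + '_concept_name'


print(concept_name_field('gender_concept_id'))  # gender_concept_name
```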
Thanks; I already wrote the code using the dmsa module; I was just slightly resenting the required magic list of vocab tables ....
> magic list of vocab tables ....
That puts things into perspective. I guess this information is recorded nowhere. Tags or labels could be an interesting piece of the spec. As long as we state they are optional and the semantics are model specific, then it could work.
I support the addition of tags or labels. I agree no attempt should be made to specify the semantics. In other words, it should be entirely up to the data model governance body whether or for what purpose to implement them. In this case, we are the data model governance body.
The addition of the "tags" attribute can be included with the json-table-schema refactor work. I suggest a tag1=val1; tag2=val2 format.
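To make the suggested format concrete, here is a minimal sketch of how a client might parse a `tag1=val1; tag2=val2` string and use it in place of a hard-coded list of vocab tables. Everything beyond the format itself is an assumption: the `parse_tags` helper, the `group` tag, and its values are invented for illustration, and how a bare tag without `=` should be treated is not specified anywhere in this thread.

```python
def parse_tags(raw):
    """Parse 'tag1=val1; tag2=val2' into a dict; empty input gives {}."""
    tags = {}
    for part in raw.split(';'):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition('=')
        # Assumption: a bare tag with no '=' is kept with an empty value.
        tags[key.strip()] = value.strip()
    return tags


# Hypothetical table metadata; the 'tags' values are made up for illustration.
tables = [
    {'name': 'concept', 'tags': 'group=vocabulary'},
    {'name': 'person', 'tags': 'group=core'},
    {'name': 'visit_occurrence', 'tags': ''},
]

# Derive the vocab tables from tags instead of a magic list.
vocab_tables = {t['name'] for t in tables
                if parse_tags(t['tags']).get('group') == 'vocabulary'}
print(vocab_tables)  # {'concept'}
```

Since the semantics are left to the data model, the code only treats tags as opaque key/value pairs; what `group=vocabulary` means would be up to whoever governs the model.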