bety icon indicating copy to clipboard operation
bety copied to clipboard

priors: make (variable_id, phylogeny, notes) and (citation_id, variable_id, phylogeny) keys

Open gsrohde opened this issue 10 years ago • 1 comments

  • [ ] Eliminate NULLS
  • [ ] Add not NULL constraints
  • [ ] Consolidate or distinguish duplicates
  • [ ] Add uniqueness constraints
  • [ ] Ensure phylogeny column is whitespace-normalized

Eliminate existing NULLS

SELECT * FROM priors WHERE citation_id IS NULL;

yields 77 rows.

SELECT * FROM priors WHERE phylogeny IS NULL;

yields 14 rows.

SELECT * FROM priors WHERE notes IS NULL;

yields 13 rows.

variable_id is never NULL.

Add not NULL constraints

ALTER TABLE priors ALTER COLUMN citation_id SET NOT NULL;
ALTER TABLE priors ALTER COLUMN variable_id SET NOT NULL;
ALTER TABLE priors ALTER COLUMN phylogeny SET NOT NULL;
ALTER TABLE priors ALTER COLUMN notes SET NOT NULL;

Add uniqueness constraints

SELECT ARRAY_AGG(id), variable_id, phylogeny, notes FROM priors GROUP BY variable_id, phylogeny, notes HAVING COUNT(*) > 1;

yields 6 groups of duplicates of size 2 or 3.

SELECT ARRAY_AGG(id), citation_id, variable_id, phylogeny FROM priors GROUP BY variable_id, citation_id, phylogeny HAVING COUNT(*) > 1;

yields 10 groups of duplicates of size 2 or 3.

Uniqueness constraints

ALTER TABLE priors ADD CONSTRAINT unique_note_per_phylogeny_and_variable UNIQUE (variable_id, phylogeny, notes);
ALTER TABLE priors ADD CONSTRAINT unique_phylogeny_and_variable_per_citation UNIQUE (citation_id, variable_id, phylogeny);

COMMENT ON CONSTRAINT "unique_note_per_phylogeny_and_variable" ON "public"."priors" IS 'ensures that if there are two priors for the same group of plants, there will be an explanation of the difference in the notes; alternatively, there will be distinct citations that define the source';
COMMENT ON CONSTRAINT "unique_phylogeny_and_variable_per_citation" ON "public"."priors" IS 'phylogeny-variable is either unique b/c of citation or difference is written into the notes';

Whitespace normalization

Ensuring phylogeny is whitespace-normalized will strengthen the key containing it. We think notes, by its nature, should be allowed to be more free-form.

gsrohde avatar Jan 29 '15 22:01 gsrohde

Bump! This may be related to the "uniqueness" aspect, but I think it should also be the case that each PFT can only have one prior per variable. I think this means putting a UNIQUE (pft_id, priors.variable_id) on the join of the pfts_priors and priors table.

As it is, PEcAn.DB::query.priors will fail with a duplicate rownames error if a PFT has more than one prior for a given variable.

This may be a separate issue -- if so, let me know and I'll open a separate one.

ashiklom avatar Jan 23 '19 18:01 ashiklom