inputs value constraints: needed cleanup and decisions
notes
- [x] Add not NULL constraint to migration
start_date, end_date
- [ ] Decide if should be not NULL
- [ ] Decide if there should be other constraints (see below)
- [ ] Fix violations
- [ ] Add constraints to migration
name
- [x] Add not NULL and is_whitespace_normalized constraints to migration
access_level
nullness
- [x] Decide if should be not NULL
- should be not null
If so:
- [x] Fix violations
- change null to 1 (conservatively assuming private)
- [x] Add constraint to migration
range
- [x] Decide whether to require access_level IN (1, 2, 3, 4)
- yes
If so:
- [x] Fix violations
- [x] Make domain and change column type to use it
raw
- [x] Decide if should be not NULL
- should be either 0 or 1; null by default (assuming raw)
If so:
- [x] Fix violations
- ~~convert t --> 1, f --0 (please confirm @robkooper) to be consistent with other boolean fields~~ I left this as a boolean for now. Why change to a type that requires much more storage when boolean yields the same amount of information?
- convert NULL -> 1 (assume raw) -- I CHANGED THESE TO TRUE INSTEAD.
- [x] Add constraint to migration
format_id
- [ ] Get rid of NULLs (2 currently)
- [ ] Add non null constraint to migration
Details
start_date, end_date
SELECT start_date, end_date FROM inputs WHERE start_date IS NULL OR end_date IS NULL;
shows 18 rows with null start_date and 19 with null end_date.
SELECT start_date, end_date FROM inputs WHERE start_date >= end_date;
shows 1 row where start_date > end_date and 2 where they are equal.
CHECK (start_date < end_date) seem like a reasonable constraint (unless we want to attach special significance to them being equal or to end_date being before start_date).
SELECT start_date, end_date FROM inputs WHERE start_date > NOW() OR end_date > NOW();
shows 3 end_date values way in the future. We would think that CHECK (end_date < NOW()) is a reasonable constraint.
access_level
SELECT COUNT(*), access_level FROM inputs GROUP BY access_level ORDER BY access_level;
shows that there are 2 rows with an access_level of -1 and 192 rows where access_level is NULL.
It's not clear what access_level does since BETYdb Rails app users with only acces_level 4 can view and edit all inputs.
- A: access_level defines who should / should not access files, independent of rails app
- @robkooper shouldn't this be handled in dbfiles (or preferably at filesystem level)?
raw
SELECT id, name FROM inputs WHERE raw IS NULL;
shows 8 rows where raw is NULL.
@dlebauer You said the other day to make raw default to false. Looking back over these comments, I see that I fixed violations of raw NOT NULL constraint by changing NULL to TRUE. Should FALSE still be the default? Also, please see pending issues with start_date, end_date, and format_id--if you want to table these issues, I'll leave related constraints out of the migration.
Depends on definition of "raw" and specific datasets. I still think default should be null, and perhaps existing files evaluated on a per file basis
Start date / end date do not apply to some inputs. Perhaps these should be allowed to be NULL