enforce naming conventions for tables and columns
see also https://github.com/OpenEnergyPlatform/oem2orm/issues/16
We once started to collect naming conventions here: https://github.com/OpenEnergyPlatform/data-preprocessing/wiki#naming-conventions. Perhaps it would be worth keeping a list of the checks that have been implemented in the OEP API?
both tables and column names now must adhere to the regexp ^[a-z][a-z0-9_]{0,MAX_IDENTIFIER_LENGTH-1}$, MAX_IDENTIFIER_LENGTH currently being 50.
The length limit is a little shorter than the postgres restrictions (63 for tables, 59 for columns), because of the versioned tables that add suffices (like _edit) to the table names
both tables and column names now must adhere to the regexp ^[a-z][a-z0-9_]{0,MAX_IDENTIFIER_LENGTH-1}$, MAX_IDENTIFIER_LENGTH currently being 50.
@wingechr is there a reason to put a limit of 50 chars on the resources-name and column headers? I already ran into problems regarding this and had to rename tables. Thus, I'd vote to increase the the limit or to omit it.
This is a technical limitation due to the Postgresql database - if there is no better solution than this, unfortunately we cannot change this.
If there is no better solution, I'd suggest we add a section on the OEP nearby the upload Wizzard that highlights all the postgresql naming conventions. Knowing them in advance and taking them into account is more resource efficient than correcting them later.
Then let's make a list in this or a new issue and collect all things there. The list should represent the conventions in a readable form. Then we can add them to the mockup Bryan created.
Oki I'll start here, as the issue title seems appropriate
Very nice, thanks :)
Do's and Don't apply for
- tables names
- table headers
Do's
- use ASCII characters only
- use lower case only
- use underscores
Don'ts
- table name must not exceed maximal character limit = 50
- no points
- no commas
- no spaces
- no special characters
- avoid dates
- if dates are used, then without
-! E.g.: 2022-10-21 will through an error. Use underscore instead! - don't start parameter name with a number
Within a table column
Are emtpy rows allowed??
EDIT: Please, add, edit and correct points if necessary here
for table names, this is already implemented actions.py assert_valid_table_name(table):
...
if not re.match(r"[a-z][a-z0-9_]*", table):
raise APIError
but not for column names.
columns names must probably checked in 2 places: create and alter (although I don't know why we even support renaming of columns)
columns names must probably checked in 2 places: create and alter (although I don't know why we even support renaming of columns)
I think this is still from the old revise and edit feature. Maybe we should remove the column renaming, but I like the flexibility. In the meantime I also think that all extra features should be removed/disabled for now, until the oep core is running stable.
Do's and Don't apply for
1. tables names 2. table headersDo's
* use ASCII characters only * use lower case only * use underscoresDon'ts
* table name must not exceed maximal character limit = 50 * no points * no commas * no spaces * no special characters * avoid dates * if dates are used, then without `-`! E.g.: 2022-10-21 will through an error. Use underscore instead! * don't start parameter name with a number
I think this very quick overview is something we may want to use in the capacity building section. Even if these do's and don'ts may already be part of some of the tutorials, I think this simple list is really helpful. I may try to add a short tutorial on this which we can then tie into a capacity building module at one point. Can I ask for your review once the tutorial is done? @wingechr
Yes, that would be great. Then we also finally have something that is synchronized with what is actually happening in the OEP. We can include the link to the CBM in the upload process and we have to show a readable error message for these errors in the OEP.