Restructuring individuals: data format and file format
Description of the issue
As described in https://github.com/OpenEnergyPlatform/ontology/issues/859, most of the individuals in the OEO lack a definition. In addition, the upper classes are being evaluated again.
Ideas of solution
- data format: A data format is a data descriptor that describes in which format the data is encoded. (As it is currently implemented.)
- file format: A file format is a data format that describes in which format data is encoded in a file.
- text file format: A text file format is a file format that is structured as a sequence of lines of electronic text.
- delimiter separated file format: A delimiter separated file format is a text file format that uses delimiter-separated values (also DSV) to store two-dimensional arrays of data by separating the values in each row with specific delimiter characters.
- comma separated file format: A comma separated file format is a delimiter separated file format that uses comma (,) as delimiter.
- 🔹csv file (csv): tbd
- office open xml: tbd OOXML
- 🔹microsoft excel workbook (xls): tbd
- 🔹microsoft excel workbook (xlsx): tbd
- binary file format: A binary file format is a file format that is not a text format. [^1]
- 🔹GAMS data exchange format: A GAMS data exchange format is a binary file format used by General Algebraic Modeling System (GAMS).
- database format: A database format is a data format that describes in which format data is encoded in a database.
- 🔹postgresql:
- 🔹mysql:
- programming/software/? format: A X format is a data format that describes in which format data is encoded in a programming language.
- 🔹dict / series / arrays / constants / pandas dataframe
🔹 Individual
Type | Individual | Updated | Definition |
---|---|---|---|
data format | comma-separated values (CSV) | yes | A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Formats that use delimiter-separated values (also DSV) store two-dimensional arrays of data by separating the values in each row with specific delimiter characters. |
data format | microsoft excel workbook (XLSX) | yes | Microsoft excel workbook (XLSX) is a data format and the default file format which holds data in worksheets, charts, and macros. It is the primary extension used by Microsoft's spreadsheet application Excel. |
data format | microsoft excel spreadsheet (XLS) | added | Microsoft excel spreadsheet (XLS) is a data format and file format which holds data in worksheets, charts, and macros. It has been the primary extension used by Microsoft's spreadsheet application Excel. |
data format | extensible markup language (XML) | yes | Extensible markup language (XML) is a data format and markup language for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. |
data format | text file (TXT) | yes | A text file (sometimes spelled textfile) is a kind of computer file that is structured as a sequence of lines of electronic text. |
data format | gams data exchange (GDX) | yes | GAMS data exchange (GDX) is a data format and file format used by General Algebraic Modeling System (GAMS). |
data format | data frame | | |
data format | dict | | |
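
To make it easier to check the proposed definitions against each other, here is a minimal Python sketch that reads the parent of each class off the genus named in its definition ("A file format is a data format that ..."). It is only an illustration under stated assumptions: the `PARENT` mapping and the helper names `ancestors` and `is_subclass_of` are hypothetical, classes without a definition (for example office open xml) are left out, and nothing here reflects how the OEO is actually implemented.

```python
# Parent links taken from the genus of each proposed definition.
# Only classes whose definition names a parent are included;
# this is an illustration, not the OEO implementation.
PARENT = {
    "file format": "data format",
    "text file format": "file format",
    "delimiter separated file format": "text file format",
    "comma separated file format": "delimiter separated file format",
    "binary file format": "file format",
    "database format": "data format",
}


def ancestors(term):
    """Return the superclasses of `term`, from direct parent up to the root."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def is_subclass_of(term, candidate):
    """True if `candidate` is `term` itself or one of its superclasses."""
    return term == candidate or candidate in ancestors(term)


if __name__ == "__main__":
    for name in ("comma separated file format", "binary file format", "database format"):
        print(name, "->", " -> ".join(ancestors(name)))
```

A check like `is_subclass_of("database format", "file format")` returning `False` reflects that the database format definition names data format, not file format, as its genus.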
l-emele commented yesterday:
When thinking about the data formats, I am asking myself whether we have here more like a subclass hierarchy. Also, I think we have to distinguish between a `data format` and a `file`. And then something like `file 'has data format' some 'data format'` and `'csv file' 'has data format' some 'csv file format'`. What about introducing the following subclass structure: (moved up)
The `file` classes then can be implemented as equivalent classes, e.g. "A character separated value file is a file that has a character separated file format" with the axiom: `'comma separated value file' 'Equivalent To' some (file and 'has data format' some 'comma separated file format')`. However, for that we need to define or import a general `file` class.
Additionally I suggest `csv file` as alternative term to `comma separated file` and `csv` as alternative term to both `comma separated file` and `comma separated file format`.
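
The equivalent-class idea in this comment can be sketched in plain Python. This is a rough illustration only: the `File` dataclass, the `SUBCLASS_OF` mapping and the function names are hypothetical stand-ins, and the check is closed-world, whereas an OWL reasoner would evaluate the axiom under the open-world assumption.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for ontology classes; labels follow the comment above.
SUBCLASS_OF = {
    "comma separated file format": "delimiter separated file format",
    "delimiter separated file format": "text file format",
    "text file format": "file format",
    "binary file format": "file format",
    "file format": "data format",
}


def is_a(cls, target):
    """Walk up the subclass chain and report whether `target` is reached."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


@dataclass
class File:
    """A general file with zero or more 'has data format' values."""
    name: str
    has_data_format: list = field(default_factory=list)


def is_comma_separated_value_file(f):
    # Mirrors the class expression:
    #   file and ('has data format' some 'comma separated file format')
    return any(is_a(fmt, "comma separated file format") for fmt in f.has_data_format)


if __name__ == "__main__":
    example = File("results.csv", ["comma separated file format"])
    print(is_comma_separated_value_file(example))
```

The point of the design is that membership in the file class is derived from the format the file has, so only the format hierarchy needs to be maintained by hand.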
[^1]: Derived from https://en.wikipedia.org/wiki/Binary_file
Workflow checklist
- [x] I discussed the issue with someone else than me before working on a solution
- [x] I already read the latest version of the workflow for this repository
- [x] The goal of this ontology is clear to me
I am aware that
- [x] every entry in the ontology should have a definition
- [x] classes should arise from concepts rather than from words
The definition of the classes will be discussed here: #1145
Why do we need this issue in parallel to #1145?
The discussion on the classes will be quite long, with about 10 new terms. So here we can discuss the definitions of the individuals separately. I don't want to disrupt the workflow, and I'm not sure whether it makes sense this way, but it feels better organised to me. Perhaps let's discuss how to handle this in the next dev meeting. And thank you for the feedback!
I analysed the model factsheets and compiled a list of named input and output fields. Most should be suitable to add:
- .mat / .m
- .shp
- .epw (EnergyPlus Weather Data File)
- .json
- .yaml
- .md / .rst
- .dat
- .inc
- netcdf / nc4
- .sqlite / .db
Okay, these are completely new ones, right?
The discussion of the new ones hasn't started yet. I'll postpone the issue until the next release.