Add subclasses of `data format` and add `file`
Description of the issue
Issue #859 has shown that subclasses of `data format` are needed. Also, a class `file` should be added so that we can clearly distinguish between a data format and a file.
Ideas of solution
If you already have ideas for the solution describe them here
Workflow checklist
- [ ] I discussed the issue with someone else than me before working on a solution
- [ ] I already read the latest version of the workflow for this repository
- [ ] The goal of this ontology is clear to me
I am aware that
- [ ] every entry in the ontology should have a definition
- [ ] classes should arise from concepts rather than from words
When thinking about the data formats, I am asking myself whether what we have here is more like a subclass hierarchy. Also, I think we have to distinguish between a `data format` and a `file`, and then have something like `file 'has data format' some 'data format'` and `'csv file' 'has data format' some 'csv file format'`. What about introducing the following subclass structure:
- `data format`: A data format is a data descriptor that describes in which format the data is encoded. (As it is currently implemented.)
  - `file format`: A file format is a data format that describes in which format data is encoded in a file.
    - `text file format`: A text file format is a file format that is structured as a sequence of lines of electronic text.
      - `delimiter separated file format`: A delimiter separated file format is a text file format that uses delimiter-separated values (also DSV) to store two-dimensional arrays of data by separating the values in each row with specific delimiter characters.
        - `comma separated file format`: A comma separated file format is a delimiter separated file format that uses comma (,) as delimiter.
    - `binary file format`: A binary file format is a file format that is not a text format. [^1]
      - `GAMS data exchange format`: A GAMS data exchange format is a binary file format used by General Algebraic Modeling System (GAMS).
    - `microsoft excel workbook (xls)`: tbd
    - `microsoft excel workbook (xlsx)`: tbd
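As a rough illustration, the proposed hierarchy can be sketched as a child-to-parent map in plain Python. This assumes single inheritance and the nesting stated in the definitions above. The names `PARENT`, `ancestors`, and `is_subclass_of` are made up for this sketch and are not part of the OEO. The Excel entries are left out because their definitions are still "tbd".

```python
# Child -> parent map for the proposed classes (illustrative only,
# derived from the "is a ..." part of each definition).
PARENT = {
    "file format": "data format",
    "text file format": "file format",
    "delimiter separated file format": "text file format",
    "comma separated file format": "delimiter separated file format",
    "binary file format": "file format",
    "GAMS data exchange format": "binary file format",
}


def ancestors(cls: str) -> list[str]:
    """Return all superclasses of cls, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls: str, other: str) -> bool:
    """True if cls equals other or is a (transitive) subclass of it."""
    return cls == other or other in ancestors(cls)


print(ancestors("comma separated file format"))
print(is_subclass_of("GAMS data exchange format", "text file format"))
```

With this map, a comma separated file format is transitively a text file format, while the GAMS data exchange format is not.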
The `file` classes can then be implemented as equivalent classes, e.g. "A comma separated value file is a file that has a comma separated file format" with the axiom: `'comma separated value file' 'Equivalent To' (file and 'has data format' some 'comma separated file format')`. However, for that we need to define or import a general `file` class.
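The idea of a defined (equivalent) class can be mimicked in plain Python: membership is derived from a property of the file rather than asserted. This is only a sketch. The `File` dataclass, the function name, and the example file names are hypothetical, and an OWL reasoner would additionally accept subclasses of the format, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class File:
    """A generic file; has_data_format holds the label of its format class."""
    name: str
    has_data_format: str


def is_comma_separated_value_file(f: File) -> bool:
    """Mimic: file and 'has data format' some 'comma separated file format'."""
    return f.has_data_format == "comma separated file format"


# Hypothetical example files.
files = [
    File("prices.csv", "comma separated file format"),
    File("results.gdx", "GAMS data exchange format"),
]

csv_files = [f.name for f in files if is_comma_separated_value_file(f)]
print(csv_files)
```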
Additionally, I suggest `csv file` as alternative term to `comma separated file`, and `csv` as alternative term to both `comma separated file` and `comma separated file format`.
[^1]: Derived from https://en.wikipedia.org/wiki/Binary_file
Originally posted by @l-emele in https://github.com/OpenEnergyPlatform/ontology/issues/859#issuecomment-1123328815
I like your proposed structure.
I'm not sure how to extend this for other file formats like scripts (.py) or images (.png).
Or do they count as `data`?
I think this part could be found in some other domain ontology. @OpenEnergyPlatform/oeo-general-expert-formal-ontology
I think it is useful to add a further file format:
- `source code file format`: A source code file format is a text file format that is used for source code written in a programming language.
OEO dev meeting 41: Implementing
- change `data format`: A data format is a data descriptor that specifies the structure in which the data item is encoded.
- add `file format`: A file format is a data format that describes how information is structured and encoded in a file.
- add `file encoding`: A file format is a data format that describes how information is structured and encoded in a file.
Update to the earlier post by @stap-m, based on the "how to implement" session during the 41st OEO meeting.
- add `data file format`: A file format is a data format that describes how data is structured in a file.
- add `file encoding`: We identified that encoding will be necessary. We did not have time to come up with a definition.

Suggestion for `file encoding`: A file encoding is a data format that specifies the character set used to encode the data in a file.
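To illustrate why an encoding needs to be tracked separately from the file format, here is a small sketch using Python's built-in `str.encode` and `bytes.decode`. The same text yields different bytes under different character sets, and decoding with the wrong one does not give the original text back. The sample string is arbitrary.

```python
text = "Größe"  # arbitrary sample containing non-ASCII characters

utf8_bytes = text.encode("utf-8")      # ö and ß take two bytes each
latin1_bytes = text.encode("latin-1")  # every character takes one byte

print(len(utf8_bytes), len(latin1_bytes))

# Decoding with the matching character set restores the text ...
round_trip = utf8_bytes.decode("utf-8")

# ... while decoding UTF-8 bytes as Latin-1 silently produces different text.
misread = utf8_bytes.decode("latin-1")
print(round_trip == text, misread == text)
```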
And a suggestion for the structure, in order to sort in most of the individuals listed in #1149:
- `data format`
  - `data file format`
    - `text file format`
    - `comma separated file format`
    - `delimiter separated file format`
    - `binary file format`
    - `structured file format`: Individuals: xml, json
  - `data file encoding`
  - `in-memory data format`: Individual: data frame
  - `data base format`
This issue seems close to implementation. Summary:

Agreement seems to be reached about:
- `data format`: A data format is a data descriptor that specifies the structure in which the data item is encoded.
  - `data file format`: A file format is a data format that describes how data is structured in a file.
    - `text file format`: A text file format is a file format that is structured as a sequence of lines of electronic text.
    - `comma separated file format`: A comma separated file format is a delimiter separated file format that uses comma (,) as delimiter.
    - `delimiter separated file format`: A delimiter separated file format is a text file format that uses delimiter-separated values (also DSV) to store two-dimensional arrays of data by separating the values in each row with specific delimiter characters.
    - `binary file format`: A binary file format is a file format that is not a text format. [^1]
Still open:
- `structured file format`: definition missing. Individuals: xml, json
- `data file encoding` vs. `file encoding`: A file encoding is a data format that specifies the character set used to encode the data in a file.
- `in-memory data format`: definition missing. Individual: data frame
- `data base format`: definition missing
Is there a reason why CSV and DSV are not a subclass of text file?
> Is there a reason why CSV and DSV are not a subclass of text file?
DSV is a text file format ("A delimiter separated file format is a text file format that uses delimiter-separated values (also DSV) to store two-dimensional arrays of data by separating the values in each row with specific delimiter characters.") and CSV is a DSV ("A comma separated file format is a delimiter separated file format that uses comma (,) as delimiter."). You were probably misled because the indentations in @chrwm's post did not reflect this correctly.
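The relationship can also be seen in practice: Python's standard `csv` module reads any delimiter-separated text, and comma is just one choice of its `delimiter` parameter. The helper name `read_dsv` and the sample data are made up for this sketch.

```python
import csv
import io


def read_dsv(text: str, delimiter: str) -> list[list[str]]:
    """Parse delimiter-separated text into rows of string values."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# The same two-dimensional data, once comma separated, once semicolon separated.
comma_text = "year,value\n2020,1.5\n"
semicolon_text = "year;value\n2020;1.5\n"

rows_csv = read_dsv(comma_text, ",")
rows_dsv = read_dsv(semicolon_text, ";")

print(rows_csv == rows_dsv)
```

Both inputs are plain lines of text, and only the delimiter character differs.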
> `source code file format`: A source code file format is a text file format that is used for source code written in a programming language.

As no one has objected to this proposal for months, I interpret that as agreement as well.
I suggest implementing what has already been agreed upon and discussing the remaining parts afterwards.
At the last dev meeting we decided that I am going to implement the terms that are already agreed upon.
Should I make #1326 close this issue so we can push the remaining terms to the next release?
If not everything is solved with PR #1326, then please leave this issue open and move the milestone. Also a list of open points after the PR would be nice.
Here is a list of classes that are not yet implemented:
- `GAMS data exchange format`: A GAMS data exchange format is a binary file format used by General Algebraic Modeling System (GAMS).
- `microsoft excel workbook (xls)`
- `microsoft excel workbook (xlsx)`
- `data file encoding`
- `in-memory data format`
- `data base format`
- `structured file format` with subclasses `xml` and `json`
- `file encoding`: A file encoding is a data format that specifies the character set used to encode the data in a file.
- `data file encoding`
- `in-memory data format` with subclass `data frame`
- `data base format`
I just realised that in #1326, classes were added, but the respective individuals were not deleted. This is the current list of `data format` individuals:

Instead of the individual `csv` (OEO:00000116) we now have the class `comma separated file format` (OEO:00280003), and instead of the individual `txt` (OEO:00000426) we now have the class `text file format` (OEO:00280001). We should delete these two individuals. Further, we should think about whether the classes `comma separated file format` and `text file format` should get the IDs and the IRIs of the old individuals. The advantage would be that users of the OEO get the refined concept under the same IRI.
Implementing that would take two steps:
- Delete the individuals OEO:00000116 and OEO:00000426 using Protégé.
- Replace all instances of OEO:00000116 with OEO:00280003 and of OEO:00000426 with OEO:00280001 in all ontology files using a text editor.
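The second step could also be scripted instead of done by hand. The following is a minimal sketch, not a tested migration tool. The mapping is taken from the step as written above and should be inverted if the classes are meant to take over the old IDs. The function names are hypothetical, and the ontology files may spell the identifiers differently (for example with an underscore inside IRIs), in which case the keys and values need adjusting.

```python
import re
from pathlib import Path

# Mapping as written in the step above; swap keys and values if the classes
# should instead receive the IDs of the old individuals.
ID_MAP = {
    "OEO:00000116": "OEO:00280003",
    "OEO:00000426": "OEO:00280001",
}


def replace_ids(text: str, id_map: dict[str, str]) -> str:
    """Replace every ID in id_map found in text; longer digit runs are skipped."""
    alternatives = "|".join(re.escape(old) for old in id_map)
    pattern = re.compile(rf"(?:{alternatives})(?!\d)")
    return pattern.sub(lambda m: id_map[m.group(0)], text)


def replace_ids_in_file(path: Path, id_map: dict[str, str]) -> None:
    """Rewrite one file in place (assumes UTF-8 encoded text)."""
    original = path.read_text(encoding="utf-8")
    path.write_text(replace_ids(original, id_map), encoding="utf-8")


sample = "Class: OEO:00000116 and OEO:00000426, but not OEO:000001160"
print(replace_ids(sample, ID_MAP))
```

Running such a script on a copy or a separate branch first makes it easy to review the resulting diff before merging.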
This is a decision we should ideally make before the next release. Therefore I move this issue back to milestone 1.12.0. @OpenEnergyPlatform/oeo-release-team : Any opinions here?
I agree. The individuals are not used anywhere in the OEO currently. I'll delete the individuals.
I'm fine with not recycling the old IRIs and IDs. New OEO versions so often lack backward compatibility that the benefit of not breaking these two concepts appears disproportionate. I personally prefer the new IDs and hence argue against re-using the old IDs and IRIs.
From the dev meeting: According to OBO best practices, classes/individuals are declared obsolete. This should have been done instead of deleting them. We didn't know what the OEO's decision on this is, hence we postpone the issue to the next release milestone.
> From the dev meeting: According to OBO best practices, classes/individuals are declared obsolete. This should have been done instead of deleting them. We didn't know what the OEO's decision on this is, hence we postpone the issue to the next release milestone.
Thanks, I wasn't aware of it, since it wasn't commented here. At which dev meeting did we talk about that (number)?
Sorry I meant the release meeting today! We just talked about it before starting the release.
This issue has not seen any progress in the last two months and it is therefore unlikely that we will solve it before the next release. I thus move the milestone.
I transferred everything that is left from this issue to a new issue, #1518, and thus close this issue here. If I missed something, feel free to re-open this issue or open a further issue.