
Restructuring individuals: data format and file format

Open Ludee opened this issue 2 years ago • 7 comments

Description of the issue

As described in https://github.com/OpenEnergyPlatform/ontology/issues/859, most of the individuals in the OEO lack a definition. In addition, the upper classes are being evaluated again.

Ideas of solution

  • data format: A data format is a data descriptor that describes in which format the data is encoded. (As it is currently implemented.)
    • file format: A file format is a data format that describes in which format data is encoded in a file.

      • text file format: A text file format is a file format that is structured as a sequence of lines of electronic text.
        • delimiter separated file format: A delimiter separated file format is a text file format that uses delimiter-separated values (also DSV) to store two-dimensional arrays of data by separating the values in each row with specific delimiter characters.
          • comma separated file format: A comma separated file format is a delimiter separated file format that uses comma (,) as delimiter.
            • 🔹csv file (csv): .tbd
        • office open xml: .tbd OOXML
          • 🔹microsoft excel workbook (xls): .tbd
          • 🔹microsoft excel workbook (xlsx): tbd
      • binary file format: A binary file format is a file format that is not a text format. [^1]
        • 🔹GAMS data exchange format: A GAMS data exchange format is a binary file format used by General Algebraic Modeling System (GAMS).
    • database format: A database format is a data format that describes in which format data is encoded in a database.

      • 🔹postgresql:
      • 🔹mysql:
    • programming/software/? format: An X format is a data format that describes in which format data is encoded in a programming language.

      • 🔹dict / series / arrays / constants / pandas dataframe

🔹 Individual
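
To make the proposed hierarchy easier to check, here is a minimal Python sketch of it as a child-to-parent map. The labels are taken from the list above; the dictionaries and the helper names `ancestors` and `is_instance_of` are only assumptions for illustration and are not part of the OEO. Entries whose definition or name is still open (office open xml, the programming/software format) are left out.

```python
# Hypothetical encoding of the proposed subclass hierarchy: child -> parent.
# Labels follow the issue text; nesting follows the "is a" in each definition.
PARENT = {
    "file format": "data format",
    "database format": "data format",
    "text file format": "file format",
    "binary file format": "file format",
    "delimiter separated file format": "text file format",
    "comma separated file format": "delimiter separated file format",
}

# Individuals (marked with the diamond in the list) and their asserted class.
INDIVIDUAL_TYPE = {
    "csv file (csv)": "comma separated file format",
    "GAMS data exchange format": "binary file format",
    "postgresql": "database format",
    "mysql": "database format",
}


def ancestors(cls):
    """Return all superclasses of cls, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_instance_of(individual, cls):
    """True if the individual's asserted class is cls or a subclass of cls."""
    asserted = INDIVIDUAL_TYPE[individual]
    return cls == asserted or cls in ancestors(asserted)


if __name__ == "__main__":
    print(ancestors("comma separated file format"))
    print(is_instance_of("csv file (csv)", "text file format"))
```

Walking the chain this way shows what each definition implies: an individual typed as a comma separated file format is also a text file format, a file format and a data format.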

| Type | Individual | Updated | Definition |
|---|---|---|---|
| data format | comma-separated values (CSV) | yes | A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Formats that use delimiter-separated values (also DSV) store two-dimensional arrays of data by separating the values in each row with specific delimiter characters. |
| data format | microsoft excel workbook (XLSX) | yes | Microsoft excel workbook (XLSX) is a data format and the default file format which holds data in worksheets, charts, and macros. It is the primary extension used by Microsoft's spreadsheet application Excel. |
| data format | microsoft excel spreadsheet (XLS) | added | Microsoft excel spreadsheet (XLS) is a data format and file format which holds data in worksheets, charts, and macros. It has been the primary extension used by Microsoft's spreadsheet application Excel. |
| data format | extensible markup language (XML) | yes | Extensible markup language (XML) is a data format and markup language for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. |
| data format | text file (TXT) | yes | A text file (sometimes spelled textfile) is a kind of computer file that is structured as a sequence of lines of electronic text. |
| data format | gams data exchange (GDX) | yes | GAMS data exchange (GDX) is a data format and file format used by General Algebraic Modeling System (GAMS). |
| data format | data frame | | |
| data format | dict | | |

l-emele commented yesterday: Thinking about the data formats, I wonder whether what we have here is really a subclass hierarchy. I also think we have to distinguish between a data format and a file. We would then have something like file 'has data format' some 'data format' and 'csv file' 'has data format' some 'csv file format'. What about introducing the following subclass structure: (moved up)

The file classes can then be implemented as equivalent classes, e.g. a character separated value file is a file that has a character separated file format, with the axiom: 'comma separated value file' 'Equivalent To' (file and 'has data format' some 'comma separated file format'). However, for that we need to define or import a general file class. Additionally, I suggest csv file as an alternative term for comma separated file, and csv as an alternative term for both comma separated file and comma separated file format.

[^1]: Derived from https://en.wikipedia.org/wiki/Binary_file
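
The equivalent-class idea above can be illustrated with a small Python sketch. It is only an analogy for what a reasoner would infer from the axiom, under the assumption that a file carries a set of format class labels; the `File` dataclass, its `has_data_format` field, the function name and the example file names are all hypothetical and not taken from the OEO.

```python
from dataclasses import dataclass, field

# Label of the format class used in the proposed axiom.
COMMA_SEPARATED_FILE_FORMAT = "comma separated file format"


@dataclass
class File:
    """Hypothetical stand-in for the general file class that would need to be defined or imported."""
    name: str
    # Labels of the format classes this file 'has data format' some of.
    has_data_format: set = field(default_factory=set)


def is_comma_separated_value_file(thing):
    """Mimic: 'comma separated value file' EquivalentTo
    (file and 'has data format' some 'comma separated file format')."""
    return (
        isinstance(thing, File)
        and COMMA_SEPARATED_FILE_FORMAT in thing.has_data_format
    )


if __name__ == "__main__":
    table = File("scenario.csv", {COMMA_SEPARATED_FILE_FORMAT})
    gdx = File("results.gdx", {"GAMS data exchange format"})
    print(is_comma_separated_value_file(table))
    print(is_comma_separated_value_file(gdx))
```

Both conditions of the axiom are needed: something that is not a file is never classified, and a file without a comma separated file format is not classified either.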

Workflow checklist

  • [x] I discussed the issue with someone else than me before working on a solution
  • [x] I already read the latest version of the workflow for this repository
  • [x] The goal of this ontology is clear to me

I am aware that

  • [x] every entry in the ontology should have a definition
  • [x] classes should arise from concepts rather than from words

Ludee, May 12 '22 10:05

The definition of the classes will be discussed here: #1145

Ludee, May 12 '22 11:05

Why do we need this issue in parallel to #1145?

l-emele, May 12 '22 11:05

The discussion on the classes will be quite long, with about 10 new terms. So here we can discuss the definitions of the individuals separately. I don't want to distort the workflow, and I'm not sure if it makes sense like this, but it feels better organised to me. Perhaps let's discuss how to handle this in the next dev meeting. And thank you for the feedback!

Ludee, May 12 '22 18:05

I analysed the model factsheets and compiled a list of the named input and output fields. Most of them are suitable to be added:

  • .mat / .m
  • .shp
  • .epw (EnergyPlus Weather Data File)
  • .json
  • .yaml
  • .md / .rst
  • .dat
  • .inc
  • netcdf / nc4
  • .sqlite / .db

Ludee, May 12 '22 18:05

Okay, these are completely new ones, right?

l-emele, May 13 '22 07:05

The discussion of the new ones hasn't started yet. I'll postpone the issue until the next release.

chrwm, Sep 19 '22 15:09