CommonCoreOntologies
Data sets, Data collections, Data aggregates - what is the proper BFO branch?
I am trying to sort out what a collection of information entities is. If someone says, "I have a dataset", I want to translate that into either the Information Bearing Entity (IBE) that carries multiple Information Content Entities (ICEs), or the ICE that has multiple ICEs as parts.
The purpose: I want to get all the information asserted in the data set. For now, I'm deliberately leaving the nature of the data set open.
Here's my grasp of an important difference between IBEs and ICEs that I haven't heard anyone assert:
- IBEs have their parts contingently, and so the ICEs they carry are also contingent. An IBE can gain and lose ICEs across time.
- ICEs have their parts always; the has_continuant_part_at_all_times holds between an ICE and each component ICE. An ICE cannot gain and lose ICE parts across time.
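The contrast in the two bullets above can be sketched in Python. This is only an illustration under assumptions: the class names `ICE` and `IBE`, the `carried_at` mapping, and the `page` example are hypothetical and are not part of BFO or CCO. An ICE is modeled as an immutable value whose parts are fixed at creation, while an IBE keeps a time-indexed record of what it carries.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class ICE:
    """Information Content Entity: its parts are fixed for as long as it exists."""
    label: str
    parts: frozenset = frozenset()  # component ICEs, immutable by construction


@dataclass
class IBE:
    """Information Bearing Entity: what it carries may differ from time to time."""
    label: str
    carried_at: dict = field(default_factory=dict)  # time -> frozenset of ICEs

    def carries(self, time, ices):
        # Record what is carried at one time without touching other times.
        self.carried_at[time] = frozenset(ices)


ice_a, ice_b = ICE("ICE-a"), ICE("ICE-b")

# One and the same bearer carries different content at different times.
page = IBE("page")
page.carries("t1", {ice_a})
page.carries("t2", {ice_a, ice_b})
gained = page.carried_at["t2"] - page.carried_at["t1"]

# Two ICEs with different parts are different ICEs, even under the same label.
whole_1 = ICE("whole", frozenset({ice_a}))
whole_2 = ICE("whole", frozenset({ice_a, ice_b}))

# An existing ICE cannot be given new parts.
try:
    whole_1.parts = frozenset({ice_a, ice_b})
    mutated = True
except FrozenInstanceError:
    mutated = False
```

On this reading, "adding a part" to an ICE never changes that ICE; it only yields a second ICE with a different set of parts.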
I want to run through some examples.
- cco:Document changes in qualities of black ink on white paper, such that the cco:Document is_carrier_of ICE1, ICE2, and ICE3 at time1, but at later time2, it also is_carrier_of ICE4.
- Across the same times and for the same reasons, Maximal-ICE1 has_part ICE1, ICE2, ICE3 at time1, and Maximal-ICE2 has_part ICE1, ICE2, ICE3, and ICE4. Moreover, Maximal-ICE1 != Maximal-ICE2.
- User uploads a data set onto a server in CSV format at time1. So, the user participated in some process whereby the IBE server changed qualities, and it is now carrier of some ICE5 at time2.
- User opens the CSV file at time3, removes a row of data, and saves the result at time4. So, the user participated in some process whereby the IBE server changed qualities, and it is now carrier of some ICE6 at time4. Then the user shares access to the data set with a coworker at time5.
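The CSV example in the last two bullets can be made concrete with a small Python sketch. It rests on assumptions that are not in the original: the sample rows are invented, `server_file` is a hypothetical stand-in for the bearer on the server, and `content_id` uses a hash of the parsed rows as a rough proxy for ICE identity, which is a modeling choice rather than anything BFO or CCO prescribes.

```python
import csv
import hashlib
import io


def rows_of(csv_text):
    """Parse CSV text into a list of row tuples."""
    return [tuple(row) for row in csv.reader(io.StringIO(csv_text))]


def content_id(csv_text):
    """Hypothetical proxy for ICE identity: a hash of the parsed rows."""
    digest = hashlib.sha256(repr(rows_of(csv_text)).encode("utf-8"))
    return digest.hexdigest()


# The file on the server plays the IBE role: one bearer tracked across times.
server_file = {}  # time -> CSV text carried at that time

server_file["time2"] = "id,name\n1,Ada\n2,Grace\n3,Alan\n"  # after the upload
server_file["time4"] = "id,name\n1,Ada\n3,Alan\n"           # after a row is removed

ice5 = content_id(server_file["time2"])
ice6 = content_id(server_file["time4"])

# The row that the content at time2 has and the content at time4 lacks.
removed = set(rows_of(server_file["time2"])) - set(rows_of(server_file["time4"]))
```

Here the bearer stays the same across the edit while `ice5` and `ice6` differ, which mirrors the claim that the server carries ICE5 at time2 and ICE6 at time4.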
What is the best way to conceive of the entity called a data set, in the last two bullets?
If the data set is an IBE, then it is a part of the server IBE, and the gain or loss of a row of data just is a change in the qualities (or IBE parts?), and the maximal ICE carried by the data set at time1 is not the same as the maximal ICE carried by the data set at time5.
If the data set is an ICE, then we cannot say that the data set is token identical across time1-time5.
Is there a best practice to capture the totality of ICEs in a data set?
Some common suggestions:
- A data set is an ICE with multiple ICEs as parts
- A data set is an IBA that is designed to carry multiple ICEs for information retrieval or computation.
- A data set is a [new class] Aggregate of Information Content Entity (subclass of [new class] aggregate of generically dependent continuant)
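Under the first two suggestions, getting the totality of ICEs in a data set amounts to following is_carrier_of at a given time and then has_part transitively. A minimal Python sketch of that traversal follows; the dictionary encoding of the two relations, the function names, and the sample identifiers are all hypothetical, and a real implementation would more likely query a triple store.

```python
def all_parts(ice, has_part):
    """Return every ICE reachable from `ice` by following has_part."""
    seen = set()
    stack = [ice]
    while stack:
        current = stack.pop()
        for part in has_part.get(current, ()):
            if part not in seen:  # guards against revisiting shared parts
                seen.add(part)
                stack.append(part)
    return seen


def ices_in_dataset(bearer, time, is_carrier_of, has_part):
    """Collect the ICEs a bearer carries at a time, plus all of their parts."""
    carried = is_carrier_of.get((bearer, time), set())
    result = set(carried)
    for ice in carried:
        result |= all_parts(ice, has_part)
    return result


# Hypothetical sample data: one bearer, one maximal ICE, rows and cells as parts.
has_part = {
    "dataset-content": {"row1", "row2"},
    "row1": {"cell1a", "cell1b"},
}
is_carrier_of = {("csv-file", "time2"): {"dataset-content"}}

found = ices_in_dataset("csv-file", "time2", is_carrier_of, has_part)
```

If the data set is itself treated as an ICE, the same `all_parts` call applied directly to it would give the answer, with no time index needed.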
Discussions of this issue, and other ICE-related issues, are occurring in other forums, so this issue will be converted to a discussion.