CommonCoreOntologies
Move ActOfDataTransformation up to the Event Ontology
cco:ActOfDataTransformation currently resides in a CCO extension ontology, but is generic enough that it more appropriately belongs in the CCO. Propose moving the term into the Event Ontology along with its parent class cco:ActOfInformationProcessing.
# http://www.ontologyrepository.com/CommonCoreOntologies/ActOfDataTransformation
:ActOfDataTransformation a owl:Class;
rdfs:subClassOf :ActOfInformationProcessing;
:definition "An Act of Information Processing in which an algorithm is executed to transform one or more input Information Content Entities into one or more output Information Content Entities."@en;
:elucidation "It is not a requirement that the output Information Content Entity(ies) be qualitatively distinct from the input(s) as a result of an Act of Data Transformation, though doing so is typically the goal of performing this Act. Consider, for example, selecting a column in an Excel spreadsheet then executing the \"Remove Duplicates\" Algorithm on it. The intent is to remove rows in that column containing duplicate content. If no duplicate values are present, the information in the column remains unchanged but an Act of Data Transformation was nonetheless performed."@en;
:is_curated_in_ontology "http://www.ontologyrepository.com/CommonCoreOntologies/Mid/EventOntology"^^xsd:anyURI;
rdfs:label "Act of Data Transformation"@en .
# http://www.ontologyrepository.com/CommonCoreOntologies/ActOfInformationProcessing
:ActOfInformationProcessing a owl:Class;
rdfs:subClassOf :IntentionalAct;
:definition "A Planned Act in which one or more input Information Content Entities are received, manipulated, transferred, or stored by an Agent."@en;
:is_curated_in_ontology "http://www.ontologyrepository.com/CommonCoreOntologies/Mid/EventOntology"^^xsd:anyURI;
rdfs:label "Act of Information Processing"@en .
I agree these probably deserve a home in CCO-mid, although I can see a case for scoping a domain ontology for information processing. Until then, let's keep them here.
The definition for ActOfDataTransformation is circular. What it means to transform something needs to be articulated. Presumably 'manipulated' in the parent class includes transformation.
First hit on Google: "Data transformation is the process of converting, cleansing, and structuring data into a usable format ...". It goes on to suggest four types:
- Constructive, where data is added, copied or replicated
- Destructive, where records and fields are deleted
- Aesthetic, where certain values are standardized, or
- Structural, which includes columns being renamed, moved, and combined
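To make the four categories concrete, here is a minimal Python sketch. The `rows` table, its field names, and the specific operations are all hypothetical, and the mapping of each operation to a category is only one reading of the quoted list, not something the source defines.

```python
# Hypothetical two-row table used only for illustration.
rows = [
    {"name": "alice", "city": "nyc"},
    {"name": "bob", "city": "NYC"},
]

# Constructive: data is added (a new field on every record).
constructive = [{**r, "country": "US"} for r in rows]

# Destructive: a field is deleted from every record.
destructive = [{k: v for k, v in r.items() if k != "city"} for r in rows]

# Aesthetic: values are standardized (city codes upper-cased).
aesthetic = [{**r, "city": r["city"].upper()} for r in rows]

# Structural: a column is renamed while its content is kept.
structural = [{"full_name": r["name"], "city": r["city"]} for r in rows]
```

Note that in the aesthetic case the second row already holds "NYC", so that record comes out identical to its input even though the operation ran on it.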
Wikipedia says "In computing, data transformation is the process of converting data from one format or structure into another format or structure."
The question is: do we cast a wide net and allow transformation to include generating new content, e.g. when a table is "transformed" into a graph with added content provided by the semantic model, or do we limit it to formatting and structural changes? If the former, then how do we reconcile transformation with statistical and ML processes?
Act of Data Transformation = An Act of Information Processing in which an algorithm is executed to act upon one or more input Information Content Entities into one or more output Information Content Entities.
Saying 'act upon' avoids the problem of enumerating the possibilities of transformation (conversion, restructuring, etc), and also (per the elucidation) allows for the possibility that the data is not changed, just acted upon. I may have a function that removes references to a certain word in a body of text, but if the text never contained that word, then the text data that was transformed never actually changed.
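A minimal Python sketch of that scenario follows. The `remove_word` helper and the sample sentence are hypothetical, chosen only to show that the same function can run to completion whether or not its output differs from its input.

```python
import re


def remove_word(text: str, word: str) -> str:
    """Remove every whole-word occurrence of `word` from `text` (hypothetical helper)."""
    # Match the word on word boundaries, plus any whitespace that follows it.
    pattern = r"\b" + re.escape(word) + r"\b\s*"
    return re.sub(pattern, "", text)


source = "the quick brown fox"

# The word is present, so the output differs from the input.
changed = remove_word(source, "quick")

# The word is absent, so the function still executes but the output equals the input.
unchanged = remove_word(source, "slow")
```

Whether the second call counts as a data transformation is exactly the point under discussion; the sketch only shows that the algorithm executes in both cases.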
Answering the comment on #133, yes, this is the sort of thing I'm looking for. However, it's not a data transformation if there's no change, so I don't buy the rationale for "act on". That might be appropriate for a more neutral superclass act of information processing. However, a problem with its current definition is that merely receiving doesn't seem to be a "processing", in the normal sense. If something is received and acted on, that's a processing.
Also, there's something wrong with the grammar: "act on ... into". Note the OBI definition "A planned process that produces output data from input data.". Perhaps: a process which takes as input an information content entity and has output a changed input or new output information content entity. As an aside I dislike the automatic prefix "Act of". It's hard to misinterpret the simpler "data transformation".
There's a bit of an issue with intentional processes in general, in that it isn't clear what the scope of intention is. Suppose I run a command line with an input being not the file I intended, perhaps because autocomplete completed the wrong thing. That doesn't seem to satisfy the definition of "Planned Act": "An Act in which at least one Agent plays a causative role and which is prescribed by some Directive Information Content Entity held by at least one of the Agents.". The first clause holds (assuming general problems with definitions which depend on cause are resolved), but the second clause would seem not to.
@cameronmore Please move out on executing the revision you've articulated based on @mark-jensen's request.
For @alanruttenberg's comments regarding treating these as 'acts', this is a matter that warrants a larger discussion. Hence I'll start a thread on the discussion board and cite both this ticket and 133 for context. Thanks!