SIARD-DK: When exporting table meta data, we must change 'typeOriginal' accordingly, if type is changed (CLOBS or BLOBS)
@andreaskring When exporting columns with CLOBS or BLOBS, we change the column type to integer (as it is supposed to identify the file with the binary data). When we do this transformation, we also need to change to 'orginalType' value, as the import mechanism sometimes relies on the original type for determining the type of the columns.
https://github.com/magenta-aps/db-preservation-toolkit/blob/dev/dbptk-core/src/main/java/com/databasepreservation/modules/siard/out/metadata/TableIndexFileStrategy.java#L124
Example:
Orginal Mysql Table:
CREATE TABLE `datatypes` (
`col1` blob COMMENT 'Test comment',
`col_key` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Comment Primary Key Column',
PRIMARY KEY (`col_key`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;
Table metadata in generated SIARD-DK archive:
<table>
<name>datatypes</name>
<folder>table1</folder>
<description>Description should be entered manually</description>
<columns>
<column>
<name>col1</name>
<columnID>c1</columnID>
<type>INTEGER</type>
<typeOriginal>BLOB</typeOriginal>
<nullable>true</nullable>
<description>Test comment</description>
<functionalDescription>Dokumentidentifikation</functionalDescription>
</column>
<column>
<name>col_key</name>
<columnID>c2</columnID>
<type>INTEGER</type>
<typeOriginal>INT</typeOriginal>
<nullable>false</nullable>
<description>Comment Primary Key Column</description>
</column>
</columns>
<primaryKey>
<name>PRIMARY</name>
<column>col_key</column>
</primaryKey>
<rows>1</rows>
</table>
We this archive is imported in MySQL, this will result in the 'col1' column getting datatype 'blob', which is not what we want, as the data will now be integers (ids of files with the exported blobs in the archive).
I will fix this
2016-02-15 14:33 GMT+01:00 Thomas Kristensen [email protected]:
@andreaskring https://github.com/andreaskring When exporting columns with CLOBS or BLOBS, we change the column type to integer (as it is supposed to identify the file with the binary data). When we do this transformation, we also need to change to 'orginalType' value, as the import mechanism sometimes relies on the original type for determining the type of the columns.
https://github.com/magenta-aps/db-preservation-toolkit/blob/dev/dbptk-core/src/main/java/com/databasepreservation/modules/siard/out/metadata/TableIndexFileStrategy.java#L124
Example:
Orginal Mysql Table:
CREATE TABLE
datatypes(col1blob COMMENT 'Test comment',col_keyint(11) NOT NULL AUTO_INCREMENT COMMENT 'Comment Primary Key Column', PRIMARY KEY (col_key) ) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;Table metadata in generated SIARD-DK archive:
datatypes table1 Description should be entered manually col1 c1 INTEGER BLOB true Test comment Dokumentidentifikation col_key c2 INTEGER INT false Comment Primary Key Column PRIMARY col_key 1 We this archive is imported in MySQL, this will result in the 'col1' column getting datatype 'blob', which is not what we want, as the data will now be integers (ids of files with the exported blobs in the archive).
â Reply to this email directly or view it on GitHub https://github.com/keeps/db-preservation-toolkit/issues/128.
Andreas Kring Magenta Aps Incuba, Ã bogade 15 DK-8200 Aarhus N
tel +45 3336 9699 dir +45 8177 1698
http://www.magenta.dk [email protected]
@andreaskring When 'typeOriginal' is handled, consider adjusting the test here: https://github.com/magenta-aps/db-preservation-toolkit/blob/siarddk-import-module/dbptk-core/src/test/java/com/databasepreservation/testing/integration/siard/SiardDKTest.java#L157
Some parts of dbptk relied on the "original type", but this has proven to be a bad practice. Sure, it easily solves some problems for now, but in the long run relying on the original type will cause a lot of trouble due to the variety of values that can be present in that field. The preferred way to recognise types is to use the SQL standard type (99, 2008, etc). This may lead to some type changes (ie, text fields becoming CLOBs) but that may be acceptable as long as no data is lost (for that the new type must be able to contain all values that could be represented in the original type).