dbptk-developer icon indicating copy to clipboard operation
dbptk-developer copied to clipboard

MySQL Garbage data in datetime-fields results in too much logging

Open ghost opened this issue 8 years ago • 13 comments

Description:

I'm trying to run dbptk on a very large database (45,5 GB) for archival. When running the tool, it produces hundreds of 50 MB log-files in the directory until the disk is full. The files seem to contain a lot of garbage data, but I'm unsure of this is just the module dumping its memory/objects or if it is actually trying to store the garbage data in the column. I have attached the repeating error message only, since the log files are too large.

The message indicates that it's trying to insert a very long string into a timestamp-field. Strangely, the string does not exist in the "attendance" table in the database, but another table called "assessment log". The error seem to trigger on different fields, mainly DATETIME fields in the database, and there is always a very long string/CLOB which doesn't match the field and doesn't match the data in the database.

This error occurs both with MariaDB and Oracle MySQL. I've imported the SQL script in both just to make sure. Let me know if any of this needs more detail.

Attach the dbptk-app.log.txt file below.

dbptk-error-msg.txt

ghost avatar May 12 '17 09:05 ghost

This issue seems very similar to another one which I was not able to reproduce.

Since you have the SQL script, could you try to create the database with just the attendance table and export it? too see if it the error still happens. Creating and exporting a database with just the assessment log table may also be helpful.

Can you also send me the first lines of the log? I just need the ones that have the application version info. They look something like this:

2017-05-10 09:45:11,316 [main] DEBUG (c.d.Main) #########################################################
2017-05-10 09:45:11,320 [main] DEBUG (c.d.Main) #   START-ID-e4dbe40d-1f11-463e-a9fa-a95cca054843
2017-05-10 09:45:11,321 [main] DEBUG (c.d.Main) #   START v2.0.0-rc2
2017-05-10 09:45:11,325 [main] DEBUG (c.d.Main) #   version info: {"git.build.user.email":"[email protected]","git.build.host":"young","git.dirty":"false","git.remote.origin.url":"[email protected]:keeps/db-preservation-toolkit.git","git.closest.tag.name":"2.0.0-beta7.2","git.commit.id.describe-short":"2.0.0-beta7.2-14","git.commit.user.email":"[email protected]","git.commit.time":"04.05.2017 @ 16:27:00 WEST","git.commit.message.full":"updated version to rc2","git.build.version":"2.0.0-rc2","git.commit.message.short":"updated version to rc2","git.commit.id.abbrev":"bb6efa8","git.branch":"master","git.build.user.name":"Bruno Ferreira","git.closest.tag.commit.count":"14","git.commit.id.describe":"2.0.0-beta7.2-14-gbb6efa8","git.commit.id":"bb6efa86f3401d4cd1a7f31ec66c7037ced2ee1a","git.tags":"","git.build.time":"04.05.2017 @ 16:28:10 WEST","git.commit.user.name":"Bruno Ferreira"}
2017-05-10 09:45:11,325 [main] DEBUG (c.d.Main) #########################################################

The log getting too big is also an issue here... I think the log is only being pruned when the application starts.

chalkos avatar May 12 '17 10:05 chalkos

I have now created a new database with the first 10k rows in the table "attendance". See attached log from the export.

dbptk-app.log.txt.zip

Let me know if you want a CSV or similar of the actual data. There seem to be quite a lot of null bytes in this log...

ghost avatar May 12 '17 12:05 ghost

Could you send me the SQL statements that create the first 10 rows of that table? If possible, I would like you to just trim the original file and upload a file with those SQL statements (to preserve any control characters that may be present).

The log still shows that DBPTK is (supposedly) receiving garbage data from MySQL:

screenshot from 2017-05-12 13-45-58

It is good that we are getting a smaller sample data that I can use in my tests.

chalkos avatar May 12 '17 12:05 chalkos

See attached SQL-script.

its-attendance.zip

ghost avatar May 12 '17 13:05 ghost

Note: I've removed indexes and foreign keys from the CREATE-statement.

ghost avatar May 12 '17 13:05 ghost

That SQL looks fine to me. I wonder if that is really the data that ends up being extracted from the database... maybe somewhere later in the SQL file the table is changed (maybe some database migrations are applied after?).

Can you try to use an SQL file to create that table with just a few rows (the error happens on the 6th row, according to the log, so 10 rows should be enough to test) that can be used to get the error when extracting with DBPTK?

Alternatively (and preferably) you can do it the other way around: load the whole database; remove other tables, keys, etc; remove all rows from that table stating from the 10th row; dump the database to SQL.

The alternative procedure has the advantage of including any changes that are made to the table after it is created and the base data is added.

chalkos avatar May 12 '17 13:05 chalkos

I imported the first 10 rows and ran the export. The attached zip-file contains the results. I don't think there is any changes later on, the full SQL-script seem fairly plain... There might be some corruption happening with the mysql-driver maybe. The debug message shows that somewhere "value" is being filled with the entire row separated by some control characters when trying to export the field "LastEdited". The field has the type DATETIME in the database. Maybe there is a bug related to this type? This is the only field of this type in the "attendance" table... It is quite strange that it doesn't fail until the 6th row, though...

its-attendance-10ln.zip

ghost avatar May 15 '17 08:05 ghost

I agree with that. I also think that the problem is in the driver.

Can you send me the SQL dump? So I can try to reproduce the error whilst debugging the application, to understand what is going on.

And also let me know the MySQL and Mariadb versions you used.

chalkos avatar May 15 '17 10:05 chalkos

I can't really send you the entire SQL dump, since it contains sensitive information, but the zip-file attached a few days ago should be enough to reproduce the fault. I can maybe extract a few more tables related to "attendance" which do not contain sensitive data if necessary.

Servers: MySQL Community Edition 5.7.18 x64 MariaDB 10.1.22 x64

If there is anything else I can do to help debugging, let me know.

ghost avatar May 15 '17 11:05 ghost

I'm only asking for the same rows you sent me in the SIARD but in SQL dump form, there is no need to send me the remaining tables.

You would just need to create and send me an SQL Dump of the same database you exported to SIARD and sent me a few days ago.

chalkos avatar May 15 '17 11:05 chalkos

Oh, right. See attached file. Is this what you need? Renamed it to .txt for convenience.

its-attendance-sql-dump.txt

ghost avatar May 15 '17 11:05 ghost

Yes, it is. Thanks. I'll try to use this to debug the error.

EDIT: the SQL dump was enough to reproduce the error on my end.

chalkos avatar May 15 '17 12:05 chalkos

Putting this into screening

luis100 avatar Apr 24 '19 08:04 luis100