Unable to load 92MB file with 5 tables

Open gillianh1 opened this issue 2 years ago • 12 comments

Description: I generated a file using DBPTK Desktop. It contains 5 tables and is 92MB. When I try to open the file in DBPTK Desktop, a blue progress dot pulses on the open option but the file never loads.

Context: DBPTK Desktop, installed on a Windows 10 PC using dbptk-desktop-2.6.0.exe

Steps required to reproduce the bug:

  1. Generate a file using DBPTK Desktop. Mine contains 5 tables and is 92MB.
  2. Try to open the file in DBPTK Desktop. A blue progress dot pulses on the open option but the file never loads (a smaller file of 2MB with 2 tables does load successfully).
  3. I tried increasing the memory in settings, but I am still unable to load the file.
  4. Have we reached the limits of DBPTK Desktop, or of running it on a Windows PC?

Is there any documentation on hardware/sizing requirements or limitations?

gillianh1 avatar Sep 16 '22 08:09 gillianh1

Hi,

Please attach the log files so we can better understand the problem. Logs are available in the menu Help -> Logs.

hmiguim avatar Sep 16 '22 10:09 hmiguim

The file was created successfully using 2.6.0, but we were unable to open it using 2.6.0. We have since been able to connect to the same database and user using version 2.6.1, create a new extract file, and open that 92MB file. However, we are still unable to load the original file created with version 2.6.0 in the 2.6.1 desktop exe. We are able to open the new file created in version 2.6.1 using the 2.6.0 desktop exe. I will upload the log.

gillianh1 avatar Sep 16 '22 10:09 gillianh1

dbvtk.log (latest failed attempt at 11:47)

gillianh1 avatar Sep 16 '22 10:09 gillianh1

This seems to be the issue: a non-hex character in the input.

2022-09-16 11:47:58,262 [http-nio-auto-1-exec-7] ERROR o.a.solr.handler.RequestHandlerBase - org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: o
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: o
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:212)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:333)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
	at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1003)
	at com.databasepreservation.common.server.index.utils.SolrUtils.find(SolrUtils.java:155)
	at com.databasepreservation.common.server.index.DatabaseRowsSolrManager.find(DatabaseRowsSolrManager.java:178)
	at com.databasepreservation.common.api.v1.DatabaseResource.getViewerDatabaseIndexResult(DatabaseResource.java:97)
	at com.databasepreservation.common.api.v1.DatabaseResource.find(DatabaseResource.java:71)

luis100 avatar Sep 16 '22 15:09 luis100

The XML might be malformed: it starts a Unicode escape sequence but then has an "o" where a hexadecimal digit is expected. You will need to look into the SIARD content to see where this came from.
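To help locate such a sequence, here is a minimal Python sketch that flags every backslash-u not followed by four hexadecimal digits, which is the condition the Solr error describes. The function name `find_bad_unicode_escapes` and the sample string are made up for illustration; this is not part of DBPTK.

```python
import re

# A backslash followed by "u" that is NOT followed by four hex digits.
BAD_ESCAPE = re.compile(r"\\u(?![0-9A-Fa-f]{4})")


def find_bad_unicode_escapes(text, context=10):
    """Return (offset, snippet) pairs for each malformed backslash-u escape."""
    hits = []
    for match in BAD_ESCAPE.finditer(text):
        start = match.start()
        # Keep a few characters after the escape so the hit is easy to locate.
        hits.append((start, text[start:start + 2 + context]))
    return hits


# Illustrative input only: one valid escape, one malformed escape.
sample = r"ok \u00e9 but \uoe is not"
for offset, snippet in find_bad_unicode_escapes(sample):
    print(offset, snippet)
```

The check is not specific to XML. Any text containing a backslash followed by "u" would be flagged the same way, for example a hypothetical Windows path such as `C:\exports\uoesiardschema_extract.siard`, so it may be worth running it over other inputs as well as the SIARD content.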

luis100 avatar Sep 16 '22 15:09 luis100

The SIARD file was produced using DBPTK Desktop (dbptk-desktop-2.6.0.exe).

No error was reported when the file was produced, so how would we know there was an issue with it? Do we always need to open and validate the file? Can we not assume a file is OK if the SIARD file was created without error?

If I rename the SIARD file with a .zip extension, we can navigate the files.
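Since a SIARD file is a ZIP container, the same inspection can be done without renaming it. The sketch below uses only Python's standard `zipfile` module to run a CRC check and list the entries; the function name `list_siard_entries` is just an illustrative choice, and passing this check says nothing about whether the XML inside is valid SIARD.

```python
import zipfile


def list_siard_entries(path):
    """Return (name, uncompressed_size) for every entry in a SIARD (ZIP) file."""
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every entry and returns the first name with a bad
        # CRC or header, or None when all entries are readable.
        bad = archive.testzip()
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt entry: {bad}")
        return [(info.filename, info.file_size) for info in archive.infolist()]


# Example call (the path is a placeholder for your own extract):
# for name, size in list_siard_entries("uoesiardschema_extract.siard"):
#     print(size, name)
```

This only confirms that the container is intact and navigable, which matches what the rename to .zip showed.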

We have subsequently created a new file using dbptk-desktop-2.6.1.exe pointing to the same user and database, and this file is OK, so it is not an issue with the tables/data being extracted from the database.

I will try generating the file again from the 2.6.0 Desktop version to see if I can reproduce the issue.

gillianh1 avatar Sep 20 '22 08:09 gillianh1

I was able to extract, import and validate the file in version 2.6. This time the file does open. I have access to both files and they are the same size. I saved both files as .zip and was able to navigate all files/tables. I will attach the log.

gillianh1 avatar Sep 20 '22 10:09 gillianh1

Latest log

dbvtk.log

Original file from 2.6 will not load (uoesiardschema_extract.siard).
New file from 2.6 will load (2.6_uoesiardschema_extract.siard).

gillianh1 avatar Sep 21 '22 09:09 gillianh1

Hi @gillianh1, thank you for using and testing DBPTK, and for your feedback. Since version 2.6.1 is working fine, I suggest you use that version instead of 2.6.0.

hmiguim avatar Sep 21 '22 09:09 hmiguim

This is what I plan to do. My only concern is that a file was produced without error and yet cannot be opened. I would not like to be in this position when trying to open a SIARD file in the future.

Is your recommendation to create, open and validate each file that is produced before archiving?

Thanks

gillianh1 avatar Sep 21 '22 09:09 gillianh1

The validation step is essential as proof that the produced SIARD follows the specification.

To ensure that no record is lost, you can use a module called the Merkle Tree filter (documentation available here). However, this requires a stored procedure that calculates the hash for every exported column using the Merkle tree top hash algorithm.
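For readers unfamiliar with the idea, a Merkle tree top hash can be sketched as follows. This is a generic illustration, not DBPTK's implementation: the choice of SHA-256, the ordering of values, and the duplication of the last node on odd-sized levels are all assumptions, and the actual filter and stored procedure must agree on whatever scheme the documentation specifies.

```python
import hashlib


def merkle_top_hash(values):
    """Compute a Merkle tree top hash over an ordered list of byte strings."""
    if not values:
        raise ValueError("need at least one value")
    # Leaves: hash each value on its own (SHA-256 is an assumption here).
    level = [hashlib.sha256(v).digest() for v in values]
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Assumption: duplicate the last node when a level has odd size.
            level.append(level[-1])
        # Parents: hash each adjacent pair of child hashes.
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Illustrative cell values only; a changed or missing value changes the top hash.
print(merkle_top_hash([b"1", b"Alice", b"2022-09-16"]))
```

Comparing the top hash computed on the database side with the one computed from the exported data is what lets a single value stand in for a check of every cell.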

DBPTK offers a set of tools to validate and verify completeness and correctness. As a rule of thumb, you should create, open and validate each file to check that the extraction went well.

hmiguim avatar Sep 21 '22 09:09 hmiguim

Thank you for your help and confirmation.

gillianh1 avatar Sep 21 '22 09:09 gillianh1