molgenis
Importing a table with ~260,000 rows takes a very long time (in ADD/UPDATE mode)
How to Reproduce
Import vkgl-model.xlsx via the advanced importer, then import the vkgl_consensus_history data (mail me for the data, or simulate vkgl_consensus_history data with ~260,000 rows).
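For anyone without access to the real data, here is a minimal sketch of how a simulated file could be generated. The actual columns and ID format of vkgl_consensus_history are not given in this issue, so the column names (`id`, `label`) and the ID layout (a random 40-character prefix plus a counter) are assumptions for illustration only; adjust them to match the model in vkgl-model.xlsx.

```python
import csv
import io
import random
import string


def simulate_rows(n_rows, seed=0):
    """Yield (id, label) rows with long, unique string IDs (hypothetical layout)."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    for i in range(n_rows):
        # Assumption: a random 40-character prefix stands in for the real ID format;
        # the trailing counter guarantees uniqueness.
        prefix = "".join(rng.choices(alphabet, k=40))
        yield (f"{prefix}_{i}", f"value_{i}")


def write_csv(handle, n_rows=260_000, seed=0):
    """Write a header line plus n_rows simulated rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["id", "label"])  # hypothetical column names
    writer.writerows(simulate_rows(n_rows, seed))
```

To write a file, open it with `newline=""` and pass the handle, e.g. `with open("simulated.csv", "w", newline="") as f: write_csv(f)`. Whether this ID layout triggers the same slowdown as the real data is untested.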
Expected behavior
The import succeeds in less than 5 minutes (the data contains only strings, nothing unusual).
Observed behavior
The import takes over an hour.
The observed behaviour applies to both xlsx and csv files, on both the test server and localhost.
[Screenshot: Screen Shot 2019-10-18 at 16 23 12]
It still performs fine in the current release in "ADD" mode. The problem seems to originate from the "ADD/UPDATE" mode.
ADD: ~30 seconds
ADD/UPDATE: between 80 and 100 minutes
UPDATE: ~30 seconds
ADD/UPDATE for a subset of 100000 rows: ~3 minutes
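The ADD/UPDATE timings reported here suggest that the slowdown is not linear in the number of rows. A rough back-of-the-envelope calculation, using only the figures quoted in this issue and treating ~260,000 as the full row count (the subset run may not have been made under identical conditions, so this is indicative only):

```python
def rows_per_second(rows, minutes):
    """Average import throughput for a run of the given size and duration."""
    return rows / (minutes * 60)


# Figures taken from the timings reported in this issue.
subset = rows_per_second(100_000, 3)      # 100000-row subset, ~3 minutes
full_best = rows_per_second(260_000, 80)  # full set, 80 minutes
full_worst = rows_per_second(260_000, 100)  # full set, 100 minutes

print(f"subset:     {subset:.0f} rows/s")
print(f"full (80m): {full_best:.0f} rows/s")
print(f"full (100m): {full_worst:.0f} rows/s")
print(f"slowdown:   {subset / full_best:.1f}x to {subset / full_worst:.1f}x")
```

With 2.6 times as many rows, the average throughput drops by roughly a factor of 10 to 13, which points at a per-row cost that grows with the amount of data being imported rather than a fixed overhead.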
The problem seems to be the format of the ID attribute of the dataset. The string used as the ID for this dataset is really difficult for PostgreSQL to index.
A test with an auto-ID test set completes an ADD/UPDATE of 100000 rows in a few seconds, both when starting with an empty table and when all the rows are already present.
I thought this one was fixed?
@bartcharbon could you answer @mswertz's question?
Not fixed.