molgenis
Importing a table with ~260.000 rows takes a very long time (in ADD/UPDATE mode)
How to Reproduce
Import vkgl-model.xlsx via the advanced importer. Then import the vkgl_consensus_history data (mail me for the data, or simulate the vkgl_consensus_history data with ~260.000 rows).
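For anyone without access to the real data, the simulation route could look roughly like the sketch below. It is only an illustration: the column names (`id`, `chromosome`, `position`, `ref`, `alt`) and the composite ID format are assumptions, not the actual vkgl_consensus_history attributes, and have to be adapted to whatever vkgl-model.xlsx defines. The one property it tries to mimic is a long, unique string ID per row.

```python
import csv
import io
import random


def simulate_rows(n_rows, seed=0):
    """Yield n_rows dicts, each with a unique composite string ID.

    The ID layout and all column names are hypothetical placeholders.
    """
    rng = random.Random(seed)
    chromosomes = [str(c) for c in range(1, 23)] + ["X", "Y"]
    for i in range(n_rows):
        chrom = rng.choice(chromosomes)
        pos = rng.randint(1, 250_000_000)
        ref = rng.choice("ACGT")
        alt = rng.choice([base for base in "ACGT" if base != ref])
        yield {
            # The row counter is appended so that IDs are guaranteed unique.
            "id": f"{chrom}_{pos}_{ref}_{alt}_{i}",
            "chromosome": chrom,
            "position": str(pos),
            "ref": ref,
            "alt": alt,
        }


def write_csv(handle, n_rows, seed=0):
    """Write a header plus n_rows simulated rows; return the row count."""
    fieldnames = ["id", "chromosome", "position", "ref", "alt"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in simulate_rows(n_rows, seed):
        writer.writerow(row)
        count += 1
    return count
```

To produce a file of the size mentioned above, something like `with open("vkgl_consensus_history.csv", "w", newline="") as f: write_csv(f, 260_000)` should do; the fixed seed keeps the file identical between runs, so the same data can be imported repeatedly in ADD/UPDATE mode.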
Expected behavior
It succeeds in less than 5 minutes (only strings, no weird data)
Observed behavior
The import takes over an hour.
This happens with both xlsx and csv files, on both the test server and localhost.

It still performs fine in the current release in "ADD" mode. The problem seems to originate from "ADD/UPDATE" mode.
ADD: ~30 seconds
ADD/UPDATE: between 80 and 100 minutes
UPDATE: ~30 seconds
ADD/UPDATE for a subset of 100000 rows: ~3 minutes
The problem seems to be the format of the ID attribute of the dataset. The string used as ID for this dataset appears to be really difficult for Postgres to index.
A test with an autoID test set results in an ADD/UPDATE of 100000 rows taking a few seconds (both when starting with an empty table and when all the rows are already present).
I thought this one was fixed?
@bartcharbon could you answer @mswertz's question?
Not fixed.