Importing a table with ~260.000 rows takes a very long time (in ADD/UPDATE mode)

Open marikaris opened this issue 5 years ago • 7 comments

How to Reproduce

Import vkgl-model.xlsx via the advanced importer. Then import the vkgl_consensus_history data (mail me for the data, or simulate the vkgl_consensus_history data with ~260.000 rows).
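For anyone without access to the real file, the simulation step could be sketched roughly as below. This is only a minimal sketch: the column names, the number of columns, and the shape of the ID string are hypothetical, since the actual vkgl_consensus_history attributes are not listed in this issue. It only mirrors what is stated here, namely ~260.000 rows of plain string data with a string ID.

```python
import csv
import random
import string


def write_simulated_rows(path, n_rows=260_000, n_columns=5, seed=0):
    """Write n_rows of random string data to a CSV file and return the row count."""
    rng = random.Random(seed)
    # Hypothetical column names; the real attribute names are not known from this issue.
    header = ["id"] + [f"col_{i}" for i in range(1, n_columns)]
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for row_number in range(n_rows):
            # Assumed ID shape: a string made unique by embedding the row number.
            suffix = "".join(rng.choices(string.ascii_lowercase, k=12))
            row_id = f"row_{row_number}_{suffix}"
            values = [
                "".join(rng.choices(string.ascii_letters, k=10))
                for _ in range(n_columns - 1)
            ]
            writer.writerow([row_id] + values)
    return n_rows
```

Calling `write_simulated_rows("vkgl_consensus_history.csv")` writes a header plus 260.000 data rows. The header would still need to be adjusted to match the attributes defined in vkgl-model.xlsx before the file can be imported.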

Expected behavior

The import succeeds in less than 5 minutes (the data contains only strings, nothing unusual)

Observed behavior

Import takes over an hour

marikaris avatar Oct 18 '19 11:10 marikaris

The observed behaviour occurs for both xlsx and csv files, on the test server as well as on localhost.

bartcharbon avatar Oct 18 '19 13:10 bartcharbon

[Screenshot: Screen Shot 2019-10-18 at 16 23 12]

This was the same import in 7.x (I guess it was 7.2 back then)

marikaris avatar Oct 18 '19 14:10 marikaris

It still performs fine in the current release in "ADD" mode. The problem seems to originate from the "ADD/UPDATE" mode.

ADD: ~30 seconds
ADD/UPDATE: between 80 and 100 minutes
UPDATE: ~30 seconds

ADD/UPDATE for a subset of 100000 rows: ~3 minutes
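As a rough back-of-the-envelope check of these numbers (a sketch only, since the timings are approximate and two data points cannot establish a trend): if ADD/UPDATE scaled linearly with row count, the full table should take about 7.8 minutes based on the 100000-row run. The observed 80 to 100 minutes is roughly 10 to 13 times slower than that, which suggests the cost grows much faster than linearly with the number of rows. The "implied exponent" below assumes a simple power-law model, which is an assumption for illustration and not something measured.

```python
import math

# Timings reported above (approximate).
rows_subset, minutes_subset = 100_000, 3.0
rows_full = 260_000
minutes_full_low, minutes_full_high = 80.0, 100.0


def linear_estimate(rows, ref_rows, ref_minutes):
    """Expected duration if time grew proportionally with row count."""
    return ref_minutes * rows / ref_rows


def implied_exponent(rows_a, minutes_a, rows_b, minutes_b):
    """Exponent k in time ~ rows**k that would connect two measurements."""
    return math.log(minutes_b / minutes_a) / math.log(rows_b / rows_a)


expected = linear_estimate(rows_full, rows_subset, minutes_subset)
slowdown_low = minutes_full_low / expected
slowdown_high = minutes_full_high / expected
k_low = implied_exponent(rows_subset, minutes_subset, rows_full, minutes_full_low)
k_high = implied_exponent(rows_subset, minutes_subset, rows_full, minutes_full_high)

print(f"linear estimate for full table: {expected:.1f} min")
print(f"observed slowdown vs linear: {slowdown_low:.1f}x to {slowdown_high:.1f}x")
print(f"implied power-law exponent: {k_low:.2f} to {k_high:.2f}")
```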

bartcharbon avatar Oct 18 '19 14:10 bartcharbon

The problem seems to be the format of the id attribute of the dataset. The string used as ID for this dataset is really difficult for postgres to index.

A test with an autoID test set completes an ADD/UPDATE of 100000 rows in a few seconds (both when starting with an empty table and when all the rows are already present).

bartcharbon avatar Oct 22 '19 09:10 bartcharbon

I thought this one was fixed?

mswertz avatar Feb 04 '20 17:02 mswertz

@bartcharbon could you answer @mswertz's question?

dennishendriksen avatar Feb 10 '20 12:02 dennishendriksen

Not fixed.

dennishendriksen avatar Feb 10 '20 13:02 dennishendriksen