
simple inserts very slow

Open · theanenome opened this issue 2 years ago · 2 comments

Bug Report

Description

Inserting 1000 rows, each containing only a single float, takes 15 seconds, and the time scales with the number of rows (e.g., 3000 rows take 57 seconds). I will eventually need to insert up to a million rows, so this clearly does not scale.

Reproducibility

Include:

  • OS (Linux)
  • Python Version 3.7.13
  • MySQL Ver 14.14
  • MySQL Deployment Strategy (local-native)
  • DataJoint Version 0.13.4

MWE

import datajoint as dj
import numpy as np

dj.config['database.host'] = XXX
dj.config['database.user'] = XXX
dj.config['database.password'] = XXX

schema = dj.schema('testing', locals())

@schema
class TestDB(dj.Manual):
    definition = """
    neuron_id: int
    ---
    activity: float
    """

activities = np.random.random(1000)

# Instantiate the table before inserting
db = TestDB()

# Takes a long time
for i in range(1000):
    db.insert1({'neuron_id': i,
                'activity': activities[i]}, skip_duplicates=True)

Additional Research and Context

Possibly related to #131

theanenome · Apr 25 '22 15:04

Indeed, inserting one row at a time is slow because each insert triggers a round trip to the database server. You can insert multiple rows at a time instead:

    # A single insert is much faster than multiple inserts
    db.insert(({'neuron_id':i, 
                'activity':activities[i]} for i in range(1000)))
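For the eventual million-row insert mentioned above, one option is to send the rows in moderately sized batches rather than all at once. The sketch below is only illustrative: it assumes the TestDB table from the MWE, and the batch size of 10,000 is an arbitrary choice, not a tuned recommendation.

    # Sketch: batch a very large insert so each server round trip
    # carries a few thousand rows (10_000 here is illustrative, not tuned).
    import numpy as np

    db = TestDB()
    activities = np.random.random(1_000_000)
    rows = [{'neuron_id': i, 'activity': a} for i, a in enumerate(activities)]

    batch_size = 10_000
    for start in range(0, len(rows), batch_size):
        db.insert(rows[start:start + batch_size], skip_duplicates=True)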

This is expected behavior and we will close this issue.

dimitri-yatsenko · Apr 25 '22 15:04

Hi @theanenome, you can find further documentation on batched inserts in the DataJoint Docs. Thanks.

kabilar · Apr 25 '22 19:04