datajoint-python
simple inserts very slow
Bug Report
Description
Inserting 1000 rows, each containing only a single float, takes 15 seconds. The time scales with the number of rows (e.g., 3000 rows take 57 seconds). I will eventually need to insert up to a million rows, so this is clearly not scalable.
Reproducibility
- OS: Linux
- Python Version: 3.7.13
- MySQL Version: 14.14
- MySQL Deployment Strategy: local-native
- DataJoint Version: 0.13.4
MWE
import datajoint as dj
import numpy as np

dj.config['database.host'] = XXX
dj.config['database.user'] = XXX
dj.config['database.password'] = XXX

schema = dj.schema('testing', locals())

@schema
class TestDB(dj.Manual):
    definition = """
    neuron_id: int
    ---
    activity: float
    """

db = TestDB()
activities = np.random.random(1000)

# Takes a long time
for i in range(1000):
    db.insert1({'neuron_id': i,
                'activity': activities[i]}, skip_duplicates=True)
Additional Research and Context
Possibly related to #131
Indeed, inserting one row at a time is slow because each insert1 call triggers a separate round trip to the database server. You can insert many rows in a single call instead:

# A single batched insert is much faster than many single-row inserts
db.insert(({'neuron_id': i,
            'activity': activities[i]} for i in range(1000)))
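For the million-row case mentioned above, it can help to split the rows into fixed-size batches rather than passing one enormous generator, so that each insert stays a manageable single transaction. Below is a minimal sketch of such a chunking helper; the helper name chunked and the batch size are my own choices, not part of the DataJoint API:

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Demonstration on a small row generator (10 rows, batches of 4)
rows = ({'neuron_id': i, 'activity': float(i)} for i in range(10))
batches = list(chunked(rows, 4))
# batches contains 3 lists of sizes 4, 4, and 2

# In practice, each batch would become one server round trip, e.g.:
#     for batch in chunked(rows, 10_000):
#         db.insert(batch, skip_duplicates=True)
```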
This is expected behavior and we will close this issue.
Hi @theanenome, you can find further documentation on batched inserts in the DataJoint Docs. Thanks.