
Support Multiple Threads on ML Related DB Access

Open · superichmann opened this issue 2 years ago · 1 comment

Is your feature request related to a problem? Please describe.

According to my tests, when executing multiple RegressionExperiments in parallel with data loaded through DatabaseSource, a bottleneck occurs in the DB access layer. This is probably because a single instance of the DB access layer is shared, even when a separate DatabaseSource is created for each experiment.

This prevents the experiments from fully utilizing the CPU and the database.

My tests were:

  1. Reviewing DB logs: when the experiments run, the DB accesses occur sequentially rather than concurrently.
  2. Running from separate EXEs: when each experiment runs in its own process, the DB receives many more queries at the same time, and all experiments finish faster.
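For reference, the in-process scenario described above could be sketched roughly as follows. This is only an illustration, not the code used in the tests: the `Row` type, the connection string, and the queries are hypothetical placeholders, and it assumes the `Microsoft.ML`, `Microsoft.ML.AutoML`, and `System.Data.SqlClient` packages are referenced. Each task creates its own `MLContext` and `DatabaseSource`, which is the setup where the sequential DB access was observed.

```csharp
using System;
using System.Data.SqlClient;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Hypothetical row type; the real schema is not given in the issue.
// The matching SQL columns are assumed to be of type REAL.
public class Row
{
    public float Feature1 { get; set; }
    public float Feature2 { get; set; }
    public float Label { get; set; }
}

public static class Program
{
    // Placeholder connection string and queries (assumptions, not from the issue).
    private const string ConnectionString =
        "Server=localhost;Database=MyDb;Integrated Security=True";

    private static readonly string[] Queries =
    {
        "SELECT Feature1, Feature2, Label FROM TableA",
        "SELECT Feature1, Feature2, Label FROM TableB",
    };

    public static void Main()
    {
        // One task per experiment, each with its own MLContext and DatabaseSource.
        var tasks = Queries.Select(query => Task.Run(() =>
        {
            var mlContext = new MLContext();
            var loader = mlContext.Data.CreateDatabaseLoader<Row>();
            var source = new DatabaseSource(
                SqlClientFactory.Instance, ConnectionString, query);
            IDataView data = loader.Load(source);

            var result = mlContext.Auto()
                .CreateRegressionExperiment(maxExperimentTimeInSeconds: 60)
                .Execute(data, labelColumnName: "Label");

            return result.BestRun.ValidationMetrics.RSquared;
        })).ToArray();

        Task.WaitAll(tasks);

        for (int i = 0; i < tasks.Length; i++)
        {
            Console.WriteLine($"Experiment {i}: R^2 = {tasks[i].Result}");
        }
    }
}
```

Running the same experiment body once per process (the second test above) would be the comparison case.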

Still, I might be missing something, so correct me if I am wrong.

Describe the solution you'd like

Allow multiple DB connections to be instantiated so that experiments can run in parallel.

Describe alternatives you've considered

  • Tried loading an IDataView with the entire DB data and then using DropColumns or a filter, but loading from the DB took a very long time.

Additional context

Windows machine; both the database and the ML workload are on the same machine.

superichmann, May 17 '23 05:05

@LittleLittleCloud this looks like it may also be an issue with AutoML.

michaelgsharp, Jan 24 '24 18:01