tpot2

Population class logging is inefficient and needs to be optimized.

Open perib opened this issue 1 year ago • 1 comment

The Population class stores its logs in a pandas DataFrame. Appending rows to a DataFrame is very slow, and each append gets slower as the frame grows. This becomes a bigger issue in long-running evolutions, or in evolutions with short evaluation times that iterate through individuals quickly. For example, Tutorial 8 becomes noticeably slower as it progresses.
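A minimal sketch of one way around the slow appends, not the actual Population code: the class name `RowBuffer` and the field names are hypothetical. It keeps each logged individual as a plain dict in a list, where appending is amortized constant time, and builds a DataFrame only when one is requested, instead of growing a DataFrame row by row.

```python
class RowBuffer:
    """Hypothetical append-only log; not part of tpot2."""

    def __init__(self):
        self._rows = []  # one dict per logged individual

    def append(self, **fields):
        # list.append is amortized O(1), regardless of how many rows exist
        self._rows.append(fields)

    def __len__(self):
        return len(self._rows)

    def to_frame(self):
        # pandas is imported lazily, so logging itself never touches it
        import pandas as pd
        return pd.DataFrame(self._rows)


# Illustrative usage with made-up generations, names, and scores
log = RowBuffer()
for generation in range(3):
    for i in range(4):
        log.append(
            generation=generation,
            individual=f"ind_{generation}_{i}",
            score=0.1 * i,
        )
```

Calling `log.to_frame()` once at the end (or at a checkpoint) pays the DataFrame construction cost a single time rather than on every logged individual.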

Also related: pandas 2.0.0 breaks some code in the Population class. Currently, the class only supports pandas 1.5.3.

The Population class needs to be optimized with a better underlying data structure. Perhaps an SQL structure? Or a dictionary of dictionaries? I think using a pandas data frame with preallocation would not be ideal.
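A rough sketch of what the dictionary-of-dictionaries option could look like, under assumptions: `DictLog`, its methods, and the column names are all hypothetical and do not reflect the existing Population API. The outer dict is keyed by individual, and each inner dict maps a column name to its value, so adding or updating a value is a constant-time lookup rather than a DataFrame append.

```python
class DictLog:
    """Hypothetical dictionary-of-dictionaries log; not part of tpot2."""

    def __init__(self):
        self._records = {}  # individual key -> {column name -> value}

    def add(self, key, **fields):
        # setdefault creates the inner dict the first time a key is seen
        self._records.setdefault(key, {}).update(fields)

    def get(self, key, column, default=None):
        return self._records.get(key, {}).get(column, default)

    def unevaluated(self, column="score"):
        # keys that have no value yet for the given column
        return [k for k, rec in self._records.items() if column not in rec]

    def to_frame(self):
        # convert on demand; columns missing for a key become NaN
        import pandas as pd
        return pd.DataFrame.from_dict(self._records, orient="index")


# Illustrative usage with made-up keys and scores
log = DictLog()
log.add("ind_a", generation=0)
log.add("ind_b", generation=0)
log.add("ind_a", score=0.91)
```

A DataFrame is still available through `to_frame()` for analysis, but it is built only when asked for, which also narrows the amount of code that depends on a specific pandas version.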

perib, Sep 28 '23 21:09

Discussed during the monthly meeting: Nick to experiment some more with dictionaries and report back.

miguelehernandez, Nov 16 '23 22:11