tpot2
Population class logging is inefficient and needs to be optimized.
The Population class stores its logs in a pandas DataFrame. Appending rows to a DataFrame is very slow, and each append gets slower as the DataFrame grows. This becomes a bigger issue in long-running evolutions, or in evolutions with short evaluation times that iterate through individuals quickly. For example, Tutorial 8 becomes noticeably slower as it progresses.
Also related: pandas 2.0.0 breaks some code in the Population class. Currently, the class only supports pandas 1.5.3.
The Population class needs to be optimized with a better underlying data structure. Perhaps an SQL database? Or a dictionary of dictionaries? I don't think a pandas DataFrame with preallocation would be ideal.
Discussed during the monthly meeting: Nick will experiment further with dictionaries and report back.
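To make the dictionary-of-dictionaries idea concrete, here is a minimal sketch of what such a log could look like. It is not the tpot2 implementation: the class name `DictLog`, its methods, and the column names `generation` and `score` are all hypothetical and chosen only for illustration. The point is that adding or updating a row is a plain dict operation, so its cost does not grow with the number of rows already logged.

```python
from typing import Any, Dict, Hashable, List


class DictLog:
    """Hypothetical log keyed by individual; NOT part of tpot2."""

    def __init__(self) -> None:
        # Outer dict: row key -> inner dict of column name -> value.
        self._rows: Dict[Hashable, Dict[str, Any]] = {}

    def add(self, key: Hashable, **fields: Any) -> None:
        # Insert a new row. Dict insertion is amortized O(1),
        # independent of how many rows are already stored.
        if key in self._rows:
            raise KeyError(f"duplicate key: {key!r}")
        self._rows[key] = dict(fields)

    def update(self, key: Hashable, **fields: Any) -> None:
        # Set or overwrite columns on an existing row.
        self._rows[key].update(fields)

    def get(self, key: Hashable, column: str, default: Any = None) -> Any:
        # Read one cell; missing columns fall back to `default`.
        return self._rows[key].get(column, default)

    def columns(self) -> List[str]:
        # Union of all column names, in first-seen order.
        seen: Dict[str, None] = {}
        for row in self._rows.values():
            for name in row:
                seen.setdefault(name, None)
        return list(seen)

    def to_records(self, missing: Any = None) -> List[Dict[str, Any]]:
        # Flatten to a list of uniform dicts, filling absent cells.
        cols = self.columns()
        return [
            {"key": key, **{c: row.get(c, missing) for c in cols}}
            for key, row in self._rows.items()
        ]

    def __len__(self) -> int:
        return len(self._rows)


log = DictLog()
log.add("ind_0", generation=0)
log.add("ind_1", generation=0)
log.update("ind_0", score=0.91)  # filled in once evaluation finishes
```

If a DataFrame is still wanted for analysis, the output of `to_records()` is a list of dicts that could be passed to `pd.DataFrame(...)` once at the end of a run, rather than appending to a DataFrame on every evaluation. Whether this shape fits what the Population class actually needs to query is an open question for the experiments mentioned above.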