
Clarification on Efficiency

Open · mekaneeky opened this issue 2 years ago · 1 comment

Hello lovely people,

Thank you for open-sourcing this lovely library! As an AutoML researcher, I am very grateful.

However, I stumbled upon this comment in the Travelling Salesman notebook: "PyGlove is designed to be used with problems with evaluation time greater than sub-second." It proved true when I compared a Simplified AutoML Zero implementation I wrote in Numba/NumPy against a reimplementation in PyGlove. On the same device, the NumPy version runs about 600-800 iterations per second, while the PyGlove version takes about a minute to run 400-800 iterations.

Here is where I define the search space and search process. I let the DNA objects be created automatically (I presume through DNAGenerator?) by using pg.List and pg.oneof:

```python
import pyglove as pg

# MEMORY_SIZE, MATRIX_SIZE, OP_DEPTH, POPULATION_COUNT, TOURNAMENT_COUNT,
# initialize_memory, OP_dict_basic and auto_ml_zero_session are defined
# elsewhere (not shown).
INIT_MEMORY = initialize_memory(memory_size=MEMORY_SIZE, matrix_size=MATRIX_SIZE, start=-10, end=10)
memory_slots = list(range(len(INIT_MEMORY)))

# Each instruction is one operation followed by four memory-slot choices;
# the program repeats that pattern OP_DEPTH times.
search_space = pg.List([
    pg.oneof(list(OP_dict_basic.keys())),
    pg.oneof(memory_slots),
    pg.oneof(memory_slots),
    pg.oneof(memory_slots),
    pg.oneof(memory_slots),
] * OP_DEPTH)
search_algorithm = pg.evolution.regularized_evolution(
    population_size=POPULATION_COUNT, tournament_size=TOURNAMENT_COUNT)

auto_ml_zero_session(search_space, search_algorithm)
```

Is there a way to improve efficiency? Would it be reasonable to subclass DNA and create a version of DNA that uses NumPy under the hood?
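For reference, the search space above boils down to a fixed-size grid of integers: OP_DEPTH rows, each holding one operation index and four memory-slot indices. The sketch below keeps that genome in a NumPy array instead of a symbolic DNA tree. It is not a PyGlove API: `sample_genome` and `decode` are hypothetical helpers, and the toy `op_names` list stands in for the keys of `OP_dict_basic`. It only shows the kind of flat representation a NumPy-backed DNA or CustomHyper might wrap.

```python
import numpy as np


def sample_genome(num_ops, memory_size, op_depth, rng):
    """Sample a random program as an (op_depth, 5) integer array.

    Column 0 indexes the list of operation names; columns 1-4 index memory
    slots, mirroring the five pg.oneof choices per instruction.
    """
    genome = np.empty((op_depth, 5), dtype=np.int64)
    genome[:, 0] = rng.integers(0, num_ops, size=op_depth)
    genome[:, 1:] = rng.integers(0, memory_size, size=(op_depth, 4))
    return genome


def decode(genome, op_names):
    """Turn each row into an (op_name, slot, slot, slot, slot) tuple."""
    return [(op_names[int(row[0])], *(int(x) for x in row[1:])) for row in genome]


# Usage with hypothetical stand-ins for OP_dict_basic and the memory size.
rng = np.random.default_rng(0)
op_names = ["add", "sub", "mul"]
genome = sample_genome(len(op_names), memory_size=8, op_depth=4, rng=rng)
program = decode(genome, op_names)
print(program)
```

Decoding here is plain indexing, so no intermediate objects are copied beyond the final list of tuples.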

Or would it be more reasonable to follow the OneMax example and write a custom decoder (subclassing pg.hyper.CustomHyper) and a custom mutator, both using NumPy under the hood, to close this efficiency gap?
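Along the lines of the custom-mutator approach, selection and mutation can also stay in NumPy. The sketch below is a plain regularized-evolution loop (tournament selection on the current population, single-gene mutation of the winner, removal of the oldest member), written without PyGlove so each step costs one small array copy. The function names, the `highs` bounds vector and the toy fitness are hypothetical; `mutate` is roughly the logic a custom mutator would wrap, not PyGlove's own mutator.

```python
import collections

import numpy as np


def mutate(genome, highs, rng):
    """Return a copy of `genome` with one randomly chosen gene resampled.

    highs[c] is the exclusive upper bound of column c. The new value may
    equal the old one, as with a uniform point mutation.
    """
    child = genome.copy()  # a single flat array copy per mutation
    row = rng.integers(child.shape[0])
    col = rng.integers(child.shape[1])
    child[row, col] = rng.integers(highs[col])
    return child


def regularized_evolution(fitness_fn, highs, op_depth, population_size,
                          tournament_size, num_cycles, seed=0):
    """Maximize fitness_fn with age-based (regularized) evolution."""
    rng = np.random.default_rng(seed)
    highs = np.asarray(highs)
    population = collections.deque()  # (genome, fitness); oldest on the left
    for _ in range(population_size):
        genome = rng.integers(0, highs, size=(op_depth, len(highs)))
        population.append((genome, fitness_fn(genome)))
    best = max(population, key=lambda item: item[1])
    for _ in range(num_cycles):
        # Tournament: sample without replacement, keep the fittest as parent.
        picks = rng.choice(len(population), size=tournament_size, replace=False)
        parent = max((population[int(i)] for i in picks), key=lambda item: item[1])
        child = mutate(parent[0], highs, rng)
        entry = (child, fitness_fn(child))
        population.append(entry)
        population.popleft()  # drop the oldest member, not the worst
        if entry[1] > best[1]:
            best = entry
    return best


# Toy usage: 3 ops, 8 memory slots, fitness rewards small indices.
highs = [3, 8, 8, 8, 8]
best_genome, best_fitness = regularized_evolution(
    lambda g: -int(g.sum()), highs, op_depth=4,
    population_size=20, tournament_size=5, num_cycles=500)
print(best_fitness)
```

Removing the oldest rather than the worst member is what distinguishes regularized evolution from plain tournament evolution.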

Thanks

mekaneeky · Oct 03 '22 16:10

Thanks for the great question.

If you profile it, you will find that the performance overhead of PyGlove's built-in hyper primitives (e.g. pg.oneof) comes mainly from symbolic decoding, which involves extensive copying of objects (e.g. the pg.Dict/pg.List instances here).
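One way to check where the time goes is Python's built-in cProfile. The helper below (hypothetical name `profile_top`) runs any callable under the profiler and returns its result together with the top entries sorted by cumulative time. `copy_heavy` is only a stand-in workload that repeatedly deep-copies a nested list of dicts, loosely mimicking object copying; in practice you would pass your own search loop instead, e.g. `profile_top(auto_ml_zero_session, search_space, search_algorithm)`.

```python
import copy
import cProfile
import io
import pstats


def profile_top(fn, *args, limit=10, **kwargs):
    """Run fn(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `limit` entries with
    the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return result, stream.getvalue()


def copy_heavy(num_copies):
    """Stand-in workload: repeatedly deep-copy a nested program structure."""
    program = [{"op": "add", "args": [0, 1, 2, 3]} for _ in range(100)]
    for _ in range(num_copies):
        program = copy.deepcopy(program)
    return len(program)


result, report = profile_top(copy_heavy, 50)
print(report)
```

If copying dominates, functions such as `deepcopy` (or their PyGlove counterparts) will sit near the top of the cumulative-time list.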

To match the performance of native implementations, or at least to close the gap, pg.hyper.CustomHyper plus a custom mutator is the way to go. As you found out, they let the user control the entire process of DNA decoding and mutation. I'd be curious to see the speed-up you eventually get with CustomHyper in your Simplified AutoML Zero project.

daiyip · Oct 04 '22 00:10