
Method best_solution() does not always return correct solution_idx

Open borisarloff opened this issue 2 years ago • 6 comments

When multiple solutions happen to share the same best fitness value, best_solution() returns the index of the first solution it finds with that fitness. That index, however, does not necessarily correspond to the solution it returns.

A work-around could be to retrieve the index of the best solution yourself, rather than relying on best_solution() for that index. However, this could also fail in the rare case where more than one generation contains the same best solution: it would be unclear which index is being returned. On the other hand, this would not matter when the fitness is the same best fitness with a deterministic, rather than stochastic, fitness function (FF).

To reproduce, create a FF that repeatedly generates the same few fitness values. Call best_solution() and compare with the output of print(f"Generation: {ga_instance.generations_completed} Best solution fitness: {ga_instance.best_solution()[1]}") from the on_generation callback function.
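A minimal sketch of the underlying tie situation, using plain NumPy rather than PyGAD internals (the fitness values are made up for illustration):

```python
import numpy as np

# Made-up fitness values where two solutions tie for the best fitness.
fitness = np.array([0.5, 0.9, 0.3, 0.9])

# A first-match lookup, as typically used to recover the best index,
# always lands on the first of the tied solutions.
best_idx = int(np.where(fitness == np.max(fitness))[0][0])
print(best_idx)  # 1, even though index 3 has the same fitness
```

If the returned solution is picked in one place and the index looked up in another (e.g. after the population has changed between generations), the two can end up referring to different tied solutions.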

borisarloff avatar Nov 14 '22 17:11 borisarloff

I have an example where best_solution(), as far as I can tell, returns a random index, most of the time 0. Furthermore, saving the best index by hand in the FF gives wrong results as well. I am stumped. I am pretty new to machine learning, so please forgive me if my code is ridiculous in any way, and tell me where I went wrong. Any help is greatly appreciated. Here is main.py (the important parts):

import numpy
import pygad
import pygad.nn
import pygad.gann
import ele

def fitness_func(solution, sol_idx):
    global GANN_instance, lastbest, lastbesti
    #Give the NN the current data and ask it what to do next
    envos = [ele.simulation(sol_idx)]
    data_inputs = numpy.empty(shape=[1,86], dtype=int)

    #structure of inputs: current floor, target floor of elevator person[]*max in elevator, target floor of waiting person[floor[max people per floor]]
    data_inputs[0] = envos[0].run([0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]) #Run the first iteration to get a data input
    envos[0].score = 0

    #Simulate 20 iterations
    #indices:                  0   1     2     3 4 5 6 7                                8 9 10 11 12 13 14 15 16 17   
    #structure of predictions: up, down, stop, unload passenger[]*max pass in elevator, load passenger[]*max passenger on floor
    for i in range(20):
        predictions = pygad.nn.predict(last_layer=GANN_instance.population_networks[sol_idx], #What will the network do in this situation?
                                    data_inputs=data_inputs, problem_type="regression")
        data_inputs[0] = envos[0].run(predictions[0]) #Simulate according to the instructions of the network and save that as the next input

    #save best current network index manually
    if lastbest <= envos[0].score:
        lastbest = envos[0].score
        lastbesti = sol_idx
    return envos[0].score

def callback_generation(ga_instance):
    global GANN_instance

    population_matrices = pygad.gann.population_as_matrices(population_networks=GANN_instance.population_networks, 
                                                            population_vectors=ga_instance.population)

    GANN_instance.update_population_trained_weights(population_trained_weights=population_matrices)

    print("Generation = {generation}".format(generation=ga_instance.generations_completed))
    print("Accuracy         = {fitness}".format(fitness=ga_instance.best_solution()[1]))

lastbest = 0
lastbesti = 0

GANN_instance = pygad.gann.GANN(num_solutions=50,
                                num_neurons_input=86,    #Number of Datapoints to respect
                                num_neurons_hidden_layers=[2],
                                num_neurons_output=18,   #Number of possible Actions
                                hidden_activations=["relu"],
                                output_activation="sigmoid")

population_vectors = pygad.gann.population_as_vectors(population_networks=GANN_instance.population_networks)

ga_instance = pygad.GA(num_generations=20, 
                       num_parents_mating=3, 
                       initial_population=population_vectors.copy(),
                       fitness_func=fitness_func,
                       mutation_percent_genes=35,
                       callback_generation=callback_generation)

ga_instance.run()
ga_instance.plot_fitness()

solution, solution_fitness, solution_idx = ga_instance.best_solution()
#print(solution)
print(solution_fitness) #result: about 16
print("Solution index acc. to Library:", solution_idx, "Solution acc. to comparison:", lastbesti)

print("Score of solution_idx:", fitness_func(0,solution_idx)) #result of both: all over the place, though always very bad (<0)
print("Score of lastbesti:", fitness_func(0,lastbesti))

main.zip

Simonrazer avatar Jan 07 '23 15:01 Simonrazer

@borisarloff hi, you talked about the solution_idx; in my test, the solution and solution fitness can also be wrong.

For example, with a stochastic fitness function, we have the following setting:

    num_generations = 3
    num_parents_mating = 2
    fitness_function = fitness_func
    sol_per_pop = 4
    num_genes = 4
    parent_selection_type = "tournament"
    keep_elitism = 1

generation 0: [0.5, 0.5, 0.5, 0.4]
generation 1: [0.5, 0.5, 0.6, 0.5]
generation 2: [0.8, 0.6, 0.5, 0.5]
generation 3: [0.8, 0.6, 0.6, 0.5]

The fitness from best_solution() is [0.6] rather than [0.8]; I am confused.
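For context, a stochastic fitness function returns a different value each time the same solution is evaluated, so any code path that re-computes a solution's fitness (rather than reusing the stored value) can report a different "best". A minimal sketch of that drift, with a made-up noise model:

```python
import random

def stochastic_fitness(solution):
    # Deterministic part plus made-up evaluation noise.
    return sum(solution) + random.gauss(0.0, 0.1)

random.seed(0)
first = stochastic_fitness([1, 2, 3])
second = stochastic_fitness([1, 2, 3])
# The same solution gets two different fitness values, so a
# re-evaluated "best" need not match the recorded one.
print(first != second)
```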

whubaichuan avatar Feb 21 '23 16:02 whubaichuan

@whubaichuan,

Could you share a code sample please? Just to replicate the issue on my end.

ahmedfgad avatar Feb 21 '23 23:02 ahmedfgad

@ahmedfgad hi, I guess it is due to the same problem you have answered here. I think that computing the fitness multiple times for the same solution influences my best fitness across generations, because I am using a stochastic fitness function. By the way, when will the corrected version be available? Thanks a lot.
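One way to keep re-evaluations consistent, sketched here as an assumed user-side work-around (not a PyGAD feature), is to cache the first fitness computed for each solution so repeated evaluations return the same value:

```python
import random

_fitness_cache = {}

def cached_fitness(solution, sol_idx):
    # Key on the genes so repeated evaluations of the same solution
    # (e.g. an elite kept by keep_elitism) reuse the stored value.
    key = tuple(solution)
    if key not in _fitness_cache:
        # Made-up stochastic fitness: deterministic part plus noise.
        _fitness_cache[key] = sum(solution) + random.gauss(0.0, 0.1)
    return _fitness_cache[key]

a = cached_fitness([1, 2, 3], 0)
b = cached_fitness([1, 2, 3], 5)
print(a == b)  # True: the stochastic FF now behaves deterministically per solution
```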

whubaichuan avatar Feb 22 '23 08:02 whubaichuan

@whubaichuan,

That would be the reason for this behavior. Please give it a try once the new release is published.

ahmedfgad avatar Feb 22 '23 13:02 ahmedfgad

@ahmedfgad

The problem is solved in the new PyGAD release (2.19.2).

Yes, here is the output of best_solutions_fitness (there are 10 generations and keep_elitism is larger than 0):

[0.7014856849397931, 0.7132997342518398, 0.7557456777209327, 0.7557456777209327, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089, 0.7429285673868089]

Here, 0.7557456777209327 is not kept to the end. Same problem here
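Independently of any library fix, taking the running maximum over best_solutions_fitness recovers the best value ever seen, even when a later generation's best drops (values copied, truncated, from the output above):

```python
# Per-generation best fitness, copied from the output above (truncated).
per_generation_best = [
    0.7014856849397931, 0.7132997342518398, 0.7557456777209327,
    0.7557456777209327, 0.7429285673868089, 0.7429285673868089,
]

# Track the best fitness ever observed across generations.
best_ever = float("-inf")
running = []
for f in per_generation_best:
    best_ever = max(best_ever, f)
    running.append(best_ever)

print(running[-1])  # 0.7557456777209327, not the final generation's 0.7429...
```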

whubaichuan avatar Feb 22 '23 13:02 whubaichuan