GWO returns different type of results (individual) than other methods

Open karakatic opened this issue 3 years ago • 14 comments

GWO stores its individuals as ndarray, while other algorithms store them as NiaPy.algorithms.algorithm.Individual.

This shows up in what runIteration returns: for GWO, pop is an array of ndarrays, while the other algorithms return an array of Individual objects.

karakatic avatar Dec 02 '20 15:12 karakatic

Have a good day @karakatic. Please, @GregaVrbancic, would it be desirable to have an array of Individual as the output type? Or what should it be?

sisco0 avatar Dec 24 '20 02:12 sisco0

@kb2623 - Can you explore this issue please?

firefly-cpp avatar Jan 19 '21 12:01 firefly-cpp

I have not been active on this project for a long time and I have not been in contact with the project owner. So I need more information on what needs to be done and what the project owner wants.

My point of view

The Individual type stores a solution, just as an ndarray can, so Individual is essentially a wrapper around ndarray. But Individual can also store additional information such as the fitness value, age, and so on. Individual was added to the project to make the code shorter and more compact (fewer lines and less indentation) and more readable for some algorithms. One advantage I saw in using Individual is that extended algorithms can reuse a large portion of the basic algorithm (less code to write, though not in all cases). The other advantage lies in the optimization type (discrete / continuous): when implementing a new algorithm, the developer does not need to think about fixing the solution, because the user can pass a function that fixes the solution based on the type of optimization.

From the user's perspective, the type used to represent solutions inside an algorithm is not important, because in the end every algorithm should return the best solution as an array. This can be seen from the Algorithm interface, which in the end returns an array and its fitness value.

From the algorithm developer's perspective, the type used for storing intermediate solutions is important, because it is linked to the final return value of the run functionality.

Current state

Algorithms that use ndarray type to store solutions in current version are:

  • GWO
  • FOA
  • BA and all extended versions
  • BEA
  • CRO
  • and many more

As far as I can remember, the algorithms that use the Individual type to store solutions are:

  • DE and all extended versions
  • ABC
  • and some others

Question

Should all algorithms in the coming version use the Individual type for storing intermediate solutions?

kb2623 avatar Jan 19 '21 13:01 kb2623

@kb2623 - Welcome back to the project.

@GregaVrbancic, @karakatic, @lukapecnik, @rhododendrom - Please join this discussion and share your views.

@karakatic - Please give us more information about your issues.

firefly-cpp avatar Jan 19 '21 14:01 firefly-cpp

@sisco0 - Please join this discussion. You are the newest contributor with fresh ideas.

firefly-cpp avatar Jan 19 '21 14:01 firefly-cpp

Have a good day everyone. From my point of view, np.ndarray seems the more desirable output type for integrating the library into other projects.

The idea behind my proposal is that solutions built on our library could use the best agent's np.ndarray solution directly, without needing to extract the best agent's values from the Individual class.

It is true that for our own work (accuracy comparisons between algorithms, teaching and research) the Individual class could fit better, as we need all that information to obtain indicators for our experiments. On the other hand, other people who use this library would likely not use these indicators and, should they need them, could recover them with a goal function evaluation.

As a counterpoint to my own view, maybe the intermediate information for the best agent, as pointed out by @kb2623, is important to keep.

sisco0 avatar Jan 19 '21 15:01 sisco0

I agree that every algorithm should return the solution in the same form, be it np.ndarray (preferred) or Individual. Just providing a utility function that transforms Individuals to np.ndarray would probably be enough.
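To make the utility-function idea concrete, here is a minimal sketch. The Individual class below is a hypothetical stand-in with an x (solution vector) and f (fitness) attribute, and to_ndarray is an illustrative name, not an existing NiaPy function.

```python
import numpy as np


class Individual:
    """Hypothetical stand-in: a solution vector x plus its fitness f."""

    def __init__(self, x, f=np.inf):
        self.x = np.asarray(x, dtype=float)
        self.f = f


def to_ndarray(population):
    """Return the population as a 2-D float array, one row per solution."""
    if isinstance(population, np.ndarray) and population.dtype != object:
        return population  # already a plain numeric array (the GWO case)
    # Unwrap Individual objects; pass plain vectors through unchanged.
    rows = [ind.x if isinstance(ind, Individual) else ind for ind in population]
    return np.array(rows, dtype=float)


pop_individuals = np.array([Individual([0.0, 1.0]), Individual([2.0, 3.0])], dtype=object)
pop_plain = np.array([[0.0, 1.0], [2.0, 3.0]])

print(to_ndarray(pop_individuals).shape)
print(to_ndarray(pop_plain).shape)
```

With something like this, callers could treat the population returned by any algorithm the same way, whichever type the algorithm uses internally.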

Thank you for all the good work.

karakatic avatar Jan 20 '21 09:01 karakatic

Thanks @sisco0 and @karakatic

I share a very similar opinion. A utility function may be the best start.

Could we ask @kb2623 to implement this feature? You are the architect of the current Individual class and may be the most appropriate person to develop this feature.

firefly-cpp avatar Jan 21 '21 11:01 firefly-cpp

I have invested a lot of my time in this project and fell short on some of my other obligations, but I am only now participating in a real debate on project code modelling. Big thanks to @sisco0 and @karakatic.

As seen from the implementation of the Individual class, the class has some operators that work only on the wrapped ndarray, so there is no need for additional utility functions. But more operator implementations could be added, such as addition, multiplication, division and so on. The other option is to make the Individual class a subclass of ndarray from NumPy.

Some time ago, @firefly-cpp and I talked about an additional feature that would store all the intermediate results of an algorithm run for the purpose of algorithm analysis. If everything were stored in an ndarray type, our general-purpose analysis algorithm would not be as robust, and for some algorithms developers might have to write their own analysis algorithm. With the Individual type, a general-purpose analysis algorithm would be much more robust, and algorithm developers might only need to add some specific extensions for our analysis algorithm.
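For illustration only, the subclassing option could look roughly like the sketch below. The class body, the f attribute and its default are assumptions made for this example, not the existing NiaPy Individual; the __new__ / __array_finalize__ pattern is the standard NumPy way to subclass ndarray.

```python
import numpy as np


class Individual(np.ndarray):
    """Hypothetical ndarray subclass that carries a fitness value f."""

    def __new__(cls, x, f=np.inf):
        obj = np.asarray(x, dtype=float).view(cls)
        obj.f = f
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices as well; inherit f from the parent
        # if there is one, otherwise fall back to "not evaluated" (inf).
        if obj is None:
            return
        self.f = getattr(obj, 'f', np.inf)


ind = Individual([1.0, 2.0], f=1.5)
print(isinstance(ind, np.ndarray), ind.f)
print(type(np.asarray(ind)))  # plain ndarray for code that does not care about f
```

One thing to decide in such a design is what happens to f after arithmetic on the vector, since a carried-over fitness would no longer match the new values; invalidating it there would probably be safer.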

The fact is that many algorithms were implemented based on code from MATLAB. As is clear to a developer who comes from other programming languages like C, C++, C# and so on, MATLAB is a much more domain-specific programming language, and many features are missing from it: memory management, communication with other programs/processes, a user interface (yes, it has a user interface library, but it cannot be used outside MATLAB), parallel execution (yes, there is parallel execution, but it is very limited), and so on. MATLAB is not by any means a bad programming language; it was developed to do specific operations and do them as fast as possible (at least some of them, and this is more on the implementation side than on the execution side). Many users of MATLAB come from research (mathematics, mechanical engineering, electrical engineering, chemistry) and teaching departments, so their programming skills are limited to the needs of the domain they are dealing with. So many MATLAB algorithm implementations use a limited number of language features. This can be seen in their use of only MATLAB arrays for storing intermediate results. Many MATLAB algorithm implementations do not even store the utility function values of solutions, which makes them very slow, because they have to re-evaluate a solution every time they need information about its quality.

For the reasons pointed out, I am more supportive of using the Individual type, but I still expect a good discussion about a decision that will be long-term.

kb2623 avatar Jan 26 '21 14:01 kb2623

Thanks for bringing back the unrealized feature that was intended for storing all intermediate solutions. I forgot about this feature that was planned for future releases. This feature had a high priority.

Nowadays, many researchers are interested in obtaining the intermediate results. Therefore, this is a PLUS for Individual class. Here you have my support.

The biggest problem I see is when new users want to implement their own algorithm. For example, when our students are faced with the task of implementing a new algorithm, they have some trouble understanding this class. However, this obstacle could be overcome by comprehensive documentation.

If we stick to the Individual class, then we also need to provide very detailed documentation for the users. Otherwise, users could spend many additional hours studying these components when implementing new algorithms.

Let's wait for the comment of @GregaVrbancic

firefly-cpp avatar Jan 26 '21 21:01 firefly-cpp

I have a similar opinion as @karakatic. Most important, at least to me, is that each algorithm returns solutions in the same form, whether in the form of ndarray or Individual - here, I do not have any preference. However, I understand that many researchers would be happy to obtain the solutions in the form of ndarray, making further processing easier for them.

Regarding the implementation, I am afraid I am not familiar enough with it to make constructive arguments about how we should implement this. I can only speak from my perspective as a repository maintainer and the one who makes NiaPy releases and handles other operational tasks. Regardless of the implementation, I would strongly encourage every potential contributor who would implement this to be careful about breaking changes that would alter the API design and consequently change the way the framework is used. At this stage of NiaPy development, we should all be aware that there are, fortunately, many NiaPy users who use NiaPy in their own projects, and such breaking changes affect them greatly (they have to update their code to use newer versions of NiaPy). However, if breaking changes in the core functionality of NiaPy are necessary, of course, go for it; just think it through beforehand and pay attention to the consequences such a change would bring.

Such concerns and the lack of consistency in the current state of NiaPy are also the primary reason for not releasing a stable version of NiaPy 2. When such problems are solved, I am more than happy to release the long-awaited stable version 2.

GregaVrbancic avatar Jan 27 '21 11:01 GregaVrbancic

In terms of performance and speed numpy.ndarray is the clear winner. A numpy array of objects is insanely slow especially if you have to call the methods of those objects. Even a python list beats it. On the other hand if an algorithm's individual has a lot of extra parameters, that would be a lot of extra arrays to pass as runIteration's dparams, and it could get hard keeping track of and updating all of them.

I propose a great compromise: NumPy Structured Arrays

It's basically an array of POD structs, with some really cool properties.

You initialize them like so:

import numpy as np
D = 5
NP = 50

individual_type = [('x', np.float64, D), ('f', np.float64)]
population = np.zeros(NP, dtype=individual_type)

Which is basically the same as

constexpr unsigned int D = 5;
constexpr unsigned int NP = 50;

struct Individual
{
    double x[D];
    double f;
};

Individual population[NP];

in C++.

You can then do stuff like:

population['x'] = np.random.rand(NP, D) # initialize all positions randomly. (population['x'] is an NP x D array)

population[0]['f'] # fitness of first individual

population = np.sort(population, order='f')  # sort population by fitness

population['f'] = 10.0  # set all fitnesses to 10

There are also recarrays, which allow you to access fields as attributes (e. g. population.x).
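A short sketch of the recarray variant, reusing the individual_type from above. The sphere function is only a placeholder objective chosen for this example, and the seed is arbitrary.

```python
import numpy as np

D = 5
NP = 50

individual_type = [('x', np.float64, D), ('f', np.float64)]
# Viewing the structured array as a recarray enables attribute access.
population = np.zeros(NP, dtype=individual_type).view(np.recarray)

rng = np.random.default_rng(0)
population['x'] = rng.random((NP, D))                 # item access still works
population['f'] = (population.x ** 2).sum(axis=1)     # placeholder objective (sphere)

best = population[np.argmin(population.f)]            # a single record
print(population.x.shape, best.f)
```

Attribute access is mostly a readability convenience here; the memory layout is the same as for the plain structured array.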

I like @kb2623's idea of extending the ndarray class, but I suggest we make a Population class which extends numpy.ndarray. Maybe it could have some extra methods or attributes for getting the current best individual, and it could store the global best individual, although currently the task class does that too. We could then require every Algorithm to provide an attribute itype, similar to individual_type in the above example, and that itype would be used to construct the population. That's where you would put all the attributes that would currently go in the additional parameters of runIteration or as attributes in an extended Individual class. The initPopulation method would return a Population object, and runIteration would only need to accept one argument, which is that Population object. From the user's perspective, nothing would have to change in the way algorithms are run.
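Roughly what I have in mind, as a sketch only. Population, best, ToyAlgorithm, init_population and the extra age field are all hypothetical names for this example, and minimisation with a sphere objective is assumed.

```python
import numpy as np


class Population(np.ndarray):
    """Hypothetical structured-array population with 'x' and 'f' fields."""

    def __new__(cls, size, itype):
        return np.zeros(size, dtype=itype).view(cls)

    def best(self):
        """Return the record with the lowest fitness (minimisation assumed)."""
        return self[int(np.argmin(self['f']))]


class ToyAlgorithm:
    """Hypothetical algorithm that declares its per-individual fields via itype."""

    def __init__(self, D, NP):
        self.NP = NP
        # Algorithm-specific extras (here: age) live next to x and f.
        self.itype = [('x', np.float64, D), ('f', np.float64), ('age', np.int64)]

    def init_population(self, rng):
        pop = Population(self.NP, self.itype)
        pop['x'] = rng.random(pop['x'].shape)
        pop['f'] = (pop['x'] ** 2).sum(axis=1)  # placeholder objective (sphere)
        return pop


algorithm = ToyAlgorithm(D=3, NP=10)
pop = algorithm.init_population(np.random.default_rng(0))
best = pop.best()
print(pop.shape, float(best['f']))
```

The per-individual extras stay in one contiguous array, so an iteration step would only need the Population object instead of a growing list of parallel arrays.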

What do you guys think?

zStupan avatar Mar 26 '21 14:03 zStupan

Algorithms using the Individual class:

  • ABC
  • CA
  • DE
  • ES
  • FSS
  • GA
  • MKE
  • HDE
  • jDE

I suggest we refactor these to use numpy arrays for now, until a better solution is found and implemented.

zStupan avatar Jun 05 '21 12:06 zStupan

Hello all, any fresh thoughts on this issue? What would be the best solution? Uniform behavior of the API is still practically a must-have.

GregaVrbancic avatar Dec 20 '23 06:12 GregaVrbancic