
Create a subclass of field_set that does not represent continuous values as variable-length sequences of bits.

Open ArleyRistar opened this issue 10 years ago • 56 comments

By recommendation of Nil, I'm opening an issue initially discussed on the Slack moses channel. Summary of the situation: I'm adding particle swarm to the optimization part of moses, and representing continuous knobs as variable-length sequences of bits is causing loss of information, and PS doesn't need this discrete representation. As discussed in the channel, I was thinking of changing this representation just inside particle swarm, but there's the problem of returning the deme to the metapopulation, which would lose information in the same way. So Ben said that this design "doesn't really need to persist".

I was thinking that I could "create a modified version of field_set overriding the contin parts to use 1 packed_t as float and use this modified version only with particle swarm". I still don't know how it would affect other parts like the metapopulation, or whether it can be done easily. Because of that I'm opening this issue looking for opinions.

Keep in mind that this will use a little more RAM for each instance with contin parts, but with better precision and less processing.

ArleyRistar avatar Jul 06 '15 18:07 ArleyRistar

Hi Arley,

... and Nil,

I think we need to fix contin_t for "ordinary" moses, not just particle-swarm moses.

How about this idea: we store a full-length float or double in the deme, AS WELL AS the contin_t. The value of the double is used in the actual combo tree, whereas the contin_t is used only during knob-turning, to store the current variance.

Thus, during knob turning, the contin_t is used to generate a random double-precision number, with a PDF centered at zero, and a variance equal to the contin_t value. This random offset is then added to the double-precision number in the deme. (The random offset is stored in the instance, or maybe the sum of the value and the offset is stored .. ). Thus, knob-turning generates a random walk. (During knob turning, the variance is also either made smaller, is kept as-is, or is increased).
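A minimal sketch of that knob-turning step (hypothetical names, assuming a Gaussian PDF; not actual MOSES code) could look like:

```cpp
// Hypothetical sketch only: the knob keeps a full double plus a variance,
// and one knob turn adds a zero-mean Gaussian offset, then adapts the variance.
#include <cmath>
#include <random>

struct contin_knob_sketch {
    double value;     // full-precision value used in the combo tree
    double variance;  // plays the contin_t role: spread of the random walk
};

inline void turn_knob(contin_knob_sketch& k, std::mt19937& rng) {
    // Random offset with PDF centered at zero and the stored variance.
    std::normal_distribution<double> offset(0.0, std::sqrt(k.variance));
    k.value += offset(rng);

    // Crude adaptation: shrink, keep, or grow the variance with equal odds.
    switch (std::uniform_int_distribution<int>(0, 2)(rng)) {
        case 0: k.variance *= 0.5; break;
        case 1: /* keep as-is */   break;
        default: k.variance *= 2.0; break;
    }
}
```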

The above is how we should do floats in "ordinary", non-particle-swarm MOSES.

For the particle-swarm moses, you may want to do knob turning differently (I don't really understand how), but the above mechanism will at least allow you to store the full-length double-precision number in both the instance and in the deme.

linas avatar Jul 07 '15 02:07 linas

yah -- This seems like a reasonable approach for now.. any thoughts from Dr. Geisweiller?

ben


bgoertzel avatar Jul 07 '15 03:07 bgoertzel

@linas the current state of the code already nearly does that: a contin_spec contains a mean, and any knob turning is centered around that mean. However, the variance is replaced by the expansion combined with the step, so if you tweak the contin trits (Left, Right, Stop) you get some distribution, let's call it a CTD (Contin Trits Distribution); it's not a Gaussian, but maybe it approaches one.

Anyway, I agree that the distribution should not be limited to a CTD. My advice would be to add or upgrade the neighborhood sampling methods to take a distribution as an argument; it could be a Gaussian or otherwise.

I don't think you need to change anything else. And here's why: candidates in the deme are represented as instances

typedef std::vector<packed_t> instance;

see instance.h

If you were to represent the contin_spec directly as a float you'd still need to hack the code to en/decode a contin into an instance (see field_set::pack in field_set.h). It's not impossible, but it adds a coding burden to our student. And if you run out of resolution you can always increase the contin_spec depth, so why bother.
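For concreteness, a rough sketch of what that en/decoding could look like if a contin were stored as raw double bits in the packed words (the word width and helper names are assumptions, not the actual field_set API):

```cpp
// Sketch only: store the 64 raw bits of a double across packed words,
// in the spirit of field_set::pack/get_contin but bypassing the trit
// encoding. packed_t_sketch and the helpers are invented for illustration.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef uint32_t packed_t_sketch;                  // assume 32-bit words
typedef std::vector<packed_t_sketch> instance_sketch;

inline void pack_double(instance_sketch& inst, double v) {
    uint64_t bits;
    std::memcpy(&bits, &v, sizeof bits);           // bit-copy, no aliasing UB
    inst.push_back(static_cast<packed_t_sketch>(bits & 0xffffffffu));
    inst.push_back(static_cast<packed_t_sketch>(bits >> 32));
}

inline double unpack_double(const instance_sketch& inst, std::size_t offset) {
    uint64_t bits = static_cast<uint64_t>(inst[offset])
                  | (static_cast<uint64_t>(inst[offset + 1]) << 32);
    double v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}
```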

So to sum up: yes, PS doesn't need this contin trits representation, but MOSES does. Ultimately I don't think this en/decoding of candidates into instances is good. It has the advantage of saving a bit of RAM at the expense of making the code more rigid and complex. But getting rid of it is probably too much work for this project.

I'll get back with more details about what should be hacked to enable neighborhood sampling with any distribution, not just CTDs.

ngeiswei avatar Jul 07 '15 12:07 ngeiswei

Hmm, on the other hand converting a float into bytes is trivial, and it would save some CPU time.

Anyway, you'd still need to upgrade the neighborhood sampling to support any distribution; those 2 issues are really independent. So I suggest:

  1. Upgrade the neighborhood sampling distribution to support any distribution.
  2. If it turns out manipulating large contin with the current trit representation is a bottleneck, then hack the contin_spec and en/decoding methods to support float representation.

ngeiswei avatar Jul 07 '15 12:07 ngeiswei

@arleyristar The code to hack to support 1. is definitely in

moses/moses/moses/neighborhood_sampling.{h,cc}

you may start to study it, till I come back with more information.

ngeiswei avatar Jul 07 '15 12:07 ngeiswei

At that point I really need to understand exactly what you want to do and how. There are several ways to go about that problem: the distribution could be passed to the sampling function (sample_from_neighborhood, then generate_contin_neighbor or another new one), or it could live inside the contin_spec.
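One possible shape for the first option, passing the distribution in as an argument (illustrative names, not the existing MOSES signatures):

```cpp
// Sketch: the neighborhood sampler takes any callable that draws an
// offset, instead of hard-coding the contin-trit walk.
#include <functional>
#include <random>

using contin_sampler = std::function<double(std::mt19937&)>;

// Perturb a contin value with a caller-supplied distribution.
inline double sample_contin_neighbor(double current,
                                     const contin_sampler& draw_offset,
                                     std::mt19937& rng) {
    return current + draw_offset(rng);
}

// Example: zero-mean Gaussian offsets with a given standard deviation.
inline contin_sampler gaussian_sampler(double sigma) {
    return [sigma](std::mt19937& rng) {
        return std::normal_distribution<double>(0.0, sigma)(rng);
    };
}
```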

Also, there is a problem I had not thought about: the mean in the contin_spec remains the same during deme optimization. It cannot be changed, or at least not without trouble, because the contin values corresponding to the instances are reconstructed using it; if it changed while optimizing the deme, older instances would not have their contin values correctly reconstructed. So if you really want to have the mean change during deme optimization, we need to address that.

ngeiswei avatar Jul 07 '15 13:07 ngeiswei

OK, I presume the code so far is here

https://github.com/arleyristar/moses/commits/master

Please point to me as well any relevant doc.

ngeiswei avatar Jul 07 '15 13:07 ngeiswei

Nil, I'm looking at the neighborhood sampling now; I'll try to solve this first problem.

In your penultimate email, I'm assuming that it was for me. I'll see how I can do that, but leaving the distribution inside contin_spec is what makes the most sense to me for now. As for changing the mean, don't worry: what I said in the channel about changing the mean was just an option I considered before Ben said that this representation doesn't need to persist. Once problem 2 is solved, PS can use the float representation.

Yes, the code for PS is in there. I haven't made any outside documentation yet; most things I've been putting as comments inside the code or posting in the channel.


ArleyRistar avatar Jul 07 '15 15:07 ArleyRistar

From the things I've thought about, this seems an acceptable idea for 1; forgive me if I misunderstood what I was supposed to do or if this is a bad idea.

Nil, we could keep neighborhood sampling as it is now and change contin_stepper and contin_spec. Not even get_contin and set_contin from field_set would need to change, or only a little (for 1).

Let me explain in detail. For now we have behavior similar to this: https://code.google.com/p/moses/wiki/ModelingAtomSpaces For example, with mean = 0, expansion = 2 and S = Stop, L = Left, R = Right:

S -> 0
LS -> -2, step = 4
RS -> 2, step = 4
LLS -> -6, step = 8
LRS -> -1, step = 0.5
RLS -> 1, step = 0.5
RRS -> 6, step = 8
...

It has better precision near its mean, and it kind of resembles a Gaussian to me.

My idea is to modify contin_stepper left() and right() to have behavior similar to the link, but within [0,1]. In this case it would start with 0.5 as the mean and decrease the step_size by half, starting with 0.25.

Applying this contin_stepper to the actual LRS representation would give us a probability that could be applied to a (cumulative) distribution function. The distribution function would be stored inside contin_spec in place of the mean and step_size (depth has to stay, for use in the stepper).

I could write some distributions that would be chosen via a parameter (leaving normal as the default); their parameters would be given at construction of the contin_spec, as mean and expansion (the expansion acting as the variance, though it would be better if it were bigger).
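A sketch of that mapping, reading "applied to a (cumulative) distribution function" as the inverse CDF (quantile) of the distribution stored in the contin_spec; the names are illustrative and it assumes Boost.Math for the Gaussian quantile:

```cpp
// Sketch: walk the L/R/S trits as halving steps inside (0,1) to get a
// probability, then map it through the inverse CDF of the spec's
// distribution (a Gaussian here). Illustrative only.
#include <string>
#include <boost/math/distributions/normal.hpp>

// "LRS..." string -> probability in (0,1): start at 0.5, step 0.25, halve.
inline double trits_to_probability(const std::string& trits) {
    double p = 0.5, step = 0.25;
    for (char t : trits) {
        if (t == 'S') break;               // Stop ends the walk
        p += (t == 'R') ? step : -step;    // Right goes up, Left goes down
        step *= 0.5;
    }
    return p;
}

// probability -> contin value via the distribution's quantile function.
inline double probability_to_contin(double p, double mean, double sd) {
    boost::math::normal_distribution<double> dist(mean, sd);
    return boost::math::quantile(dist, p); // inverse CDF
}
```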

Does that make sense, or should I think of another solution?


ArleyRistar avatar Jul 07 '15 23:07 ArleyRistar

On Tue, Jul 7, 2015 at 8:17 PM, ngeiswei [email protected] wrote:

@linas the current state of the code already nearly does that,

Arghh... yes but no. I understand perfectly well how contin works in moses, and basically, it sucks. It needs a major revamp. I used it a lot, and eventually gave up, because it behaved so badly. I think I know why it mis-behaves, and the sketch I made was for an approach that I think could significantly improve it.

The contin_t, as currently designed, attempts to use the center-points of the Cantor set, which, in a certain naive sense, does have a uniform distribution over the reals, but it fails for practical machine learning problems. The problem is that the vertexes in the Cantor tree will almost never be close to the actual value you are trying to learn, and it takes too many iterations to get to the vertex that does fit your data. E.g. trying to converge to 1.333 by using 1.0 1.5 1.25 1.375 ... is ineffective and slow. It is far too many steps for MOSES to get the right value. It's even worse if you want to learn more than one float.
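A tiny illustration of that complaint (a toy dyadic walk, not MOSES code): each halving step buys roughly one bit of precision, so approximating a single constant already takes many knob turns.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Dyadic (Cantor-tree style) walk toward 1.333: 1.0, 1.5, 1.25, 1.375, ...
    double target = 1.333, value = 1.0, step = 0.5;
    int steps = 0;
    while (std::fabs(value - target) > 1e-3) {
        value += (value < target) ? step : -step;
        step *= 0.5;
        ++steps;
    }
    std::printf("%d halving steps to get within 1e-3 of %g\n", steps, target);
    return 0;
}
```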

I am kind-of guessing that storing full 64-bit floats, and using a random walk will converge faster than dyadic Cantor tree walks. I can't prove that it will converge faster, but contin_t works so poorly that almost anything will be better.

linas avatar Jul 08 '15 04:07 linas

p.s. what you call a "contin trit" is called a "Cantor tree" in the general world. http://blogs.scientificamerican.com/critical-opalescence/files/2012/08/f5.jpg The cantor tree unfolds into the "modular group" and there is all sorts of neat stuff there.

linas avatar Jul 08 '15 04:07 linas

So, from Linas' idea and issue 2 from Nil: contin_spec and contin_stepper won't be used and are almost useless now, so it doesn't make sense to keep them in field_set. get_contin and set_contin have to change to insert the float (contin_t is already a double) into the instance. generate_contin_neighbor has to change to do the random walk. Plus a lot of minor updates.

While this isn't decided, I'll be fixing other things in ps.


ArleyRistar avatar Jul 08 '15 11:07 ArleyRistar

@linas, if you update the contin neighborhood sampling you don't have to iterate 1.0 1.5 1.25 1.375 ... to get to 1.333. You just set the contin_spec with enough depth and you can set the value 1.333 right away.

@arleyristar contin_spec is certainly useful, but it could indeed be greatly simplified, both in code and performance, if the Cantor tree representation were replaced by a float representation. Although there is still one area where the Cantor tree representation wins: if the resolution of the contin values turns out to be low, it takes much less memory, a few bits instead of a systematic 32 bits. It's probably not worth it, but it's useful to keep in mind. And again, you don't need to change the contin_spec code to allow sampling with other distributions (unless you put the distribution in the contin_spec, of course), as I explained above.

ngeiswei avatar Jul 09 '15 07:07 ngeiswei

@arleyristar you cannot change the mean, the expansion or the step of a contin_spec once constructed, or at least it wouldn't make much sense: in order to reconstruct the contin value given the packed bit string you need those, and if they change while optimizing the deme, the contin values of the candidates will also change, so the older candidates won't have their contin values correctly reconstituted.

That's why I want to understand exactly how your algorithm is going to work. I'm currently reading

https://en.wikipedia.org/wiki/Particle_swarm_optimization

let me know if you follow that, or some alteration of it (this will help me to understand your code when I start reviewing it).

ngeiswei avatar Jul 09 '15 07:07 ngeiswei

@arleyristar I'm only 1/3 of the way through reviewing your code, but I think you don't need to change the neighborhood sampling functions: all sampling code takes place in the particle_swarm struct, and that makes sense given its peculiarities (you don't sample the neighborhood directly, you sample the coefficients to construct the velocities).
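For reference, the textbook update being alluded to (per the Wikipedia article linked earlier; a sketch, not the actual code under review) samples the random coefficients and builds the velocity like this:

```cpp
// Sketch of the standard PSO velocity/position update for one continuous
// dimension. The struct and the constants w, c1, c2 are illustrative.
#include <random>

struct particle_sketch {
    double x;      // current position (e.g. a contin value)
    double v;      // velocity
    double pbest;  // best position this particle has visited
};

inline void pso_step(particle_sketch& p, double gbest, std::mt19937& rng,
                     double w = 0.7, double c1 = 1.5, double c2 = 1.5) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double r1 = u(rng), r2 = u(rng);   // the sampled coefficients
    p.v = w * p.v + c1 * r1 * (p.pbest - p.x) + c2 * r2 * (gbest - p.x);
    p.x += p.v;                        // caller re-scores x, updates bests
}
```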

So I don't think you need to do 1. or 2. All you need is to increase the depth of the contin_spec. If that becomes a performance bottleneck, then you can work toward improving contin_spec, etc. But if the performance overhead is negligible, you don't need to bother (unless you really want to).

Anyway, I'll keep reviewing; I may change my mind once I see the whole thing.

ngeiswei avatar Jul 09 '15 09:07 ngeiswei

@arleyristar I'm re-reading your first comment, from when you created this issue. You got it all. Forget my comment about modifying the neighborhood distribution; I got there because of Linas' initial comment and my lack of understanding of PS. So I stand by my comment right above, which can be synthesized into these questions:

  1. Have you tried to set the depth high enough so that the loss of information isn't an issue?
  2. Does it create a performance bottleneck?

If the answers are yes, then I think what you suggest is the correct hack: replace the Cantor tree by a float binary rep. Of course, before we go there we want to do things well: record the performance of a bunch of runs beforehand, to compare with after the code optimization, etc. And I'll probably want to review the contin_spec once more, just in case I see something simple and bright to do. But first things first, answer those 2 questions.

ngeiswei avatar Jul 09 '15 09:07 ngeiswei

For instance if you could plot

  1. Performance
  2. Run-time
  3. RAM

w.r.t. depth (varying from, say, 5 to 32), for one or more optimization problems that you know PS is supposed to perform OK on, that would definitely help a lot to assess what should be done next.

Ideally you would even repeat that over several random seeds, say 10, to cancel out noise, get smoother charts.

Once you know the kind of depth you need, you can run that through a profiler like valgrind, so you can get an idea of how much performance is wasted by the Cantor tree representation.

ngeiswei avatar Jul 09 '15 09:07 ngeiswei

I'll change the code to increase the depth and test with it as Nil said. I'll post the results here, but it can take a while (give me 1 day).


ArleyRistar avatar Jul 09 '15 11:07 ArleyRistar

It may take you a few days even, but it's worth it, not just for ps. Thanks!

ngeiswei avatar Jul 09 '15 12:07 ngeiswei

Hi Nil, Arley,

I have to retract/modify my earlier comments as well; they were based on a misunderstanding of sorts. So, what I meant to say was this:

-- I am very excited by the PSO work; I think it will be an excellent way of handling floating-point numbers in moses. My grumbling was less about the contin representation itself than it was with how the hill-climbing algo treated it: hill-climbing by twiddling one binary bit of a contin at a time was just terribly slow, inefficient, awful.

-- Yes, contins offer a small, compact way to store floats or doubles. I guess it's harmless to use them in the combo trees: you just need to write two functions (maybe they exist??) to convert a double to the contin rep, and back. As long as you save enough digits, it should not be a problem.

-- If contins are replaced in the combo trees, then I strongly recommend using 64-bit floats (i.e. doubles); do NOT mess with 32-bit floats. Messing about with 32-bit floats is not worth the effort. It's a bad idea.

-- Nil, the way I currently understand it is as follows: every combo tree would have one to hundreds of doubles in it. If the metapop has 10K trees in it, then there would be 10K trees/metapop * 200 doubles/tree * 8 bytes/double = 16MBytes of doubles in the metapop: hardly a big amount.

-- During a single PSO step, we pick just one tree, and so PSO only needs to optimize 1 to a few hundred doubles. If you have 1K particles per swarm, that is about 200K doubles per PSO step, which again is not very big, and might even fit in modern-day CPU caches.

So I see no problem at all with storing doubles everywhere. Done right PSO will be awesome for moses. I'm actually excited, now that I think I understand it.

linas avatar Jul 09 '15 14:07 linas

Linas, you wrote,

"-- Yes, contins offer a small, compact way to store floats or doubles. I guess its harmless to use them in the combo trees: you just need to write two functions (maybe they exist??) to convert double to the contin rep, and back. As long as you save enough digits, it should not be a problem."

Yes, they exist: field_set::get_contin and field_set::set_contin, but in fact you can directly en/decode a contin by dereferencing the contin_iterator.

Yeah, you're right, the extra RAM will probably be negligible. I'd still be interested in seeing how increasing the depth would help the quality of the results, and whether this en/decoding is really hurting run-time performance, though. I mean it's nice to know the actual gain of changing a piece of code (although the code simplification is a gain by itself).

ngeiswei avatar Jul 09 '15 14:07 ngeiswei

OK, sure. My knee-jerk reaction is "simplify the code" and remove contin entirely, but I'm tired just right now and don't really care. :-) Do the right thing.

linas avatar Jul 09 '15 15:07 linas

TL;DR update: problems, worse performance; don't expect good results from what Nil asked. This weekend I'll try to use Valgrind.

Detailed update: I don't have the results that Nil asked for because I'm running into some problems that I'll share with you:

The implementation in the last commit has a "mistake": it almost always initializes the particles with really big values in [double min, double max], and when I get the result (after the conversion) it always has the value of the mean (0 or 1 in this case); and it happens after the update too.

It's clearly wrong, but this way it runs very quickly, like 1000 evals in 3 s (i7 4500U, predicates m1000). When I changed it to start with smaller values between [-30, 32] (the range when depth is 5), I got the result right (in fact the conversion works with anything a little below 3E+17: it gives 32, which is the limit; above that it returns the mean value).

The thing is that it's taking much longer now (45 s), and when I apply a restriction to the values so that a wrong conversion doesn't happen (-2.8e+17 < x < 2.8e+17), or simply [-30, 32], it gets even worse in time (150+ s).

The worst part? It doesn't happen because of the particle-swarm update of the particles; the bottleneck is when it gets the score (which is coded exactly like in hill climbing), AND the results barely improved because they depend on the discrete part too, and it always falls into a local maximum.

Changing the problem didn't help either (iris m2000 has a very small contin %):
hc: score - 48.79, time: 78s
ps: score - 100, time: 90s (42s when it was wrong)

Note: when I change set_contin to do nothing, it takes 3 s on predicates -m1000 (iris m2000: 44 s) even with the limit, so it's almost certain that calling set_contin all the time is spending too much processing.

Nil, don't expect very good results from what you asked. This weekend I'll take a look at Valgrind (I don't know how to use it yet; I'm used to just using time) and return with a better-detailed experiment.


ArleyRistar avatar Jul 11 '15 01:07 ArleyRistar

What is the test problem that you are running? --linas


linas avatar Jul 11 '15 03:07 linas

@arleyristar I'm wondering if the super fast run-times aren't due to the cache. If the candidates are idiotically identical that could turn most evaluations into cache hits. Can you run your experiment with the debug logging level and send it to us?

ngeiswei avatar Jul 13 '15 13:07 ngeiswei

I was running with predicates.csv -W1 -u pred -m1000. I'll send it, but I didn't know there was a cache in the evaluations, so that appears to be exactly what is happening.


ArleyRistar avatar Jul 13 '15 13:07 ArleyRistar

I forgot to post here too: https://opencog.slack.com/files/arleyristar/F086CPMUK/Initial_experiments_for_PS_and_HC_in_continuous_problems_ or read below.

Initial experiments for PS and HC in continuous problems.

For the summary version, jump to 3.4. General.

  1. Methodology:

First, the experiments were divided into two groups: without reduction (-B 0 -E 1 parameters) and with reduction; they are separated this way because the reduction part affects particle swarm (PS) too much (clean_combo_tree uses 99.73% with reduction in PS). Besides that, the experiments were split into three parts: analysis of memory, processing and time. The memory analysis was done using the valgrind tool massif, running for 4 values of depth {5 (default), 16, 32, 128} and for both PS and hill climbing (HC). The processing analysis was done in the same way as the memory one, but using the callgrind tool. The time analysis was done differently, using a simple python script to measure the mean time over seeds 1 to 10, running for depths between 2 and 32 for PS and HC. The dataset used was predicates.csv because it is a simple one that is already inside the examples folder of Moses. It was run for a maximum of 1000 evaluations, a small value because the valgrind tools increase the program time by 20x for massif and between 20x and 100x for callgrind [1], but it is a sufficient amount of evaluations to test (with some seeds HC can find the best solution in fewer than 1000 evals).

  2. Results description:

2.1. Memory:

  • PS with reduction spends more memory than without.
  • Increasing the depth of PS increases the memory used, but not by much; sometimes it can even decrease, because PS can find better results that are less complex.
  • HC without reduction reaches the RAM limit independently of depth.
  • HC with reduction uses far less memory, even less than PS without reduction.

2.2. Processing:

  • The variable-length representation affects PS without reduction quite a lot, depending on the depth (15% at depth 32 vs. 2.7% at depth 5).
  • It affects PS with reduction in a similar way, but with a much lower percentage, because of the reduction.
  • The depth didn't affect hill climbing at all, mostly because it found a solution that needs less than depth 5.

2.3. Time:

  • PS without reduction is much quicker than with reduction.
  • HC without reduction is slower than with reduction.
  • PS without reduction is quicker than HC without reduction.
  • PS with reduction is very much slower than HC with reduction.

2.4. Others:

  • Increasing the depth in PS returns better results, but it gets stuck in the discrete local maximum anyway.
  • The results are not affected by increasing the HC depth, maybe because this problem didn't need precision or a bigger value to reach the global maximum.
  3. Conclusion:

3.1. Memory:

I don't think that memory is a problem; even very complex problems with thousands of instances will not reach the amount of RAM that computers have today, and even if they do, there's always feature selection. But let's take a look: the variable-length representation uses 28 B (4 B depth, 8 B mean, 8 B step_size, 8 B expansion) + ((1/4 B * depth) * num_instances) for each dimension in a deme, while a double representation uses 8 B * num_instances. So using the variable-length representation with a small depth for a larger number of instances still makes sense for RAM economy. But I still feel that this scheme could be better in 2 ways. 1- It uses 2 bits to represent left, right and stop (0, 1, 2), and it doesn't use 11 (3). 2- Depth 5 can represent +-30 (with LLLLS and RRRRS), so why not use up to LLLLL or RRRRR, given that the get/set contin functions limit the loop with the depth anyway?
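Just to make the arithmetic above concrete, a throwaway check using exactly the byte counts quoted in this paragraph (nothing measured, just the formula):

```cpp
#include <cstdio>

int main() {
    const double spec_bytes = 28.0;  // 4B depth + 8B mean + 8B step_size + 8B expansion
    const int depths[] = {5, 16, 32, 128};
    const int sizes[]  = {1000, 100000};   // instances in the deme
    for (int depth : depths) {
        for (int n : sizes) {
            double trit_bytes   = spec_bytes + (depth / 4.0) * n;  // 2 bits per trit
            double double_bytes = 8.0 * n;
            std::printf("depth=%3d  n=%6d  trits=%10.0f B  doubles=%8.0f B\n",
                        depth, n, trit_bytes, double_bytes);
        }
    }
    return 0;
}
```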

3.2. Processing:

With the variable-length representation, processing is only a problem for PS, and only if it has a bigger depth. In all cases clean_combo_tree and merge_demes are the ones that use the most processing, and the parts specific to HC or PS are small (PS is bigger just because of set_contin and get_contin). In PS I'm having to call get_contin a lot of times (to get the best values), and it isn't helping. To solve that I'll have to store the values in doubles, which in the end loses the benefits of using the variable-length representation.

3.3. Time:

I don't know how the cache of combo tree reduction works, but it seems to work best with HC, where the candidates in one eval are very close to each other. In the initial phase of PS the instances are very different from one another. I would say that PS only runs quicker without reduction because it returns a smaller deme than HC, since it doesn't include the worst solutions.

3.4. General:

The variable-length representation doesn't have that much impact on memory or performance; it only does when using PS and for complex problems. Even so, I still find it better to change it to a double representation, to get a little better performance, with the possibility to search in a bigger space and have less complex code, at the cost of a little more RAM and my work.

Detailed results data:

  1. PS without reduction:

1.1. Massif:
  • depth 5: peak 1.2842 MiB = 0.168322662 MB
  • depth 16: peak 1.3022 MiB = 0.170681958 MB
  • depth 32: peak 1.1866 MiB = 0.155530035 MB (leads to better solutions, and thus less complexity)
  • depth 128: peak 1.5212 MiB = 0.199386726 MB

1.2. Callgrind:

  • depth 5:
    • get_contin: 1.95%
    • set_contin: 0.7%
    • update_particles: 3.17%
    • complexity_based_scorer: 80.65%
  • depth 16:
    • get_contin: 5.39%
    • set_contin: 1.97%
    • update_particles: 6.98%
    • complexity_based_scorer: 77.19%
  • depth 32:
    • get_contin: 11.02%
    • set_contin: 4.15%
    • update_particles: 14.16%
    • complexity_based_scorer: 77.30%
  • depth 128:
    • get_contin: 16.18%
    • set_contin: 10.34%
    • update_particles: 24.40%
    • complexity_based_scorer: 67.82%

1.3. Time (s):

  • Depth: 2 | Mean: 2.557880
  • Depth: 3 | Mean: 2.593005
  • Depth: 4 | Mean: 2.663667
  • Depth: 5 | Mean: 2.668498
  • Depth: 6 | Mean: 2.701040
  • Depth: 7 | Mean: 2.644231
  • Depth: 8 | Mean: 2.714792
  • Depth: 9 | Mean: 2.692005
  • Depth: 10 | Mean: 2.680957
  • Depth: 11 | Mean: 2.735071
  • Depth: 12 | Mean: 2.783216
  • Depth: 13 | Mean: 2.743773
  • Depth: 14 | Mean: 2.725607
  • Depth: 15 | Mean: 2.722347
  • Depth: 16 | Mean: 2.755987
  • Depth: 17 | Mean: 2.901231
  • Depth: 18 | Mean: 2.809483
  • Depth: 19 | Mean: 2.762999
  • Depth: 20 | Mean: 2.758817
  • Depth: 21 | Mean: 2.772569
  • Depth: 22 | Mean: 2.816113
  • Depth: 23 | Mean: 2.928664
  • Depth: 24 | Mean: 2.904253
  • Depth: 25 | Mean: 2.938958
  • Depth: 26 | Mean: 2.845138
  • Depth: 27 | Mean: 2.877145
  • Depth: 28 | Mean: 2.857758
  • Depth: 29 | Mean: 2.840159
  • Depth: 30 | Mean: 2.822597
  • Depth: 31 | Mean: 2.813350
  • Depth: 32 | Mean: 2.820256
  2. HC without reduction:

2.1. Massif:
    • depth 5: peak 53.1730 MiB = 6.96949146 MB
    • depth 16: peak 53.1730 MiB = 6.96949146 MB
    • depth 32: peak 53.1730 MiB = 6.96949146 MB
    • depth 128: peak 53.1730 MiB = 6.96949146 MB

2.2. Callgrind:

  • depth 5:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 16:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.21%
  • depth 32:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 128:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.06%
    • complexity_based_scorer: 65.20%

2.3. Time (s):

  • Depth: 2 | Mean: 13.933148
  • Depth: 3 | Mean: 13.710301
  • Depth: 4 | Mean: 13.771378
  • Depth: 5 | Mean: 13.699602
  • Depth: 6 | Mean: 13.643018
  • Depth: 7 | Mean: 13.953615
  • Depth: 8 | Mean: 14.218726
  • Depth: 9 | Mean: 14.607479
  • Depth: 10 | Mean: 14.577442
  • Depth: 11 | Mean: 14.953429
  • Depth: 12 | Mean: 14.705942
  • Depth: 13 | Mean: 14.993052
  • Depth: 14 | Mean: 14.541223
  • Depth: 15 | Mean: 14.249910
  • Depth: 16 | Mean: 14.563872
  • Depth: 17 | Mean: 15.361906
  • Depth: 18 | Mean: 14.469012
  • Depth: 19 | Mean: 14.755074
  • Depth: 20 | Mean: 14.411482
  • Depth: 21 | Mean: 14.821014
  • Depth: 22 | Mean: 14.412988
  • Depth: 23 | Mean: 14.660408
  • Depth: 24 | Mean: 14.989600
  • Depth: 25 | Mean: 14.081191
  • Depth: 26 | Mean: 14.491643
  • Depth: 27 | Mean: 14.121354
  • Depth: 28 | Mean: 14.673866
  • Depth: 29 | Mean: 14.079422
  • Depth: 30 | Mean: 13.745444
  • Depth: 31 | Mean: 13.860764
  3. PS with reduction:

3.1. Massif:
    • depth 5: peak 2.7437 MiB = 0.359622246 MB
    • depth 16: peak 2.8201 MiB = 0.369636147 MB
    • depth 32: peak 4.2963 MiB = 0.563124634 MB
    • depth 128: peak 3.6244 MiB = 0.475057357 MB

3.2. Callgrind:

  • depth 5:
    • get_contin: 0.11%
    • set_contin: 0.01%
    • update_particles: 0.02%
    • complexity_based_scorer: 91.94%
  • depth 16:
    • get_contin: 0.11%
    • set_contin: 0.02%
    • update_particles: 0.06%
    • complexity_based_scorer: 93.25%
  • depth 32:
    • get_contin: 0.12%
    • set_contin: 0.04%
    • update_particles: 0.15%
    • complexity_based_scorer: 91.06%
  • depth 128:
    • get_contin: 0.25%
    • set_contin: 0.16%
    • update_particles: 0.37%
    • complexity_based_scorer: 92.87%

3.3. Time (s):

  • Depth: 32 | Mean: 180
  4. HC with reduction:

4.1. Massif:
    • depth 32: peak 940.1328 MiB = 0.1203369984 MB

4.2. Callgrind:

  • depth 32:
    • get_contin: 0.42%
    • set_contin: 0%
    • hill_climbing: 0.08%
    • complexity_based_scorer: 32.52% (Low because merge_demes uses a lot)

4.3. Time (s):

  • Depth: 32 | Mean: 1.8413

For even more detailed results (the callgrind and massif outputs, or the times with results) you can just ask me; I won't put them on GitHub, they're too big and don't need to be there.

[1] See valgrind site: http://valgrind.org/info/tools.html


ArleyRistar avatar Jul 27 '15 12:07 ArleyRistar

Update from what we talked about in the Moses channel.

  • I sent the above post to check how depth affects the code.
  • In the conclusion I found an issue different from what we were testing (PS is slow with reduction).
  • Ben asked for a hypothesis.
  • I responded that it may be what Nil said 3 emails ago.
  • I showed a callgrind run with a big difference in the number of instructions spent on reduction for HC vs. PS.
  • I started talking to Nil.
  • Empirically, I verified that HC also has bad performance if it changes the continuous knobs every eval (because PS does this).
  • Nil and I didn't decide anything about reduction.
  • But we decided to change the variable-length representation to double (which is what Linas wanted 7 emails ago).
  • We decided to change instance (vector<packed_t>) to std::bitset (I have an update about it below).
  • Ben gave some good ideas that we could try, but he'll be offline until Sunday, so it would be better to wait for him.
  • Linas confirmed that reduction with contin works well (in the post we could see that it really makes a big difference for HC, independently of depth).

From now on I'll make my updates here to avoid any confusion.


ArleyRistar avatar Jul 30 '15 20:07 ArleyRistar

About std::bitset: it isn't something for PS alone, and it isn't something that will increase performance (not much, anyway), but the code does some custom operations that exist there (and some that don't). I've found that:

  • We can't use std::bitset; we have to use boost::dynamic_bitset (std::bitset has to have its size at compile time, something that we don't have).
  • Some functions already exist in dynamic_bitset (like field_set::count(inst)).
  • Some functions don't exist (but should), like get/set_raw, which are written for vector<packed_t>.
  • You don't have to keep 2 offsets anymore.
  • You can't have an iterator with bitset (well, it has a custom iterator anyway, but it would have to be rewritten).

Basically you have to change too many things to gain almost nothing, so I'm skipping it for now and jumping to changing the variable representation.
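For anyone following along, the runtime sizing that rules out std::bitset looks like this (a toy illustration of the boost::dynamic_bitset API, not MOSES code):

```cpp
#include <boost/dynamic_bitset.hpp>
#include <cstddef>
#include <iostream>

int main() {
    std::size_t n_bits = 100;              // known only at runtime
    boost::dynamic_bitset<> bits(n_bits);  // std::bitset<N> needs N at compile time
    bits.set(3);
    bits.set(42);
    std::cout << "bits set: " << bits.count() << "\n";  // cf. field_set::count(inst)
    return 0;
}
```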

On Thu, Jul 30, 2015 at 5:02 PM, Arley Ristar [email protected] wrote:

Update from what we talked about in the Moses channel.

  • I sent the above post check how depth affects the code.
  • In the conclusion i found a issue different from what we were testing (PS is slow with reduction).
  • Ben ask for a hypotheses.
  • I respond that it may be of Nil said 3 emails ago.
  • I show a callgrind that has a big difference in the number of instructions to reduct HC and PS.
  • I start to talk to Nil.
  • Empirically, i tested that HC will have bad performance too if it change the continuous knobs every eval too (because PS do this).
  • Nil and I didn't decided anything about reduction.
  • But we decided to change the variable length representation to double (that was what Linas wanted 7 emails ago).
  • We decided to change instance<packet_t> to std::bitset (i have a update about it below).
  • Ben gave some good ideas that we could try, but he'll be offline until Sunday, so it would be better wait for him.
  • Linas confirms that reduction with contin works well (in the post we could see that it really makes a big difference for HC independently of depth).

For now on i'll make my updates here to avoid any confusion.

On Mon, Jul 27, 2015 at 9:38 AM, Arley Ristar [email protected] wrote:

I forgot to post here too:

https://opencog.slack.com/files/arleyristar/F086CPMUK/Initial_experiments_for_PS_and_HC_in_continuous_problems_ or read below. Initial experiments for PS and HC in continuous problems.

For resume version jump to 3.4. General.

  1. Methodology:

First, the experiments were divided in two groups: without reduction (-B 0 -E 1 parameters) and with reduction; it's separated this way because the reduction part affects the particle swarm (PS) too much (clean_combo_tree uses 99.73% with reduction in PS). Besides that, the experiments were split in three parts: analysis of the memory, processing and time. The memory analysis was done using the valgrind tool massif running for 4 values of depth {5(default), 16, 32, 128} and for PS and hill climbing (HC). The processing analysis was done in the same way of the memory, but using the tool callgrind. The time analysis was done in a different way, using a simple python script to measure the mean time of seeds between 1 and 10 running between depth 2 and 32 for PS and HC. The dataset used was predicates.csv because it is a simple one that is already inside the examples folder of Moses; It was running for a maximum of 1000 evaluations, a small value because of the valgrind tools that increase the program time in 20x for massif and between 20x and 100x for callgring [1] but it's a sufficient amount of evaluations to test (HC in some seeds can find the best solution in less than 1000 evals). 2. Results description:2.1. Memory:

  • PS with reduction spend more memory than without.
  • Increasing in the depth of PS, increases the memory used, but not that much, sometimes it can decrease, because it can find better results that are less complex.
  • HC without reduction reaches the ram limit independently of depth.
  • HC with reduction uses far less memory, even less than PS without reduction.

2.2. Processing:

  • Variable length representation affects PS without reduction too much depending of the depth (15% at depth 32 vs. 2.7% in depth 5).
  • Affects the PS with reduction in a similar way, but with a percentage much lower, because of the reduction.
  • The depth didn't affect hill climbing at all, mostly because it found a solution that needs less than depth 5.

2.3. Time:

  • PS without reduction is much quicker than with reduction.
  • HC without reduction is slower than with reduction.
  • PS without reduction is quicker than HC without reduction.
  • PS with reduction is very much slower than HC with reduction.

2.4. Others:

  • Increasing the depth in PS, it returns better results, but it's stuck in the discrete maximum local anyway.
  • The results are not affected increasing the HC depth, maybe because this problem didn't need precision or a bigger value to reach the max. global.
  1. Conclusion:3.1. Memory:

I don't think that memory is a problem, even very complex problems with thousands of instances will not reach the amount of ram that the computers today have, and even if do, there's always feature selection; but let's take a look: Variable-length representation uses a amount of 28B (4B depth, 8B mean, 8B step_size, 8B expansion) + ((1/4B * depth) * num_instances) for each dimension in a deme. While a double representation uses 8B * num_instances. So using variable-length representation with a small depth for a larger number of instances still makes sense for RAM economy. But i'm still felling that this way could be better in 2 forms. 1- It uses 2 bits to represent left, right and stop (0,1,2), and it don't use 11 (3). 2- 5 depth can represents +-30 (with LLLLS and RRRRS), why not use until LLLLL or RRRRR if the get/set contin function limits the for with the depth too? 3.2. Processing:

With variable length representation, processing is just a problem with PS, and just if it have a bigger depth. In all cases the clean_combo_tree and merge_demes are the ones that use more processing and the specific part of HC or PS are small (PS is bigger just because of set_contin and get_contin). In the PS i'm having to call get_contin a lot of times (to get the best values), and it isn't helping. To solve it i'll have to store the values in doubles, that in the end will lose the benefices of using the variable length representation. 3.3 Time:

3.3. Time:

I don't know how the combo tree reduction cache works, but it seems to work best with HC, where the candidates in one evaluation batch are very close to each other. In the initial phase of PS the instances are very different from one another. I would say PS only runs quicker without reduction because it returns a smaller deme than HC, since it does not include the worst solutions.

3.4. General:

The variable-length representation doesn't have that much impact on memory or performance; it only matters when using PS on complex problems. Even so, I still think it is better to change it to a double representation: slightly better performance, the possibility of searching a bigger space, and less complex code, at the cost of a little more RAM and my work.

Detailed results data:

  1. PS without reduction:

1.1. Massif:

  • depth 5: peak 1.2842 MiB = 0.168322662 MB
  • depth 16: peak 1.3022 MiB = 0.170681958 MB
  • depth 32: peak 1.1866 MiB = 0.155530035 MB (leads to better solutions, and than, less complexity)
  • depth 128: peak 1.5212 MiB = 0.199386726 MB

1.2. Callgrind:

  • depth 5:
    • get_contin: 1.95%
    • set_contin: 0.7%
    • update_particles: 3.17%
    • complexity_based_scorer: 80.65%
  • depth 16:
    • get_contin: 5.39%
    • set_contin: 1.97%
    • update_particles: 6.98%
    • complexity_based_scorer: 77.19%
  • depth 32:
    • get_contin: 11.02%
    • set_contin: 4.15%
    • update_particles: 14.16%
    • complexity_based_scorer: 77.30%
  • depth 128:
    • get_contin: 16.18%
    • set_contin: 10.34%
    • update_particles: 24.40%
    • complexity_based_scorer: 67.82%

1.3. Time (s):

  • Depth: 2 | Mean: 2.557880
  • Depth: 3 | Mean: 2.593005
  • Depth: 4 | Mean: 2.663667
  • Depth: 5 | Mean: 2.668498
  • Depth: 6 | Mean: 2.701040
  • Depth: 7 | Mean: 2.644231
  • Depth: 8 | Mean: 2.714792
  • Depth: 9 | Mean: 2.692005
  • Depth: 10 | Mean: 2.680957
  • Depth: 11 | Mean: 2.735071
  • Depth: 12 | Mean: 2.783216
  • Depth: 13 | Mean: 2.743773
  • Depth: 14 | Mean: 2.725607
  • Depth: 15 | Mean: 2.722347
  • Depth: 16 | Mean: 2.755987
  • Depth: 17 | Mean: 2.901231
  • Depth: 18 | Mean: 2.809483
  • Depth: 19 | Mean: 2.762999
  • Depth: 20 | Mean: 2.758817
  • Depth: 21 | Mean: 2.772569
  • Depth: 22 | Mean: 2.816113
  • Depth: 23 | Mean: 2.928664
  • Depth: 24 | Mean: 2.904253
  • Depth: 25 | Mean: 2.938958
  • Depth: 26 | Mean: 2.845138
  • Depth: 27 | Mean: 2.877145
  • Depth: 28 | Mean: 2.857758
  • Depth: 29 | Mean: 2.840159
  • Depth: 30 | Mean: 2.822597
  • Depth: 31 | Mean: 2.813350
  • Depth: 32 | Mean: 2.820256
  2. HC without reduction:

2.1. Massif:

    • depth 5: peak 53.1730 MiB = 6.96949146 MB
    • depth 16: peak 53.1730 MiB = 6.96949146 MB
    • depth 32: peak 53.1730 MiB = 6.96949146 MB
    • depth 128: peak 53.1730 MiB = 6.96949146 MB

2.2. Callgrind:

  • depth 5:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 16:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.21%
  • depth 32:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 128:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.06%
    • complexity_based_scorer: 65.20%

2.3. Time (s):

  • Depth: 2 | Mean: 13.933148
  • Depth: 3 | Mean: 13.710301
  • Depth: 4 | Mean: 13.771378
  • Depth: 5 | Mean: 13.699602
  • Depth: 6 | Mean: 13.643018
  • Depth: 7 | Mean: 13.953615
  • Depth: 8 | Mean: 14.218726
  • Depth: 9 | Mean: 14.607479
  • Depth: 10 | Mean: 14.577442
  • Depth: 11 | Mean: 14.953429
  • Depth: 12 | Mean: 14.705942
  • Depth: 13 | Mean: 14.993052
  • Depth: 14 | Mean: 14.541223
  • Depth: 15 | Mean: 14.249910
  • Depth: 16 | Mean: 14.563872
  • Depth: 17 | Mean: 15.361906
  • Depth: 18 | Mean: 14.469012
  • Depth: 19 | Mean: 14.755074
  • Depth: 20 | Mean: 14.411482
  • Depth: 21 | Mean: 14.821014
  • Depth: 22 | Mean: 14.412988
  • Depth: 23 | Mean: 14.660408
  • Depth: 24 | Mean: 14.989600
  • Depth: 25 | Mean: 14.081191
  • Depth: 26 | Mean: 14.491643
  • Depth: 27 | Mean: 14.121354
  • Depth: 28 | Mean: 14.673866
  • Depth: 29 | Mean: 14.079422
  • Depth: 30 | Mean: 13.745444
  • Depth: 31 | Mean: 13.860764
  3. PS with reduction:

3.1. Massif:

    • depth 5: peak 2.7437 MiB = 0.359622246 MB
    • depth 16: peak 2.8201 MiB = 0.369636147 MB
    • depth 32: peak 4.2963 MiB = 0.563124634 MB
    • depth 128: peak 3.6244 MiB = 0.475057357 MB

3.2. Callgrind:

  • depth 5:
    • get_contin: 0.11%
    • set_contin: 0.01%
    • update_particles: 0.02%
    • complexity_based_scorer: 91.94%
  • depth 16:
    • get_contin: 0.11%
    • set_contin: 0.02%
    • update_particles: 0.06%
    • complexity_based_scorer: 93.25%
  • depth 32:
    • get_contin: 0.12%
    • set_contin: 0.04%
    • update_particles: 0.15%
    • complexity_based_scorer: 91.06%
  • depth 128:
    • get_contin: 0.25%
    • set_contin: 0.16%
    • update_particles: 0.37%
    • complexity_based_scorer: 92.87%

3.3. Time (s):

  • Depth: 32 | Mean: 180
  4. HC with reduction:

4.1. Massif:

    • depth 32: peak 940.1328 KiB = 0.1203369984 MB

4.2. Callgrind:

  • depth 32:
    • get_contin: 0.42%
    • set_contin: 0%
    • hill_climbing: 0.08%
    • complexity_based_scorer: 32.52% (low because merge_demes takes a large share of the time)

4.3. Time (s):

  • Depth: 32 | Mean: 1.8413

For even more detailed results (the callgrind and massif outputs, or the timing data with results) you can just ask me; I won't put them on GitHub, they are too big and don't need to be there.

[1] See valgrind site: http://valgrind.org/info/tools.html

On Mon, Jul 13, 2015 at 10:59 AM, Arley Ristar [email protected] wrote:

I was running with predicates.csv -W1 -u pred -m1000. I'll send it, but I didn't know there was a cache on the evaluations, so that appears to be exactly what is happening.

On Mon, Jul 13, 2015 at 10:43 AM, ngeiswei [email protected] wrote:

@arleyristar https://github.com/arleyristar I'm wondering if the super fast run-times aren't due to the cache. If the candidates are idiotically identical that could turn most evaluations into cache hits. Can you run your experiment with the debug logging level and send it to us?

— Reply to this email directly or view it on GitHub https://github.com/opencog/moses/issues/10#issuecomment-120932114.

ArleyRistar avatar Jul 30 '15 20:07 ArleyRistar

Hi Arley,

Could you cut-n-paste this into the directory moses/moses/diary/particle-swarm/ ? The other directories under "diary" provide similar information on experiments. They are rather informal; just notes, really, nothing particularly fancy.

On Mon, Jul 27, 2015 at 7:38 AM, arleyristar [email protected] wrote:

I forgot to post here too:

https://opencog.slack.com/files/arleyristar/F086CPMUK/Initial_experiments_for_PS_and_HC_in_continuous_problems_ or read below. Initial experiments for PS and HC in continuous problems.

For resume version jump to 3.4. General.

  1. Methodology:

First, the experiments were divided in two groups: without reduction (-B 0 -E 1 parameters) and with reduction; it's separated this way because the reduction part affects the particle swarm (PS) too much (clean_combo_tree uses 99.73% with reduction in PS).

What do you mean "affects too much"? Do you mean "uses too much cpu time"? or does it actually change results?

You nmay want to call (maybe) reduct once, before the PSO step; clearly, you should not call reduct while doing PSO i.e. while trying different float values.

Besides that, the experiments were split in three parts: analysis of the memory, processing and time. The memory analysis was done using the valgrind tool massif running for 4 values of depth {5(default), 16, 32, 128} and for PS and hill climbing (HC). The processing analysis was done in the same way of the memory, but using the tool callgrind. The time analysis was done in a different way, using a simple python script to measure the mean time of seeds between 1 and 10 running between depth 2 and 32 for PS and HC. The dataset used was predicates.csv

predicates.csv is extremely simple. predicates-1.3.csv is a lot harder to learn, and illustrates one of the big problems with the current contin learner, one that I am hoping PSO will do much better on.

Something that had to learn e.g. x>1.618 would be hard for the current contin code, and should be "very easy" for PSO.

Much better test cases would be any dataset that people typically use gradient-descent methods on, or maybe datasets that are easily solved by any kind of linear regression, for example SVM. Currently, moses mostly cannot do linear-regression-type problems; I am hoping PSO will fix that.

A demo of either a "linear programming" (linear optimization) or a "linear integer programming" problem, in moses, with PSO, would be very cool.

because it is a simple one that is already inside the examples folder of Moses; It was running for a maximum of 1000 evaluations, a small value because of the valgrind tools that increase the program time in 20x for massif and between 20x and 100x for callgring [1] but it's a sufficient amount of evaluations to test (HC in some seeds can find the best solution in less than 1000 evals). 2. Results description:2.1. Memory:

  • PS with reduction spend more memory than without.
  • Increasing in the depth of PS, increases the memory used, but not that much, sometimes it can decrease, because it can find better results that are less complex.
  • HC without reduction reaches the ram limit independently of depth.
  • HC with reduction uses far less memory, even less than PS without reduction.

2.2. Processing:

  • Variable length representation affects PS without reduction too much depending of the depth (15% at depth 32 vs. 2.7% in depth 5).
  • Affects the PS with reduction in a similar way, but with a percentage much lower, because of the reduction.
  • The depth didn't affect hill climbing at all, mostly because it found a solution that needs less than depth 5.

2.3. Time:

  • PS without reduction is much quicker than with reduction.
  • HC without reduction is slower than with reduction.
  • PS without reduction is quicker than HC without reduction.
  • PS with reduction is very much slower than HC with reduction.

I don't quite understand these results. MOSES does reduction in several different places, for "completely unrelated" reasons. It seems to me that you should do reduction at least once, before starting PSO. However, during the PSO inner loop, where you are merely trying different floating point values, you should NOT do any reduction at all; it would be completely pointless. Right!?

Can you sketch the pseudocode for me?
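Roughly, the flow I have in mind is something like this (just a sketch; every name here is a placeholder, not the real moses code):

```cpp
#include <vector>

// Placeholder types and helpers, only to show the intended call order.
struct Candidate { std::vector<double> contin_values; double score = 0.0; };
using Deme = std::vector<Candidate>;

void reduce_once(Candidate&)      { /* reduct the exemplar a single time */ }
void pso_update(Candidate&)       { /* perturb the float values          */ }
double evaluate(const Candidate&) { return 0.0; /* call the scorer */ }

// Reduction happens once, up front; the PSO inner loop only moves floats
// around and scores them, with no reduct calls inside.
void pso_optimize_sketch(Candidate& exemplar, Deme& deme, int num_iters)
{
    reduce_once(exemplar);
    for (int it = 0; it < num_iters; ++it) {
        for (Candidate& p : deme) {
            pso_update(p);              // no reduct here
            p.score = evaluate(p);
        }
    }
    // Any further reduction/merging happens after the loop, when the deme
    // is handed back to the metapopulation.
}
```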


— Reply to this email directly or view it on GitHub https://github.com/opencog/moses/issues/10#issuecomment-125191369.

linas avatar Jul 30 '15 21:07 linas