
Performance Comparison to nupic

Open psteinroe opened this issue 5 years ago • 70 comments

Hi everyone,

thanks for the great work on htm.core! I am doing some research on leveraging HTM for anomaly detection and I am wondering whether I should use htm.core or nupic. Is there any comparison in terms of performance?

@breznak You described some issues in PR #15. What would you say - continue with htm.core or rather get nupic running? It's just a master's thesis, so I won't be able to dive deep into htm to improve the implementation...

psteinroe avatar Apr 13 '20 16:04 psteinroe

htm.core is a rework of NuPic so that it will work. I think you will find that nearly everything you want to do with NuPic is covered by htm.core. NuPic, on the other hand, is basically broken.

@breznak will have to address the performance of htm.core, but I expect it to be comparable to (or perhaps even better than) NuPic.

dkeeney avatar Apr 13 '20 17:04 dkeeney

@steinroe glad to see your interest!

I am wondering whether I should use htm.core or nupic

In terms of feature-fullness and active support, htm.core has now far surpassed its parent, Numenta's nupic(.core).

Is there any comparison in terms of performance? [in anomaly detection]

yes, this is an open problem. Theoretically, htm.core should be better than, the same as, or slightly worse than Numenta's nupic. We made a couple of improvements as well as a couple of "regressions" in the name of biological plausibility. You could start by looking at the Changelog.md; the git diff is already too wild.

Yet, the performance of htm.core on NAB so far is much worse. (Note, such a regression is not observed on the "sine" or "hotgym" data.) My gut feeling is it's just "some parameter that is off".

I won't be able to dive deep into htm to improve the implementation...

you wouldn't have to dive deep, and we'd be here to help you. So my recommendation would be: give htm.core a trial period (say 1-5 weeks) and try to find where the culprit is. Doing so would help the community and would be a significant result for your thesis. I could help you in narrowing down the process in NAB so we can locate the error.

breznak avatar Apr 13 '20 17:04 breznak

@breznak and @dkeeney thanks for the quick replies! I started by analysing the difference in the API and the respective parameters.

Regarding the Spatial Pooler, htm.core differs from nupic in three parameters: localAreaDensity, potentialRadius and stimulusThreshold. I tried some Bayesian Optimization to get an idea of what setting for these 3 might work well but only got a score of around 31 with {"localAreaDensity": 0.02149599864266627, "potentialRadius": 4148629.7213459704, "stimulusThreshold": 3.640494932522697}. All other settings were the same as in nupic. Do you have a gut feeling on what ranges would make sense? From the logs I would say localAreaDensity should be <0.09 and stimulusThreshold <10, but still the score is very bad. Any idea on that?
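
Roughly, the optimization setup looks like this - a sketch assuming the bayesian-optimization (bayes_opt) package; run_nab_with_params() is a hypothetical helper that runs NAB with the patched SP params and returns the standard-profile score, and the bounds are just examples from the hints above:

from bayes_opt import BayesianOptimization

def objective(localAreaDensity, potentialRadius, stimulusThreshold):
    # run_nab_with_params is a hypothetical stand-in: patch the detector
    # params, run NAB, and return the standard-profile score.
    return run_nab_with_params(
        localAreaDensity=localAreaDensity,
        potentialRadius=int(potentialRadius),
        stimulusThreshold=int(stimulusThreshold),
    )

optimizer = BayesianOptimization(
    f=objective,
    pbounds={                             # example search ranges
        "localAreaDensity": (0.01, 0.09),
        "potentialRadius": (16, 2048),
        "stimulusThreshold": (0, 10),
    },
    random_state=1,
)
optimizer.maximize(init_points=5, n_iter=50)
print(optimizer.max)  # best score and parameter combination found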

The second thing I did was a direct comparison of the anomaly scores yielded by nupic and by htm.core on the same datasets. You can find the output in the attached pdf. There are two things that aren’t right:

  • (1) It seems like the date encoding is not right or not used at all. The output shows no peak in e.g. the nojump dataset.
  • (2) The Scalar Encoding seems off, as e.g. in jumpsup a significant rise is not reflected at all. This could also be due to some parameters of the models I guess.

What would you say, is it rather because of the params of the encoders or params of the algorithm itself? Also, there are differences in the parameters of the RDSE encoder. Only the resolution parameter is also used in nupic. Or did I miss something here?

htm_impl_comparison.pdf

psteinroe avatar Apr 15 '20 18:04 psteinroe

Very nice analysis @steinroe !! :+1: Let me answer your points one by one.

started by analysing the difference in the API

first, let me correct my mistake, the file I wanted to point you to is API_CHANGELOG.md

About the API (and implementation differences): From top of my head...

  • our RDSE has a slightly different representation (it uses MurmurHash)
  • our DateTimeEncoder fixes some bugs in the original
    • there are pure python implementations of the encoders which are direct ports of the originals (just ported py2 to py3).
  • SP has the removed params as you discovered (there are a number of bugfixes and improvements), but the point is we're still passing the original test-suite (with known modifications), so the computations are 99% valid.
  • TM: ditto. We removed BacktrackingTM (which still has the best NAB scores), so results should be (at least) comparable with the numentaTM detector.
  • anomaly:
    • our TM now provides the (raw) anomaly transparently (see the sketch after this list).
    • Numenta's NAB detector uses AnomalyLikelihood (I think you can change it to raw). I'm quite sure our C++ AnomalyLikelihood is broken (well, untested at best); there's also the ./py/ Likelihood, which should be just a py3 port of Nupic's.
    • NumentaDetector "cheats" by using/preprocessing "spatial anomaly". You can turn that off too. HtmcoreDetector does not use that.
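
To make the chain concrete, here is a minimal sketch of the htmcore pipeline (assuming the htm.core Python bindings; the parameter values are placeholders from this thread, not tuned):

from htm.bindings.sdr import SDR
from htm.bindings.algorithms import SpatialPooler, TemporalMemory
from htm.encoders.rdse import RDSE, RDSE_Parameters

rdseParams = RDSE_Parameters()
rdseParams.size = 400
rdseParams.activeBits = 21
rdseParams.resolution = 0.9
enc = RDSE(rdseParams)

sp = SpatialPooler(inputDimensions=[enc.size], columnDimensions=[2048])
tm = TemporalMemory(columnDimensions=[2048], cellsPerColumn=32)

activeColumns = SDR(sp.getColumnDimensions())
for value in (10.0, 11.0, 50.0):               # toy input stream
    encoding = enc.encode(value)               # Encoder
    sp.compute(encoding, True, activeColumns)  # SP (learn=True)
    tm.compute(activeColumns, learn=True)      # TM
    rawAnomaly = tm.anomaly                    # raw anomaly score from the TM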

tried some Bayesian Optimization to get an idea of what setting for these 3 might work well

cool. If you don't have your own framework for optimization, I suggest looking at ./py/htm/optimization/

Do you have a gut feeling on what ranges would make sense?

This is tricky, but can be computed rather precisely. I'll have to look at this deeper again... You can see my trials at community/NAB on this very same problem: https://github.com/htm-community/NAB/pull/15

stimulusThreshold This is a number specifying the minimum number of synapses that must be active in order for a column to turn ON. The purpose of this is to prevent noisy input from activating columns.

So this depends on the expected average number of ON bits from the encoder, the number of synapses, the range of the input field each dendrite covers, and how noisy the problem is. Anything >=1 is imho a good starting point (back-of-envelope below).
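
As a quick sanity check (using the encoder params that come up later in this thread):

activeBits = 21      # ON bits produced by the RDSE
potentialPct = 0.8   # fraction of the input each column can connect to
expectedOverlap = activeBits * potentialPct   # ~17 potentially active synapses per column
# stimulusThreshold must stay well below ~17 here, or columns never turn ON;
# small values (1-5) filter noise without killing the signal.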

potentialRadius This parameter determines the extent of the input that each column can potentially be connected to. This can be thought of as the input bits that are visible to each column, or a 'receptive field' of the field of vision. A large enough value will result in global coverage, meaning that each column can potentially be connected to every input bit. This parameter defines a square (or hyper square) area: a column will have a max square potential pool with sides of length (2 * potentialRadius + 1), rounded to fit into each dimension.

depends on the size of the encoder/encoding, and on whether you want the columns to act as local/global approximators.

potentialRadius = size, aka Inf, aka "act globally".

"potentialRadius": 4148629.7213459704

Seems your optimizer preferred that variant. (Note, it might thus also have just gotten stuck in a local optimum.) I'd try something like "global" and "25% of the input field" as reasonable defaults.

localAreaDensity The desired density of active columns within a local inhibition area (the size of which is set by the internally calculated inhibitionRadius, which is in turn determined from the average size of the connected potential pools of all columns). The inhibition logic will insure that at most N columns remain ON within a local inhibition area, where N = localAreaDensity * (total number of columns in inhibition area) Default: 0.05 (5%)

This is lower-bounded by the SP's size, aka numColumns (density * numCols >= some meaningful minimum value; see the check below). Its value determines the TM's function and the TM's variant of "stimulusThreshold" (it's called something different there - activationThreshold and minThreshold). 2-15% seems a reasonable range to me.
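
For instance, with the sizes from the NAB params in this thread:

numColumns = 2048
localAreaDensity = 0.02
activeColumns = numColumns * localAreaDensity   # ~41 active columns per step
# The TM's activationThreshold (20) and minThreshold (13) must fit well under
# this number, otherwise predictions can never form.
assert activeColumns > 20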

The second thing I did was a direct comparison of the anomaly scores yielded by nupic and by htm.core on the same datasets.

great graphs! Some notes on that later.

This leads me to a decomposition. The chain is: Encoder -> SP -> TM -> Anomaly. Ideally, we'd start from the top to verify all our implementations.

  • store Numenta TM's outputs, use our anomaly and compare, ...

(1) It seems like the date encoding is not right or not used at all. The output shows no peak in e.g. the nojump dataset.

actually, there's a small drop at that time. Are all the graphs on the same params? It looks like only the "flatmiddle" dataset htmcore results detect anything meaningful, so the decision value of the others is questionable. But it could be in the settings of the datetime encoder, the ratio of datetime / RDSE size, or insensitivity of the SP/TM.

(2) The Scalar Encoding seems off, as e.g. in jumpsup a significant rise is not reflected at all. This could also be due to some parameters of the models I guess.

The RDSE encoder is rather well tested, so I wouldn't expect a hidden bug there. Maybe unusable default params. It seems to me that the HTM didn't learn at all in most cases (except the "flatmiddle").

 What would you say, is it rather because of the params of the encoders or params of the algorithm itself? 

I think it'd be in the params, but I cannot tell whether enc/sp/tm. It's all tied together.

Great job investigating so far. I'll be looking at it tonight as well. Please let us know if you find something else or if we can help explain something. Cheers,

breznak avatar Apr 15 '20 22:04 breznak

@steinroe I've added a "dataset" with only the artificial data; the results look rather good.

Update: When running on only the artificial/synthetic labels, our results are quite good: python run.py -d htmcore --detect --score --optimize --normalize --windowsFile labels/synthetic.json -n 8

htmcore detector benchmark scores written to /mnt/store/devel/HTM/NAB/results/htmcore/htmcore_reward_low_FN_rate_scores.csv

Running score normalization step
Final score for 'htmcore' detector on 'standard' profile = 84.84
Final score for 'htmcore' detector on 'reward_low_FP_rate' profile = 84.36
Final score for 'htmcore' detector on 'reward_low_FN_rate' profile = 88.37
Final scores have been written to /mnt/store/devel/HTM/NAB/results/final_results.json.

PS: also I'd suggest using the following branch, it has some nice prints and comments to it. https://github.com/htm-community/NAB/pull/15

breznak avatar Apr 16 '20 13:04 breznak

CC @Zbysekz as the author of HTMpandaVis, do you think you could help us debugging the issue? I'd really appreciate that!

TL;DR: minor parameter and implementation changes were surely made to htm.core. Compared to Numenta's Nupic, our results on NAB really suck now. I'm guessing it should be a matter of incorrect params.

A look into the representations with the visualizer would be really helpful. More info here (and linked posts): https://github.com/htm-community/NAB/pull/15

breznak avatar Apr 16 '20 14:04 breznak

Btw, I've made fixes to the Jupyter plotter, see NAB/scripts/README. These are some figures: HTMcore: htmcore_nab

For result file : ../results/htmcore/artificialWithAnomaly/htmcore_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 403
True Negative (Detected non anomalies) : 0
False Positive (False alarms) : 2679
False Negative (Anomaly not detected) : 0
Total data points : 3428
S(t)_standard score : -90.17164198169499

Numenta: numenta_nab

For result file : ../results/numenta/artificialWithAnomaly/numenta_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 1
True Negative (Detected non anomalies) : 4027
False Positive (False alarms) : 0
False Negative (Anomaly not detected) : 0
Total data points : 4032
S(t)_standard score : 0.4999963147227

EDIT: fixed uploaded imgs

Note: both images look very similar. (I don't know why the numenta scores look that bad here, must be some bug.)

breznak avatar Apr 16 '20 14:04 breznak

@breznak Thanks for the detailed review!

You can turn that off too. HtmcoreDetector does not use that.

Sorry, I forgot to mention that I turned that off for both detectors - HTMCore and Nupic.

cool. If you don't have your own framework for optimization, I suggest looking at ./py/htm/optimization/

I saw that right after I was done with my optimization. Going to try yours very soon!

Thanks for the information on the parameters. From your explanation, I guess it is very likely that some parameters are off. I will look into the default parameters of both to get a complete comparison of what could be different.

I set up a repo with the stuff I used to plot both detectors against each other, as I didn't want to bother with all the NAB stuff for that. For the HTMCore detector I actually used yours from htm-community/NAB#15. For nupic, I set up a little server with a (very very) simple API to use the original nupic detector. I just removed the base class, so I put the min/max stuff directly into the detector, and removed the spatial anomaly detector in both. Feel free to play around: https://github.com/steinroe/htm.core-vs-nupic

When running on only the artificial/synthetic labels, our results are quite good:

That appears weird to me... I am going to look into the outputs of the scores in detail, thanks for that!!!!

A look into the representations with the visualizer would be really helpful.

True! I guess we would be able to find the differences in the param settings better.

psteinroe avatar Apr 16 '20 14:04 psteinroe

When running on only the artificial/synthetic labels, our results are quite good:

That appears weird to me... I am going to look into the outputs of the scores in detail, thanks for that!!!!

I've confirmed that on the "nojump" data, the error still persists. HTMcore does not detect anything; numenta does have a peak.

This could be 2 things:

  • low sensitivity (in params somewhere in the encoder, SP, or TM)
  • or a too-quickly adapting TM (it learns the flat curve too fast, so it simply re-learns in the nojump region).

breznak avatar Apr 16 '20 15:04 breznak

For nupic, I set up a little server with a (very very) simple API to use the original nupic detector.

This looks good! I might be interested in that for community/NAB, to provide the (old) numenta detectors. Numenta/NAB switched to docker for the old py2 support, and this seems a good way to interface with that!

I'll try your repo, thanks :+1:

breznak avatar Apr 16 '20 15:04 breznak

I don't understand the plotters/summary code:

if Error is None:
    TP, TN, FP, FN = 0, 0, 0, 0
    print(standard_score)
    # Classify the per-row NAB scores; -0.11 and -1 appear to match the NAB
    # standard profile's FP weight (0.11) and FN weight (1.0), respectively.
    for x in standard_score:
        if x > 0:
            TP += 1
        elif x == 0:
            TN += 1
        elif x == -0.11:
            FP += 1
        elif x == -1:
            FN += 1
    print("For result file : " + result_file)
    print("True Positive (Detected anomalies) : " + str(TP))
    print("True Negative (Detected non anomalies) : " + str(TN))
    print("False Positive (False alarms) : " + str(FP))
    print("False Negative (Anomaly not detected) : " + str(FN))
    print("Total data points : " + str(total_Count))
    print(detector_profile + " score : " + str(np.sum(standard_score)))
else:
    print(Error)
    print("Run from beginning to clear Error")

which gives

[-0.11       -0.11       -0.11       ... -0.10999362 -0.1099937
 -0.10999377]
For result file : ../results/htmcore/artificialWithAnomaly/htmcore_art_daily_nojump.csv
True Positive (Detected anomalies) : 403
True Negative (Detected non anomalies) : 0
False Positive (False alarms) : 2787
False Negative (Anomaly not detected) : 0
Total data points : 3428
S(t)_standard score : -90.17200961167651

But according to the img and raw_anomaly, there should be only a few FP! htm_nab_nojump

Bottom line: is our community/NAB (scorer) correct? We could just copy htmcore results into numenta/NAB and have them re-scored.
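
If we copy the htmcore result CSVs into numenta/NAB's results/ tree, the re-scoring itself should be just (assuming their run.py CLI mirrors the community fork used above):

python run.py -d htmcore --score --normalize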

breznak avatar Apr 16 '20 15:04 breznak

Sorry for the late reply. I am currently in the process of doing an in-depth comparison between the parameters, and there are definitely some differences. You can find the table in the Readme here: https://github.com/steinroe/htm.core-vs-nupic

While most differences could be easily resolved, I need your input on the params of the RDSE Encoder. While HTMCore does have size and sparsity, nupic has w, n and offset set with default parameters. The descriptions seem similar; however, I am not sure if e.g. size and w (which is probably short for width) mean the same thing. Do you have an idea here @breznak ?

This is the relevant section of the table, sorry for its size. The value columns show the values which are set by the respective detectors in NAB. If the cell is empty, the default value is used.

| HTMCore attribute | Description | Value | Default | Nupic CPP attribute | Description | Value | Default |
| --- | --- | --- | --- | --- | --- | --- | --- |
| size | Member "size" is the total number of bits in the encoded output SDR. | 400 | 0 | | | | |
| sparsity | Member "sparsity" is the fraction of bits in the encoded output which this encoder will activate. This is an alternative way to specify the member "activeBits". | 0.1 | 0 | | | | |
| resolution | Member "resolution": two inputs separated by greater than, or equal to, the resolution are guaranteed to have different representations. | 0.9 | 0 | resolution | A floating point positive number denoting the resolution of the output representation. Numbers within [offset-resolution/2, offset+resolution/2] will fall into the same bucket and thus have an identical representation. Adjacent buckets will differ in one bit. resolution is a required parameter. | max(0.001, (maxVal - minVal) / numBuckets) | - |
| activeBits | Member "activeBits" is the number of true bits in the encoded output SDR. | | | | | | |
| radius | Member "radius": two inputs separated by more than the radius have non-overlapping representations. Two inputs separated by less than the radius will in general overlap in at least some of their bits. You can think of this as the radius of the input. | | | | | | |
| category | Member "category" means that the inputs are enumerated categories. If true then this encoder will only encode unsigned integers, and all inputs will have unique / non-overlapping representations. | | FALSE | | | | |
| | | | | numBuckets | | 130 | |
| seed | Member "seed" forces different encoders to produce different outputs, even if the inputs and all other parameters are the same. Two encoders with the same seed, parameters, and input will produce identical outputs. The seed 0 is special. Seed 0 is replaced with a random number. | | 0 (random) | seed | | 42 | 42 |
| | | | | w | Number of bits to set in output. w must be odd to avoid centering problems. w must be large enough that spatial pooler columns will have a sufficiently large overlap to avoid false matches. A value of w=21 is typical. | | 21 |
| | | | | n | Number of bits in the representation (must be > w). n must be large enough such that there is enough room to select new representations as the range grows. With w=21 a value of n=400 is typical. The class enforces n > 6*w. | | 400 |
| | | | | name | | | None |
| | | | | offset | A floating point offset used to map scalar inputs to bucket indices. The middle bucket will correspond to numbers in the range [offset - resolution/2, offset + resolution/2). If set to None, the very first input that is encoded will be used to determine the offset. | | None |
| | | | | verbosity | | | 0 |

psteinroe avatar Apr 17 '20 13:04 psteinroe

We could just copy htmcore results into numenta/NAB and have them re-scored.

Nice analysis!! Sounds like a good idea to try that out. I will do that after I am done with the parameter comparison.

psteinroe avatar Apr 17 '20 13:04 psteinroe

That's a good idea to write such a comparison of params/API. I'd like the result to be published as a part of the repo here :+1:

[RDSE Encoder] While HTMCore does have size and sparsity, nupic has w, n and offset

I can help on those:

activeBits = w
size = n 

I'm not 100% sure about offset without looking, but it'd be used in resolution imho.
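
So an equivalent construction should look roughly like this (a sketch, assuming the htm.core Python bindings):

from htm.encoders.rdse import RDSE, RDSE_Parameters

# nupic (py2): RandomDistributedScalarEncoder(resolution=r, w=21, n=400, seed=42)
r = 0.9             # example resolution, computed the same way for both
p = RDSE_Parameters()
p.size = 400        # ~ nupic's n
p.activeBits = 21   # ~ nupic's w
p.resolution = r
p.seed = 42
enc = RDSE(p)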

breznak avatar Apr 17 '20 14:04 breznak

RDSE:

  • I think we could get Numenta's encoder running in py if needed - either in the history of this repo, or in the community/nupic.py repo with the py3 port. Let me know if you think it'd help you and I can try to dig one up.
  • in the numenta detector, you see a "cheat" I've complained about: RDSE should encode an arbitrary range of numbers, unlike ScalarEncoder, which is simpler and is for limited ranges. Numenta is slightly biased in NAB and computes the global min/max of the dataset. This info is used to parametrize the encoder (see the sketch below). I'm not sure if htmcore's RDSE has the ability to be constructed from a known range (?). If we could, it'd be good to compare (ideally that eventually should not be used, but we're pin-pointing now). Or try with the Scalar enc.
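
The "cheat" boils down to this (numBuckets = 130 in NAB):

values = data["value"]   # hypothetical: the full dataset column, known up front
numBuckets = 130.0
minVal, maxVal = min(values), max(values)
resolution = max(0.001, (maxVal - minVal) / numBuckets)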

breznak avatar Apr 17 '20 14:04 breznak

I can help on those:

Thanks!

Let me know if you think it'd help you and I can try to dig one up.

Let me try with the new parameter settings first. If that does not help, that might be another way to check whether it's the encoder.

in the numenta detector, you see a "cheat" I've complained about:

Yes, I saw that, and I am calculating the resolution for the htmcore encoder the same way to have a fair comparison.

The second parameter that is new for htm.core is localAreaDensity. Nupic also has that one but uses another param, numActiveColumnsPerInhArea, instead to control the density of the active columns (sketch after the quote below):

When using this method, as columns learn and grow their effective receptive fields, the inhibitionRadius will grow, and hence the net density of the active columns will decrease. This is in contrast to the localAreaDensity method, which keeps the density of active columns the same regardless of the size of their receptive fields.
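
For reference, a sketch of the two (mutually exclusive) ways nupic's SpatialPooler controls output sparsity - my reading of the nupic defaults is that setting one of them to a non-positive value disables it:

from nupic.algorithms.spatial_pooler import SpatialPooler

# fixed *count* of winners per inhibition area (nupic's NAB default):
sp_count = SpatialPooler(inputDimensions=(400,), columnDimensions=(2048,),
                         numActiveColumnsPerInhArea=40, localAreaDensity=-1.0)

# fixed *density* of winners (the only variant htm.core kept):
sp_density = SpatialPooler(inputDimensions=(400,), columnDimensions=(2048,),
                           numActiveColumnsPerInhArea=-1, localAreaDensity=0.02)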

Why was this removed from htm.core?

psteinroe avatar Apr 17 '20 14:04 psteinroe

We could just copy htmcore results into numenta/NAB and have them re-scored.

Nice analysis!! Sounds like a good idea to try that out. I will do that after I am done with the parameter comparison.

I got the htmcore detector running with numenta/NAB.

  • the results are slightly different
  • but still bad; it does not solve the issue.

...

breznak avatar Apr 17 '20 14:04 breznak

That's a good idea to write such a comparison of params/API. I'd like the result to be published as a part of the repo here 👍

I will create a PR once I am done :) Is the table format readable or should I rather make it textual?

psteinroe avatar Apr 17 '20 14:04 psteinroe

Is the table format readable or should I rather make it textual?

the table is good! You might decide to drop the unimportant ones (verbosity, name) for clarity, but that's just a detail.

breznak avatar Apr 17 '20 14:04 breznak

The second parameter that is new for htm.core is localAreaDensity. Nupic also has that one but uses another param, numActiveColumnsPerInhArea, instead to control the density of the active columns

yes, I proposed the removal. The reasons were nice, but not crucial, and now I'm suspecting this could be a lead...

https://github.com/htm-community/htm.core/pull/549/

The motivation for localAreaDensity is HTM's presumption that layers produce output (SDRs) with a relatively constant sparsity (+ just code cleanup).

On the other hand, "a 'strong' column's receptive field grows" is also a good biological concept.

  • could we (temporarily) hack the logic for numActiveColumnsPerInhArea back in?
    • if not, I'd bring it back properly with some more work.

This would be a significant result, if one can be proven "better" (dominating) over the other.

breznak avatar Apr 17 '20 15:04 breznak

This would be a significant result, if one can be proven "better" (dominating) over the other.

As the original SP also has localAreaDensity, I would propose to first try out the original nupic detector with localAreaDensity instead of numActiveColumnsPerInhArea, to see whether it has such an impact. I guess that would be faster than bringing it back.

psteinroe avatar Apr 17 '20 15:04 psteinroe

Good news! Using the Numenta parameters with a localAreaDensity of 0.1, I achieve a score of 49.9 on NAB. At least some improvement. Going to try out the swarm algorithm to optimise localAreaDensity now.

  "htmcore": {
       "reward_low_FN_rate": 54.3433173159184,
       "reward_low_FP_rate": 42.32359518695501,
       "standard": 49.96056087185252
   },

These are the params:

params_numenta_comparable = {
  "enc": {
    "value": {
        #"resolution": 0.9, calculate by max(0.001, (maxVal - minVal) / numBuckets) where numBuckets = 130
        "size": 400,
        "activeBits": 21
      },
    "time": {
        "timeOfDay": (21, 9.49),
      }
  },
  "sp": {
    # inputDimensions: use width of encoding
    "columnDimensions": 2048,
    # "potentialRadius": 999999, use width of encoding
    "potentialPct": 0.8,
    "globalInhibition": True,
    "localAreaDensity": 0.1,  # optimize this one
    "stimulusThreshold": 0,
    "synPermInactiveDec": 0.0005,
    "synPermActiveInc": 0.003,
    "synPermConnected": 0.2,
    "boostStrength": 0.0,
    "wrapAround": True,
    "minPctOverlapDutyCycle": 0.001,
    "dutyCyclePeriod": 1000,
  },
  "tm": {
    "columnDimensions": 2048,
    "cellsPerColumn": 32,
    "activationThreshold": 20,
    "initialPermanence": 0.24,
    "connectedPermanence": 0.5,
    "minThreshold": 13,
    "maxNewSynapseCount": 31,
    "permanenceIncrement": 0.04,
    "permanenceDecrement": 0.008,
    "predictedSegmentDecrement": 0.001,
    "maxSegmentsPerCell": 128,
    "maxSynapsesPerSegment": 128,
  },
  "anomaly": {
    "likelihood": {
      "probationaryPct": 0.1,
      "reestimationPeriod": 100
    }
  }
}

psteinroe avatar Apr 17 '20 15:04 psteinroe

another thing that I wondered about:

In the detector code there is a fixed param 999999999 defined when setting the infos for tm and sp. Shouldn't this be encodingWidth? Or is it the potentialRadius?

self.tm_info = Metrics([self.tm.numberOfCells()], 999999999)

psteinroe avatar Apr 17 '20 17:04 psteinroe

param 999999999 defined when setting the infos for tm and sp. Shouldn't this be encodingWidth? Or is it the potentialRadius?

no, this is unimportant. The metric is only used for our info; it does not affect the computation. It's not related to "width"/number of bits, but rather the time/steps used for the EMA in the metric.
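
In other words (a sketch, assuming the Metrics helper from htm.bindings.sdr and a TemporalMemory instance tm):

from htm.bindings.sdr import Metrics

tm_info = Metrics([tm.numberOfCells()], 999999999)
# 2nd arg = period of the exponential moving average of the stats,
# i.e. ~infinite history; it is not a width/number of bits.
tm_info.addData(tm.getActiveCells())  # record one step
print(str(tm_info))                   # sparsity/activation diagnostics only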

breznak avatar Apr 17 '20 19:04 breznak

As the original SP also has localAreaDensity, I would propose to first try out the original nupic detector with localAreaDensity instead of numActiveColumnsPerInhArea, to see whether it has such an impact. I guess that would be faster than bringing it back.

Alright, it seems to have a significant impact. Running the Numenta detectors with a localAreaDensity of 0.1 instead of numActiveColumnsPerInhArea yields basically the same results as the htmcore detector.

The only question remaining now is whether tuning localAreaDensity increases the score, or whether numActiveColumnsPerInhArea is superior in general. How do you propose to proceed, @breznak ?

Here is the code: https://github.com/steinroe/NAB/tree/test_numenta_localAreaDensity

  "numenta": {
       "reward_low_FN_rate": 52.56449422487971,
       "reward_low_FP_rate": 49.94586314087259,
       "standard": 50.82949995800923
   },
   "numentaTM": {
       "reward_low_FN_rate": 52.56449422487971,
       "reward_low_FP_rate": 49.94586314087259,
       "standard": 50.82949995800923
   },

psteinroe avatar Apr 18 '20 06:04 psteinroe

Going to try out the swarm algorithm to optimise localAreaDensity now.

I used Bayesian optimization instead, but nevertheless these are the results:

    "htmcore": {
        "reward_low_FN_rate": 60.852121191220256,
        "reward_low_FP_rate": 45.428862226866734,
        "standard": 55.50231971786488
    },

for the following params:

parameters_numenta_comparable = {
        "enc": {
            "value": {
                # "resolution": 0.9, calculate by max(0.001, (maxVal - minVal) / numBuckets) where numBuckets = 130
                "size": 400,
                "activeBits": 21,
                "seed": 5,  # ignored for the final run
            },
            "time": {
                "timeOfDay": (21, 9.49),
            }
        },
        "sp": {
            # inputDimensions: use width of encoding
            "columnDimensions": 2048,
            # "potentialRadius": use width of encoding
            "potentialPct": 0.8,
            "globalInhibition": True,
            "localAreaDensity": 0.025049634479368352,  # optimize this one
            "stimulusThreshold": 0,
            "synPermInactiveDec": 0.0005,
            "synPermActiveInc": 0.003,
            "synPermConnected": 0.2,
            "boostStrength": 0.0,
            "wrapAround": True,
            "minPctOverlapDutyCycle": 0.001,
            "dutyCyclePeriod": 1000,
            "seed": 5, # ignored for the final run
        },
        "tm": {
            "columnDimensions": 2048,
            "cellsPerColumn": 32,
            "activationThreshold": 20,
            "initialPermanence": 0.24,
            "connectedPermanence": 0.5,
            "minThreshold": 13,
            "maxNewSynapseCount": 31,
            "permanenceIncrement": 0.04,
            "permanenceDecrement": 0.008,
            "predictedSegmentDecrement": 0.001,
            "maxSegmentsPerCell": 128,
            "maxSynapsesPerSegment": 128,
            "seed": 5,  # ignored for the final run
        },
        "anomaly": {
            "likelihood": {
                "probationaryPct": 0.1,
                "reestimationPeriod": 100
            }
        }
    }

I created a PR htm-community/NAB/pull/25 for the updated params.

These are the logs for the seed fixed to 5, where I achieved a standard score of 60 as the maximum. optimization_logs

@breznak How would you suggest proceeding from here?

psteinroe avatar Apr 19 '20 08:04 psteinroe

I used Bayesian optimization instead, but nevertheless these are the results:

    "htmcore": {
        "reward_low_FN_rate": 60.852121191220256,
        "reward_low_FP_rate": 45.428862226866734,
        "standard": 55.50231971786488
    },

Wow, these are very nice results! I'm going to merge it in NAB.

"localAreaDensity": 0.025049634479368352, # optimize this one

Interestingly, this is what's claimed by the HTM theory (2%) as observed in the cortex.

How would you suggest proceeding from here?

compared to the Numenta results:

  • this means we "beat" Numenta under the same conditions now, right?
  • we have a worse FP ratio (too many detections); we should work on that.
  • [ ] is your Bayesian opt multi-parametric, or just considering the selected (localArea) param?

I'd suggest:

  • you try to tune the current score wrt the other params (there are lots of local optima and the parameters are interleaved)
  • I'll revert and reintroduce the numActiveColsPerInhArea param (an alternative to localArea)
    • ideally we achieve similar scores with localArea, but performance is paramount
    • if not, it's proof that numActiveCols... is a crucial functionality
  • we should get NAB operational with our optimization framework for multi-param opt, so we can brute-force the parameter space.

breznak avatar Apr 19 '20 10:04 breznak

I think we should review whether NAB is conceptually correct (as a metric and methodology)!

See our current results (I made a tiny change to your recently updated params in the NAB/fixing_anomaly branch): htmcore_nab_good

That is almost perfect! Yet in the plot:

For result file : ../results/htmcore/artificialWithAnomaly/htmcore_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 403
True Negative (Detected non anomalies) : 0
False Positive (False alarms) : 2679
False Negative (Anomaly not detected) : 0
Total data points : 3428
S(t)_standard score : -90.17164198169499

  • either just the plot summary FP/FN is wrong?
  • or even the scorer in NAB? (but the score is quite good, ~90% on the "synthetic")
  • still, the NAB windows (where an anomaly is expected) are incorrect, (at least/even) for the artificial anomalies. See below (professionally drawn )

nab_methodology_window

breznak avatar Apr 19 '20 11:04 breznak

Q: do we want to keep comparable params & scores, or just aim for the best score? Or both, separately?

breznak avatar Apr 19 '20 15:04 breznak

this means we "beat" Numenta under the same conditions now, right?

Yes, but they still win with numActiveColumnsPerInhArea set.

is your Bayesian opt multi-parametric, or just considering the selected (localArea) param?

For this test I just optimised localAreaDensity, keeping the others constant, but it can be multi-parametric.

you try to tune the current score wrt the other params (there are lots of local optima and the parameters are interleaved)

Alright, I will work on that. Do you have a feeling about which params may be important / worth optimizing?

I'll revert and reintroduce the numActiveColsPerInhArea param (an alternative to localArea)

Perfect, thanks for your work!

we should get NAB operational with our optimization framework for multi-param opt, so we can brute-force the parameter space.

The "problem" is that your framework calls the optimization function in parallel, so my current setup with bayesian optimisation where I simply write to and read from a params.json file won't work. I will think about how to set that up that and come back to you as soon as I have a solution that works.

either just the plot summary FP/FN is wrong?

My gut says this may be the problem, as the scoring results seem fine. We should debug this.

Q: do we want to keep comparable params & scores, or just aim for the best score? Or both, separately?

I would say we just aim for the best score, as even the bug fixes in this fork may influence the best param setting. I don't know if it's useful to keep a second set of params, as a comparison between the two would only be fair if both use the best possible params.

psteinroe avatar Apr 19 '20 15:04 psteinroe