Cutoff: explanation/documentation

Open philastrophist opened this issue 5 years ago • 6 comments

In performing some tests of pygmmis I have found that varying the cutoff argument drastically changes the end result of fitting even with split-and-merge turned on (and exhaustive).

My understanding of EM is that the responsibilities r_ik are calculated for all data and all components. Why, then, does pygmmis use a cutoff to fit only those data in the neighbourhood of each component? As far as I can tell, cutoff!=inf simply means that some data will be labelled as not belonging to any component.

Is the reason something to do with the background or is it just to avoid outliers?

Thanks

P.S. This code is very cool!

philastrophist avatar Jan 14 '19 14:01 philastrophist

The cutoff argument is meant to allow for speed-ups in cases with many components. It is unlikely that you will need all components for every sample, so setting e.g. cutoff=3 means the fit doesn't even attempt to fit samples outside of the 3-sigma region of a component. This works very well for data that are spread out a lot, and it also helps break degeneracies for many strongly overlapping components.
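To make the idea concrete, here is a rough NumPy sketch of an E-step with a cutoff. It is an illustration only, not the actual pygmmis code: the function name `responsibilities`, its signature, and the choice to measure the "3-sigma region" by Mahalanobis distance are all assumptions made for this example.

```python
import numpy as np

def responsibilities(data, means, covs, weights, cutoff=None):
    """Illustrative E-step (hypothetical helper, not part of pygmmis).

    Returns r with shape (N, K) and a boolean mask of samples that fall
    inside the cutoff region of at least one component.
    """
    N, D = data.shape
    K = len(weights)
    log_p = np.full((N, K), -np.inf)  # -inf marks "not evaluated"
    for k in range(K):
        diff = data - means[k]
        inv = np.linalg.inv(covs[k])
        # squared Mahalanobis distance of every sample to component k
        chi2 = np.einsum('ni,ij,nj->n', diff, inv, diff)
        _, logdet = np.linalg.slogdet(covs[k])
        log_pk = np.log(weights[k]) - 0.5 * (chi2 + logdet + D * np.log(2 * np.pi))
        if cutoff is None:
            mask = np.ones(N, dtype=bool)   # use all samples
        else:
            mask = chi2 < cutoff ** 2       # assumed: only samples within cutoff sigma
        log_p[mask, k] = log_pk[mask]
    # normalise over components, only for samples seen by some component
    covered = np.isfinite(log_p).any(axis=1)
    r = np.zeros((N, K))
    lp = log_p[covered]
    lp = lp - lp.max(axis=1, keepdims=True)  # stabilise the exponentials
    p = np.exp(lp)
    r[covered] = p / p.sum(axis=1, keepdims=True)
    return r, covered
```

In this sketch, a sample that lies outside the cutoff region of every component ends up with all-zero responsibilities, so it does not contribute to any component in the M-step. With cutoff=None (or cutoff=np.inf) every sample is evaluated against every component, as in textbook EM.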

I realize that I should document this parameter better; you're not the first person to ask.

pmelchior avatar Jan 14 '19 14:01 pmelchior

Ah ok, that makes sense. cutoff=None raises errors though, so I guess for now it's easier to just set cutoff=inf for my purposes.

philastrophist avatar Jan 15 '19 10:01 philastrophist

There shouldn't be errors with cutoff=None. Can you post the error and the traceback, please?

pmelchior avatar Jan 15 '19 21:01 pmelchior

It's an attribute error, trying to copy a None:

ITER	SAMPLES	LOG_L	STABLE
0	5000	-2.383	3
Traceback (most recent call last):
  File "/local/home/sread/Apps/anaconda/envs/pymc3-uptodate/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-f61dddb7d343>", line 4, in <module>
    runfile('/local/home/sread/Dropbox/pygmmis/models.py', wdir='/local/home/sread/Dropbox/pygmmis')
  File "/local/home/sread/Apps/jetbrains-toolbox-1.4.2492/install_location/apps/PyCharm-P/ch-0/182.4129.5/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "/local/home/sread/Apps/jetbrains-toolbox-1.4.2492/install_location/apps/PyCharm-P/ch-0/182.4129.5/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/local/home/sread/Dropbox/pygmmis/models.py", line 122, in <module>
    split_n_merge=gmm.K * (gmm.K - 1) * (gmm.K - 2) / 2)
  File "/local/home/sread/Dropbox/pygmmis/pygmmis.py", line 689, in fit
    U_ = [U[k].copy() for k in xrange(gmm.K)]
  File "/local/home/sread/Dropbox/pygmmis/pygmmis.py", line 689, in <listcomp>
    U_ = [U[k].copy() for k in xrange(gmm.K)]
AttributeError: 'NoneType' object has no attribute 'copy'
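
The failing line can be reproduced in isolation. The snippet below is a minimal sketch: the list `U` here is a hypothetical stand-in, and the assumption that its entries stay None when cutoff=None is my guess from the traceback, not something confirmed from the pygmmis source. The guarded copy at the end is just one possible workaround, not the project's fix.

```python
# Hypothetical stand-in for the per-component list U seen in the traceback.
# Assumed: entries are left as None when no cutoff neighbourhood is computed.
U = [None, None, None]

try:
    U_ = [U[k].copy() for k in range(len(U))]
except AttributeError as err:
    # same kind of error as in the traceback above
    message = str(err)

# One possible guard: copy only the entries that are not None.
U_safe = [None if u is None else u.copy() for u in U]
```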

philastrophist avatar Jan 17 '19 16:01 philastrophist

Can you post the call of pygmmis.fit as well, please?

pmelchior avatar Jan 17 '19 18:01 pmelchior

Sure. It is here:

logL, U = pygmmis.fit(gmm, data, init_method='kmeans', w=0.01, cutoff=None, tol=1e-6, rng=rng, maxiter=1)

philastrophist avatar Jan 20 '19 13:01 philastrophist