
default mtry and "projection type" for low-dimensional data

Open jovo opened this issue 5 years ago • 2 comments

i never seriously considered defaults for really low dimensions. it occurs to me that it might make sense for mtry to never be much smaller than 100, and for continuous rather than trinary projections to be the default; otherwise, we don't get enough strength from the trees.

thoughts @falkben @MrAE @ttomita @jbrowne6 @megh1241

jovo avatar May 19 '19 22:05 jovo

This would make our defaults vastly different than sklearn's (just pointing it out).

mtry at a high value makes it slow -- perhaps with low dimensions speed doesn't matter as much though

for low dimensions, the way we are currently sampling each mtry (w/ replacement), I think we would get a lot of duplicated feature combinations which wouldn't add anything but would slow us down.

we probably want to:

  1. fix it so that each mtry samples w/o replacement from the matrix p x d.
  2. add continuous rerf.
  3. run experiments to determine appropriate default mtry for low/med/high dimensions
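
To make the duplication concern concrete, here is a rough sketch (not the RerF implementation) that fills a p x d trinary projection matrix by sampling non-zero positions with and without replacement, then counts how many distinct candidate projections come out. The helper names `sample_projections` and `count_distinct_candidates`, the choice of `nnz`, and the use of +/-1 weights are all assumptions made for illustration.

```python
import numpy as np


def sample_projections(p, d, nnz, replace, rng):
    """Hypothetical helper: p x d matrix with +/-1 weights at nnz sampled positions."""
    # Sample flat positions in the p x d matrix, with or without replacement.
    flat_idx = rng.choice(p * d, size=nnz, replace=replace)
    weights = rng.choice([-1, 1], size=nnz)
    A = np.zeros(p * d, dtype=int)
    # With replacement, repeated positions simply overwrite each other,
    # so fewer than nnz entries may end up non-zero.
    A[flat_idx] = weights
    return A.reshape(p, d)


def count_distinct_candidates(A):
    """Count distinct non-zero columns, i.e. distinct candidate projections."""
    return len({tuple(col) for col in A.T if np.any(col)})


rng = np.random.default_rng(0)
p, d, nnz = 4, 100, 100  # low-dimensional data, mtry = 100 (illustrative values)

A_with = sample_projections(p, d, nnz, replace=True, rng=rng)
A_without = sample_projections(p, d, nnz, replace=False, rng=rng)

print("non-zeros (with / without replacement):",
      np.count_nonzero(A_with), np.count_nonzero(A_without))
print("distinct candidates (with / without replacement):",
      count_distinct_candidates(A_with), count_distinct_candidates(A_without))
print("upper bound on distinct trinary candidates:", 3 ** p - 1)
```

Note that with p = 4 there are only 3^4 - 1 = 80 distinct non-zero trinary columns, so at mtry = 100 some candidates must repeat regardless of how positions are sampled. Avoiding replacement only guarantees that no position is drawn twice; it may reduce duplication but cannot remove it, which is one reason item 2 (continuous weights) matters in low dimensions.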

falkben avatar May 20 '19 14:05 falkben

yah, this would depend on continuous rerf, and yah, this is part of a larger effort to understand the hyperparameters...

jovo avatar May 20 '19 20:05 jovo