SPORF where to split

upon finding a pair of points to split between, i think we should randomly split exactly on one of those two points (with equal prob of each). otherwise, an adversary could attack us there. also, i don't see any good reason not to, though we should probably first check.

@j1c @jasonkyuyim @jbrowne6 @ttomita what do you think?

Jul 26 '18 00:07 jovo

I might be missing something (not sure what adversarial context this arose in) but isn't randomly selecting between the points kind of redundant if you're already sampling random projection matrices? It would be useful to have a random mechanism in selecting the split point to thwart the adversary if he knows the random projection distribution but then you could argue the adversary could know the distribution that we use to select the point to split as well.

Jul 26 '18 05:07 ghost

well, that is a fair point about an adversary. on the other hand, if we want to say that RF is invariant to monotonic transformations of the data, and we split at the mean of the two points (the midpoint), then we are not invariant. but if we randomly split on one of the end points, we are.
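A minimal sketch of the invariance argument, not taken from the R-RerF code. The function names, the two training values, the test points, and the choice of `exp` as the monotone transformation are all illustrative assumptions, as is the tie convention (`<=` on the left point, `<` on the right point).

```python
import math

def midpoint_rule(a, b):
    """Send x left when it is at or below the midpoint of the two points."""
    threshold = (a + b) / 2.0
    return lambda x: x <= threshold

def left_endpoint_rule(a, b):
    """Split exactly on the left point: x goes left when x <= a."""
    return lambda x: x <= a

def right_endpoint_rule(a, b):
    """Split exactly on the right point: x goes left when x < b."""
    return lambda x: x < b

def is_invariant(make_rule, a, b, f, test_points):
    """Check that every test point lands on the same side before and after f."""
    raw = make_rule(a, b)
    warped = make_rule(f(a), f(b))
    return all(raw(x) == warped(f(x)) for x in test_points)

a, b = 1.0, 3.0          # hypothetical adjacent projected values
f = math.exp             # a strictly increasing transformation
test_points = [0.5, 1.0, 2.2, 3.0, 3.5]

print(is_invariant(midpoint_rule, a, b, f, test_points))        # False
print(is_invariant(left_endpoint_rule, a, b, f, test_points))   # True
print(is_invariant(right_endpoint_rule, a, b, f, test_points))  # True
```

The midpoint rule fails at x = 2.2: it sits right of the raw midpoint 2.0, but exp(2.2) is about 9.03, which is left of the transformed midpoint of about 11.40.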

Jul 26 '18 17:07 jovo

My feeling is that splitting randomly on an end point will really hurt generalization, particularly for smaller sample sizes.

Jul 26 '18 20:07 tyler-tomita

Consider class 0 a 1D Gaussian centered at -1 and class 1 a Gaussian centered at +1. Suppose our sample consists of one data point from each class, each at its class mean. Furthermore, suppose we build two trees: one makes a split on the left point and the other on the right point. Everything in the middle will then have a class posterior estimate of 0.5. However, if both trees had split down the middle, the forest would have a perfect estimate.
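A minimal sketch of this two-tree example, with each tree modeled as a single split (a stump). The helper names and the tie convention are assumptions for illustration: the tree that splits on the left point sends `x <= -1` left, and the tree that splits on the right point sends `x < 1` left, so each tree still separates the two training points.

```python
def tree_posterior(goes_left):
    """Depth-1 tree: estimated P(class 1 | x) is 0 on the left, 1 on the right."""
    return lambda x: 0.0 if goes_left(x) else 1.0

def forest_posterior(trees, x):
    """Average the per-tree posterior estimates."""
    return sum(tree(x) for tree in trees) / len(trees)

# Training sample: class 0 at x = -1, class 1 at x = +1.
endpoint_forest = [
    tree_posterior(lambda x: x <= -1.0),  # split on the left point
    tree_posterior(lambda x: x < 1.0),    # split on the right point
]
midpoint_forest = [tree_posterior(lambda x: x <= 0.0)] * 2  # both split at 0

for x in (-0.5, 0.5):
    print(x, forest_posterior(endpoint_forest, x), forest_posterior(midpoint_forest, x))
# -0.5 0.5 0.0
# 0.5 0.5 1.0
```

For any x strictly between the two points, the end-point trees disagree and the forest averages to 0.5, while the midpoint forest puts its decision boundary at 0, which is the Bayes-optimal boundary if the two Gaussians have equal variance and equal priors (an assumption not stated in the thread).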

Jul 26 '18 20:07 tyler-tomita

It should definitely be an option but further tests should be done before deciding whether it’s the default.

Jul 26 '18 20:07 tyler-tomita

Agreed. No evidence either way yet.

Jul 26 '18 21:07 jovo
