Cut2 odd behaviour

Open pabloacera opened this issue 6 years ago • 1 comments

I'm using R version 3.4.4 (2018-03-15) with Hmisc 4.1-1. I want to use cut2 to bin a vector of numbers (x) based on the number of occurrences of each value, with a threshold (m) that defines the minimum number of elements in each bin. Example:

library(Hmisc)
x = c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)
print(x)
cut2(x, m=4)

x = 1 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6

 [1] [1,3) [1,3) [1,3) [1,3) [1,3) [1,3) [1,3) 3     3     3     [4,6) [4,6)
[13] [4,6) [4,6) [4,6) [4,6) 6     6     6
Levels: [1,3) 3 [4,6) 6

I have set 4 as the desired minimum number of elements in each bin. The bins that cut2 gives are: [1,3) 3 [4,6) 6

My question is: why does cut2 leave 3 as a bin of its own when it has only 3 observations, and likewise leave 6 as a bin of its own with only 3 observations? Wouldn't it make more sense to have [1,3) [3,5) [5,6], since every bin would then have at least 4 observations? I am a bit confused by this; any input is appreciated. Thanks for your time.
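A minimal base R sketch of the binning proposed above, to make the counts concrete. The breaks c(1, 3, 5, 6) are chosen by hand for this example; they are an assumption about the desired result, not something cut2 produces, and `proposed` and `counts` are illustrative names.

```r
x <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)

# Hand-picked breaks for the proposed bins (an assumption for illustration).
# right = FALSE makes the intervals left-closed, and include.lowest = TRUE
# closes the last interval on the right so that 6 falls into [5,6].
proposed <- cut(x, breaks = c(1, 3, 5, 6), right = FALSE, include.lowest = TRUE)

# Number of observations per bin
counts <- table(proposed)
print(counts)

# TRUE when every bin holds at least m = 4 observations
all(counts >= 4)
```

Calling table() on the factor returned by cut2(x, m=4) gives the matching per-bin counts for the cut2 result, which makes the two binnings easy to compare.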

pabloacera avatar Sep 24 '18 12:09 pabloacera

Feel free to use GitHub to create an edited version of cut2, run all the tests, and I'll strongly consider adding it to Hmisc with credit to you as a co-author. Frank

Frank E Harrell Jr Professor School of Medicine

Department of Biostatistics Vanderbilt University

harrelfe avatar Sep 24 '18 15:09 harrelfe