Hmisc
Cut2 odd behaviour
I'm using R version 3.4.4 (2018-03-15) with Hmisc 4.1-1.
I want to use cut2 to bin a vector of numbers (x) based on how often each value occurs. I also want to set a threshold (m) that defines the minimum number of elements in each bin. Example:
library(Hmisc)
x = c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)
print(x)
cut2(x, m=4)
x = 1 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6
[1] [1,3) [1,3) [1,3) [1,3) [1,3) [1,3) [1,3) 3 3 3 [4,6) [4,6)
[13] [4,6) [4,6) [4,6) [4,6) 6 6 6
Levels: [1,3) 3 [4,6) 6
I set m = 4 as the desired minimum number of elements in each bin. The bins that cut2 returns are: [1,3) 3 [4,6) 6
My question is: why does cut2 leave 3 as a bin of its own when it holds only 3 observations, and likewise leave 6 as its own bin with only 3 observations? Wouldn't it make more sense to have [1,3) [3,5) [5,6], so that every bin has at least 4 observations? I am a bit confused by this; any input is appreciated. Thanks for your time.
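To make the comparison concrete: in the output above, the four levels hold 7, 3, 6, and 3 observations, so two of them fall below m = 4. The sketch below checks the proposed alternative using only base R's `cut()` with hand-picked break points. It is an illustration of the desired result, not a description of how cut2 chooses its cuts.

```r
x <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)

# Proposed bins [1,3) [3,5) [5,6]: left-closed intervals, with the
# last interval closed on the right so that 6 is included.
proposed <- cut(x, breaks = c(1, 3, 5, 6), right = FALSE, include.lowest = TRUE)

counts <- table(proposed)
print(counts)   # 7, 6 and 6 observations, all >= 4
```

cut2 also accepts explicit cut points through its `cuts` argument, which may be a workaround when specific bins are wanted; the exact labels it produces are not shown here.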
Feel free to use GitHub to create an edited version of cut2, run all the tests, and I'll strongly consider adding it to Hmisc with credit to you as a co-author.
Frank
Frank E Harrell Jr, Professor, School of Medicine
Department of Biostatistics, Vanderbilt University
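As a possible starting point for such an edited version, here is a minimal sketch of a greedy rule that guarantees at least m observations per bin whenever the data allow it. The helper `min_count_breaks` is hypothetical and is not part of Hmisc; it does not reproduce cut2's labelling or its other arguments, and it uses an open-ended last interval as a simplifying assumption.

```r
# Hypothetical helper (not Hmisc code): choose lower bin bounds greedily so
# that each bin collects at least m observations before the next bin starts.
min_count_breaks <- function(x, m) {
  vals <- sort(unique(x))                                   # distinct values, ascending
  counts <- tabulate(match(x, vals), nbins = length(vals))  # occurrences of each value
  starts <- vals[1]                                         # lower bound of each bin
  running <- 0                                              # size of the current bin
  for (i in seq_along(vals)) {
    if (running >= m) {            # current bin is full: open a new one at vals[i]
      starts <- c(starts, vals[i])
      running <- 0
    }
    running <- running + counts[i]
  }
  # A final bin that is still too small is merged into its left neighbour.
  if (running < m && length(starts) > 1) {
    starts <- starts[-length(starts)]
  }
  c(starts, Inf)                   # open-ended last bin keeps the breaks unique
}

x <- c(1,1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)
breaks <- min_count_breaks(x, m = 4)      # 1, 2, 4, Inf
bins <- cut(x, breaks = breaks, right = FALSE)
print(table(bins))                        # 4, 6 and 9 observations
```

For this x the greedy rule gives three bins starting at 1, 2, and 4, which differs from the [1,3) [3,5) [5,6] proposal but also satisfies the minimum. If the whole vector has fewer than m observations, the single resulting bin is necessarily smaller than m.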
In reply to pabloacera's message of Mon, Sep 24, 2018, 7:25 AM (the question above).