data-science-ipython-notebooks

Data preprocessing

Open amira-yahlali opened this issue 1 year ago • 7 comments

I'm trying to clean my data and do some preprocessing, but I don't understand the columns well enough to tell whether the zeros in them are normal values or missing values. I'm using the cic-collection dataset on Kaggle. If any expert could help, I'd be very thankful.

amira-yahlali avatar Mar 02 '23 10:03 amira-yahlali

OK, what's your objective?

algopy avatar Mar 02 '23 10:03 algopy

I just need to understand what the columns represent and whether a null value in each column is a normal value or a missing one. I'm trying to preprocess my data and reduce it.

amira-yahlali avatar Mar 02 '23 12:03 amira-yahlali

I'd need to see your data to identify these points.

algopy avatar Mar 02 '23 15:03 algopy

My data is the cic-ids-collection on Kaggle. I'm using the class label as the target, dropping the label column, and treating the rest as features. I'd love to send you my notebook directly to make it easier for you.

amira-yahlali avatar Mar 02 '23 17:03 amira-yahlali

Hi, is this issue still open? I'd like to work on it. Thanks, Anmol Arora

AnmolArora15 avatar Jan 31 '24 21:01 AnmolArora15

See, brother, if you want to remove the columns where all the values are null/missing, you can use data.drop(columns=[' ', ' '], inplace=True), with the names of those columns in the list, to remove them.
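In case it helps, here is a minimal sketch of that step on a toy frame. The column names and values are made up for illustration and are not taken from the Kaggle dataset. It finds the all-null columns first, then drops them by name as in the comment above; pandas also offers dropna(axis=1, how="all") as a shortcut for the same thing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; these column names are assumptions for illustration.
data = pd.DataFrame({
    "flow_duration": [10.0, 0.0, 35.0],
    "empty_col": [np.nan, np.nan, np.nan],
    "fwd_packets": [1.0, np.nan, 4.0],
})

# Collect the names of columns in which every value is missing.
all_null_cols = [col for col in data.columns if data[col].isna().all()]

# Drop those columns by name (returns a new frame, leaves `data` untouched).
cleaned = data.drop(columns=all_null_cols)

# Shortcut with the same effect: drop any column whose values are all NaN.
cleaned_alt = data.dropna(axis=1, how="all")

print(all_null_cols)
print(cleaned.columns.tolist())
```

Note that a column with only some missing values (fwd_packets here) is kept; only fully empty columns are removed.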

If you want to check the number of non-null values in each column, you can use data.info() to get a precise picture of the data.
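One thing to keep in mind is that data.info() counts a zero as a non-null value, so it will not separate zeros from real measurements. A rough sketch of a per-column summary that counts both is below. The frame and its column names are hypothetical, and whether a zero is "normal" in a given column still depends on what that column measures.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; these column names are assumptions for illustration.
data = pd.DataFrame({
    "flow_duration": [10.0, 0.0, 35.0, 0.0],
    "fwd_packets": [1.0, np.nan, 4.0, 2.0],
})

# One row per column: how many values are present, missing (NaN), or exactly zero.
summary = pd.DataFrame({
    "non_null": data.notna().sum(),
    "missing": data.isna().sum(),
    "zeros": (data == 0).sum(),
})

print(summary)
```

A column with many zeros and no NaN values may be using zero as a real value (for example, a count that can legitimately be zero), but that has to be checked against the dataset's documentation.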

In order to check for outliers in the data, you can use the seaborn library and its pairplot function, i.e. seaborn.pairplot, to get graphs depicting the outliers.
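A pair plot shows outliers visually. If a numeric check is also useful, one common option is the 1.5 x IQR rule, sketched below with pandas only. This is a different technique from pairplot, not a replacement for it, and the column name and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column name and values are assumptions.
data = pd.DataFrame({"flow_duration": [10, 12, 11, 13, 12, 500]})

# Interquartile range of the column.
q1 = data["flow_duration"].quantile(0.25)
q3 = data["flow_duration"].quantile(0.75)
iqr = q3 - q1

# Values more than 1.5 * IQR outside the quartiles are flagged as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
mask = (data["flow_duration"] < lower) | (data["flow_duration"] > upper)
outliers = data[mask]

print(outliers)
```

The 1.5 multiplier is a convention, not a rule, so the threshold may need adjusting for heavily skewed columns.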

Regards

HeerakKashyap avatar Aug 12 '24 05:08 HeerakKashyap