feature-extraction-for-CERT-insider-threat-test-datasets

Got an error on feature_extraction.py

Open · AlvaroSanchezM opened this issue 11 months ago · 2 comments

Hi, I tried to run the code on r4.2 but I got the following error:

```
my.user@computer1:~/folder/detect-anomalies/r4.2$ python3 feature_extraction.py
Traceback (most recent call last):
  File "feature_extraction.py", line 937, in <module>
    combine_by_timerange_pandas(dname)
  File "feature_extraction.py", line 118, in combine_by_timerange_pandas
    df = add_action_thisweek(act, columns, lines, act_handles, week_index, stop, firstdate, dname=dname)
  File "feature_extraction.py", line 78, in add_action_thisweek
    df.drop('id',1, inplace = True)
TypeError: drop() takes from 1 to 2 positional arguments but 3 positional arguments (and 1 keyword-only argument) were given
```

I ran it with Python 3.8.10. Could that be the problem?

AlvaroSanchezM, Mar 08 '24 16:03
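The Python version is probably not the cause. The traceback points at `df.drop('id',1, inplace = True)`: pandas 2.0 made every argument of `DataFrame.drop` except `labels` keyword-only, so the positional `1` (the axis) is rejected. A minimal sketch of the keyword form, assuming a pandas 2.x install and using a made-up toy DataFrame rather than the script's real data; the same change would apply to line 78 of feature_extraction.py, and possibly to other `drop` calls in the script that pass the axis positionally (not verified here):

```python
import pandas as pd

# Toy frame standing in for the per-action DataFrame built in add_action_thisweek.
# These columns are hypothetical; the real ones come from the CERT CSV files.
df = pd.DataFrame({"id": ["a1", "a2"], "user": ["U1", "U2"], "pc": ["PC-1", "PC-2"]})

# Original call, which pandas 1.x accepted but pandas 2.x rejects:
#     df.drop('id', 1, inplace=True)
# Passing the axis by keyword works on both 1.x and 2.x:
#     df.drop('id', axis=1, inplace=True)
# or, equivalently, name the columns to drop directly:
df.drop(columns="id", inplace=True)

print(list(df.columns))  # ['user', 'pc']
```

Pinning an older pandas (below 2.0) would be the other way around the error, at the cost of keeping an outdated dependency.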

I installed scikit-learn (`pip install scikit-learn`) because it wasn't installed the first time I ran the script. I also checked that pandas, joblib and numpy were installed:

```
my.user@computer1:~/folder/detect-anomalies/r4.2$ pip install pandas
Requirement already satisfied: pandas in /usr/local/lib/python3.8/dist-packages (2.0.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.8/dist-packages (from pandas) (2023.3)
Requirement already satisfied: numpy>=1.20.3; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from pandas) (1.24.3)
Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.8/dist-packages (from pandas) (2023.3)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas) (1.14.0)
my.user@computer1:~/folder/detect-anomalies/r4.2$ pip install joblib
Requirement already satisfied: joblib in /home/my.user/.local/lib/python3.8/site-packages (1.3.2)
my.user@computer1:~/folder/detect-anomalies/r4.2$ pip install scikit-learn
Requirement already satisfied: scikit-learn in /home/my.user/.local/lib/python3.8/site-packages (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /home/my.user/.local/lib/python3.8/site-packages (from scikit-learn) (3.3.0)
Requirement already satisfied: scipy>=1.5.0 in /home/my.user/.local/lib/python3.8/site-packages (from scikit-learn) (1.10.1)
Requirement already satisfied: joblib>=1.1.1 in /home/my.user/.local/lib/python3.8/site-packages (from scikit-learn) (1.3.2)
Requirement already satisfied: numpy<2.0,>=1.17.3 in /usr/local/lib/python3.8/dist-packages (from scikit-learn) (1.24.3)
my.user@computer1:~/folder/detect-anomalies/r4.2$ pip install numpy
Requirement already satisfied: numpy in /usr/local/lib/python3.8/dist-packages (1.24.3)
```

Then I ran feature_extraction.py with Python 3.10 on a clean copy of the dataset (I deleted the previous one) and got a different error:

```
my.user@computer1:~/folder/detect-anomalies/r4.2$ python3.10 feature_extraction.py
Traceback (most recent call last):
  File "/home/my.user/folder/detect-anomalies/r4.2/feature_extraction.py", line 7, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
```

AlvaroSanchezM, Mar 11 '24 14:03
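A `ModuleNotFoundError` right after a successful `pip install` usually means `pip` installed into a different interpreter than the one running the script, which the next comment confirms. A hedged sketch that reports which interpreter is running and which of the packages mentioned in this thread it cannot import (the list is an assumption, since the repository has no requirements file). Installing with `python3.10 -m pip install pandas`, rather than a bare `pip`, makes pip target the interpreter you name:

```python
import importlib.util
import sys

# Packages mentioned in this thread (assumed, not an official dependency list).
# Note that scikit-learn is imported under the name "sklearn".
REQUIRED = ["pandas", "numpy", "joblib", "sklearn"]


def missing_modules(names):
    """Return the module names the *current* interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Interpreter:", sys.executable, sys.version.split()[0])
    missing = missing_modules(REQUIRED)
    if missing:
        # Using "<this interpreter> -m pip" installs into the same site-packages.
        print(f"Missing: {missing}. Try: {sys.executable} -m pip install <package>")
    else:
        print("All required modules are importable.")
```

Running it once with `python3` and once with `python3.10` would show whether the two interpreters see different sets of packages.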

I noticed that `pip` had installed the libraries into the default Python version (3.8) instead of the one I want to use (3.10). After installing them correctly for 3.10, I ran feature_extraction.py again, but got the same error as in the first post of this issue:

```
my.user@computer1:~/folder/detect-anomalies/r4.2$ python3.10 feature_extraction.py
Traceback (most recent call last):
  File "/home/my.user/folder/detect-anomalies/r4.2/feature_extraction.py", line 924, in <module>
    [os.mkdir(x) for x in ["tmp", "ExtractedData", "DataByWeek", "NumDataByWeek"]]
  File "/home/my.user/folder/detect-anomalies/r4.2/feature_extraction.py", line 924, in <listcomp>
    [os.mkdir(x) for x in ["tmp", "ExtractedData", "DataByWeek", "NumDataByWeek"]]
FileExistsError: [Errno 17] File exists: 'tmp'
my.user@computer1:~/folder/detect-anomalies/r4.2$
```

AlvaroSanchezM, Mar 18 '24 09:03
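This `FileExistsError` is raised by `os.mkdir`, which refuses to create a folder that an earlier run left behind. Rather than deleting the leftover folders before every run, one possible change (a sketch, not the repository's code) is to create the four folders from line 924 with `os.makedirs(..., exist_ok=True)`. Be aware that files from the earlier run would then remain in those folders, and whether the script copes with stale intermediate files is not known:

```python
import os

# The output folders that line 924 of feature_extraction.py creates.
OUTPUT_DIRS = ["tmp", "ExtractedData", "DataByWeek", "NumDataByWeek"]


def make_output_dirs(base="."):
    """Create the output folders under `base`, skipping any that already exist."""
    for name in OUTPUT_DIRS:
        # exist_ok=True suppresses FileExistsError if the directory is already there
        os.makedirs(os.path.join(base, name), exist_ok=True)


if __name__ == "__main__":
    make_output_dirs()  # safe to call repeatedly
```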

The last error comes from `FileExistsError: [Errno 17] File exists: 'tmp'`. You can delete the `tmp` folder and try again.

As for the packages, the code was written several years ago, so newer versions of Python or of the packages may break it. Sorry, I don't have a requirements.txt, and I don't have time to rerun it now.

lcd-dal, Jun 23 '24 15:06
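Because there is no requirements.txt, it may help to record the package versions of an environment where the script does run, so the setup can be reproduced later. `pip freeze` lists everything installed; the hedged sketch below prints only the packages mentioned in this thread (an assumed list) in pip's `name==version` format, using `importlib.metadata` from the standard library (Python 3.8+):

```python
from importlib import metadata

# Distribution names (as pip knows them) of the packages mentioned in this thread.
PACKAGES = ["pandas", "numpy", "joblib", "scikit-learn", "scipy"]


def pinned_requirements(packages):
    """Return 'name==version' lines for installed packages, skipping missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            pass  # not installed for this interpreter
    return lines


if __name__ == "__main__":
    # Redirect the output to create a requirements file, e.g.
    #   python3 pin_versions.py > requirements.txt   (file name is hypothetical)
    print("\n".join(pinned_requirements(PACKAGES)))
```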