eis_toolkit Add cnn

ok I make some test here I will try to use much as possible already implemented function in the eis toolkit

Dec 06 '23 07:12 zelioluca

Hey Luca, it seems you pushed over 1000 files to this branch. Please remove them, it's way too much/heavy to be included in GitHub / on other's computers.

Dec 06 '23 09:12 nmaarnio

Hey Luca, it seems you pushed over 1000 files to this branch. Please remove them, it's way too much/heavy to be included in GitHub / on other's computers.

sorrry man! I m trying to push some windows for the testing phase

Dec 06 '23 09:12 zelioluca

I removed the 90%

Dec 06 '23 09:12 zelioluca

I removed the 90%

Sent from Outlook for Androidhttps://aka.ms/AAb9ysg

From: Niko Aarnio @.> Sent: Wednesday, December 6, 2023 11:09:00 AM To: GispoCoding/eis_toolkit @.> Cc: Luca Zelioli @.>; Author @.> Subject: Re: [GispoCoding/eis_toolkit] Add cnn (PR #251)

Hey Luca, it seems you pushed over 1000 files to this branch. Please remove them, it's way too much/heavy to be included in GitHub / on other's computers.

— Reply to this email directly, view it on GitHubhttps://github.com/GispoCoding/eis_toolkit/pull/251#issuecomment-1842474310, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AF3KUFMXHJSJMNDHAEWMA53YIAY2ZAVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBSGQ3TIMZRGA. You are receiving this because you authored the thread.Message ID: @.***>

Dec 06 '23 09:12 zelioluca

I will think about couple of change and after i am ready :-)

Dec 06 '23 09:12 zelioluca

This is still far too many binary files to be added to the master branch. You should attempt to create the

X = np.array(dataset)
y = np.array(labels)

by example loading a csv with the contents. Loading the sample data could also be done in a pytest.fixture to reduce test collection time if the sample data is large:

import pytest
@pytest.fixture
def sample_cnn_data():
    df = pd.read_csv("tests/prediction/sample.csv")
    X = df["X"].to_numpy()
    y = df["y"].to_numpy()
    return X, y

def test_do_classification(sample_cnn_data):
    """Test for classification."""
    X, y = sample_cnn_data
    best_model, cm = train_and_predict_for_classification(
    ...

Secondly, the added tests take far too long to run. Testing machine learning functions fast is difficult of course but we should not add very slow tests to the toolkit as that will slow the development cycle for everyone when tests are ran. It took circa 60 seconds for me to run the tests in cnn_classification_and_regression_test.py.

Dec 18 '23 09:12 nialov

Okkkkk perfect so the plan is to add the windows via csv file and make it faster 🤪 couple of days and will be ready.

Sent from Outlook for Androidhttps://aka.ms/AAb9ysg

From: Nikolas Ovaskainen @.> Sent: Monday, December 18, 2023 10:52:27 AM To: GispoCoding/eis_toolkit @.> Cc: Luca Zelioli @.>; Author @.> Subject: Re: [GispoCoding/eis_toolkit] Add cnn (PR #251)

This is still far too many binary files to be added to the master branch. You should attempt to create the

X = np.array(dataset) y = np.array(labels)

by example loading a csv with the contents. Loading the sample data could also be done in a pytest.fixture to reduce test collection time if the sample data is large:

import pytest @pytest.fixture def sample_cnn_data(): df = pd.read_csv("tests/prediction/sample.csv") X = df["X"].to_numpy() y = df["y"].to_numpy() return X, y

def test_do_classification(sample_cnn_data): """Test for classification.""" X, y = sample_cnn_data best_model, cm = train_and_predict_for_classification( ...

Secondly, the added tests take far too long to run. Testing machine learning functions fast is difficult of course but we should not add very slow tests to the toolkit as that will slow the development cycle for everyone when tests are ran. It took circa 60 seconds for me to run the tests in cnn_classification_and_regression_test.py.

— Reply to this email directly, view it on GitHubhttps://github.com/GispoCoding/eis_toolkit/pull/251#issuecomment-1859965844, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AF3KUFNTNJUXC3ACKJ2FIMTYKAG5XAVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJZHE3DKOBUGQ. You are receiving this because you authored the thread.Message ID: @.***>

Dec 18 '23 09:12 zelioluca

Now i get it! I added that binary just for tests!!! i can remove it and create something that make windows from csv file!

Dec 18 '23 10:12 zelioluca

I noticed that you are doing another MLP... should I do not do anything anymore there?

Dec 18 '23 12:12 zelioluca

Do you mean if #252 is implementing the same stuff? What does @nmaarnio think? I am not too familiar with the concepts.

Dec 18 '23 14:12 nialov

@nialov What you think if I put two miny np array dump istead? they are less than 200kb?

Dec 18 '23 14:12 zelioluca

Do you mean if #252 is implementing the same stuff? What does @nmaarnio think? I am not too familiar with the concepts.

no no there was a miss understanding.. we were doing the same stuff!

Dec 18 '23 15:12 zelioluca

Do you mean if #252 is implementing the same stuff? What does @nmaarnio think? I am not too familiar with the concepts.

and When I wrote this I was thinking that you were Niko sorry :-)

Dec 18 '23 15:12 zelioluca

Yes, I was under the impression that you were doing just CNN Luca and that my job would be to finalize the MLP together with Micha. I created a new branch and PR (the #252 that was mentioned before) since the old one got so cluttered and messy.

When it comes to this CNN implementation you are working on, it should be cleaned to resemble the MLP version that will be merged and use the same utility functions if possible. The MLP is almost finished, so you could already take a look. Also, I remember that last time we had a modeling meeting we discussed that the architecture of the CNN needs to be more defined than it was at the time of the meeting. I haven't looked at this one thoroughly yet, but have you considered our discussions and how is this CNN different than the previous one? Anyway, I think we need @msmiyels to give his comments about this PR before we can merge.

Dec 18 '23 15:12 nmaarnio

Yes yes i change a lot from last implementation! The first mlp was little bit messy and the second one called mlp-classification-regression wad ok!!!! The big difference between the two is that i added conv layer and removed data fusion, it was too complex! Thank you that you helping me and also you liked my mlp ❤️❤️❤️ yes @msmiyels we need you expertise

Sent from Outlook for Androidhttps://aka.ms/AAb9ysg

From: Niko Aarnio @.> Sent: Monday, December 18, 2023 4:50:58 PM To: GispoCoding/eis_toolkit @.> Cc: Luca Zelioli @.>; Author @.> Subject: Re: [GispoCoding/eis_toolkit] Add cnn (PR #251)

Yes, I was under the impression that you were doing just CNN Luca and that my job would be to finalize the MLP together with Micha. I created a new branch and PR (the #252https://github.com/GispoCoding/eis_toolkit/pull/252 that was mentioned before) since the old one got so cluttered and messy.

When it comes to this CNN implementation you are working on, it should be cleaned to resemble the MLP version that will be merged and use the same utility functions if possible. The MLP is almost finished, so you could already take a look. Also, I remember that last time we had a modeling meeting we discussed that the architecture of the CNN needs to be more defined than it was at the time of the meeting. I haven't looked at this one thoroughly yet, but have you considered our discussions and how is this CNN different than the previous one? Anyway, I think we need @msmiyelshttps://github.com/msmiyels to give his comments about this PR before we can merge.

— Reply to this email directly, view it on GitHubhttps://github.com/GispoCoding/eis_toolkit/pull/251#issuecomment-1860872029, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AF3KUFI6VGUPOJBV5BJFKSDYKBQ6FAVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRQHA3TEMBSHE. You are receiving this because you authored the thread.Message ID: @.***>

Dec 18 '23 16:12 zelioluca

Ah yesterday I forget to tell you that my latest version of mlp is here https://github.com/GispoCoding/eis_toolkit/blob/add_MLP_probability/eis_toolkit/prediction/mlp_classification_and_regression.py

On Mon, 18 Dec 2023 at 17:59, luca zelioli @.***> wrote:

Yes yes i change a lot from last implementation! The first mlp was little bit messy and the second one called mlp-classification-regression wad ok!!!! The big difference between the two is that i added conv layer and removed data fusion, it was too complex! Thank you that you helping me and also you liked my mlp ❤️❤️❤️ yes @msmiyels we need you expertise

Sent from Outlook for Android https://aka.ms/AAb9ysg

From: Niko Aarnio @.> Sent: Monday, December 18, 2023 4:50:58 PM To: GispoCoding/eis_toolkit @.> Cc: Luca Zelioli @.>; Author < @.> Subject: Re: [GispoCoding/eis_toolkit] Add cnn (PR #251)

Yes, I was under the impression that you were doing just CNN Luca and that my job would be to finalize the MLP together with Micha. I created a new branch and PR (the #252 https://github.com/GispoCoding/eis_toolkit/pull/252 that was mentioned before) since the old one got so cluttered and messy.

When it comes to this CNN implementation you are working on, it should be cleaned to resemble the MLP version that will be merged and use the same utility functions if possible. The MLP is almost finished, so you could already take a look. Also, I remember that last time we had a modeling meeting we discussed that the architecture of the CNN needs to be more defined than it was at the time of the meeting. I haven't looked at this one thoroughly yet, but have you considered our discussions and how is this CNN different than the previous one? Anyway, I think we need @msmiyels https://github.com/msmiyels to give his comments about this PR before we can merge.

— Reply to this email directly, view it on GitHub https://github.com/GispoCoding/eis_toolkit/pull/251#issuecomment-1860872029, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3KUFI6VGUPOJBV5BJFKSDYKBQ6FAVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNRQHA3TEMBSHE . You are receiving this because you authored the thread.Message ID: @.***>

Dec 19 '23 07:12 zelioluca

Thanks for the update! I will try to do the change asap

On Tue, 6 Feb 2024, 10:43 Niko Aarnio, @.***> wrote:

@.**** requested changes on this pull request.

Hey,

I reviewed this if you (Luca/Javad) are still up to continue working on it.

So the biggest issue still is that EIS Toolkit style is not followed as rigorously as needed. Please be careful to use clear and informative names, avoiding typos, capitalizing when needed etc.

Regarding the "additional" stuff found in this PR: You should remove OHE and the cross-validation functions at least since they are defined already in EIS Toolkit. We do not have our own wrapper for normalizing, so that one can be done if you want to – just put it elsewhere and adhere to style with that too.

The first thing you should do if you continue to work on this is to merge master branch. After that you can utilize the OHE and CV tools, and take a look at current contents of prediction directory. By studying how we decided to structure and implement MLP and other modeling modules, you will get a better idea what we require for this PR.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479387526 :

+def normalize_the_data(scaler_agent: sklearn.preprocessing, data: np.ndarray):
"""

Normalize multi-dimensional data using a specified scaler.

This function applies a scaling technique to each feature in the data, transforming the data to a specified range or distribution. It is particularly useful for normalizing image data or similarly structured multi-dimensional data.

Parameters:
   - scaler_agent: An instance of a preprocessing scaler from sklearn.preprocessing (e.g., MinMaxScaler, StandardScaler) that will be used to apply normalization.
   - data: A NumPy ndarray containing the data to be normalized. The data is expected to be in a multi-dimensional format, typically including dimensions for sample size, height, width, and channels (e.g., for image data).
Returns:
   - normalized_data: The input data normalized according to the scaler_agent. The output data maintains the original shape of the input data.
"""

number_of_sample, h, w, c = data.shape

temp = scaler_agent.transform(data.reshape(-1, data.shape[-1]))

normalized_data = temp.reshape(number_of_sample, h, w, c)

return normalized_data
If you want to implement a normalizing function, please put it into its own file and under transformations module. Also, we don't want users to have to deal with any sklearn classes if possible, so don't put any such stuff into parameters. Let's keep it as functional style and modular as possible.

Please also add return type annotation, change the one letter variable names to something more descriptive and convert the docstring to match EIS Toolkit conventions/style. I would also consider renaming this function to "normalize".

I should point out still that it's not mandatory to implement this function as part of this PR.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479387914 :

+def make_one_hot_encoding(labels: np.ndarray):
"""
Convert categorical labels to a one-hot encoded format.

This function transforms a list of numerical or categorical labels into a one-hot encoded matrix, which is a binary matrix representing the categorization of each label. One-hot encoding is commonly used in machine learning to handle categorical features.

Parameters:
   - labels: A NumPy ndarray containing the labels to be encoded. Each label corresponds to a category or class.
Returns:
   - label_encoded: A matrix where each row corresponds to a one-hot encoded representation of the input labels.
Raises:
   - InvalidArgumentTypeException: Raised if the input 'labels' is None, indicating that no labels were provided for encoding.
"""

if labels is None:
   raise InvalidArgumentTypeException
to categorical

enc = OneHotEncoder(handle_unknown="ignore")

train and valid set

temp = np.reshape(labels, (-1, 1))

label_encoded = enc.fit_transform(temp).toarray()

return label_encoded
We have our own one-hot encoding function already, so remove this.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479390618 :
   - InvalidArgumentTypeException: Raised if the input 'labels' is None, indicating that no labels were provided for encoding.
"""

if labels is None:
   raise InvalidArgumentTypeException
to categorical

enc = OneHotEncoder(handle_unknown="ignore")

train and valid set

temp = np.reshape(labels, (-1, 1))

label_encoded = enc.fit_transform(temp).toarray()

return label_encoded
@.*** +def do_the_cnn(

This is not a good name for a function. And mark it as private (by prefexing it with "_") as users are not meant to import it.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479391037 :

+def do_the_cnn(
input_shape_for_cnn: Union[tuple[int, int, int], tuple[int, int]],

convolution_kernel_size: tuple[int, int],

conv_list: list[int] = [8, 16],

pool_size: int = 2,

neuron_list: list[int] = [16],

dropout_rate: Union[None, float] = None,

last_activation: Literal["softmax", "sigmoid"] = "softmax",

regularization: Union[tf.keras.regularizers.L1, tf.keras.regularizers.L2, tf.keras.regularizers.L1L2, None] = None,

data_augmentation: bool = False,

optimizer: str = "Adam",

loss=Union[tf.keras.losses.BinaryCrossentropy, tf.keras.losses.CategoricalCrossentropy],

output_units=2, +) -> tf.keras.Model:

"""
Do an instance of CNN.
Indentation ⬇️ Suggested change
Do an instance of CNN.
Do an instance of CNN.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479391902 :

Parameters:
  input_shape_for_cnn: shape of the input.
  convolution_kernel_size: size of the kernel (usually is the last number of the input tuple)
  pool_size: size of the pooling layer
  conv_list: list of unit for the conv layers.
  regularization: Type of regularization to insert into the CNN or MLP.
  data_augmentation: if you want data augmentation or not (Random rotation is implemented).
  optimizer: select one optimizer for the MLP.
  loss: the loss function used to calculate accuracy.
  neuron_list: List of unit or neuron used to build the network.
  dropout_rate: if you want to use dropout add a number as floating point.
  output_units: number of output classes.
  last_activation: usually you should use softmax or sigmoid.
Descriptions should start with a capital letter and end with period.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479392742 :
  data_augmentation: if you want data augmentation or not (Random rotation is implemented).
  optimizer: select one optimizer for the MLP.
  loss: the loss function used to calculate accuracy.
  neuron_list: List of unit or neuron used to build the network.
  dropout_rate: if you want to use dropout add a number as floating point.
  output_units: number of output classes.
  last_activation: usually you should use softmax or sigmoid.
Return:
   return the compiled model.
Raises:
   InvalidArgumentTypeException: one of the arguments is invalid.
   CNNException: raised when the input is not valid
"""

if regression and binary we can not uses more than 1
No need for such a comment; rather, type out a message inside the exception. Look at other EIS Toolkit implementations for examples.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479393226 :

check that the input is not null
if input_shape_for_cnn is None:
   raise CNNException
if len(neuron_list) <= 0 or dropout_rate <= 0:
   raise InvalidArgumentTypeException
generate the input

input_layer = tf.keras.Input(shape=input_shape_for_cnn)

if data_augmentation is not False:
   input_layer = tf.keras.layers.RandomRotation((-0.2, 0.5))(input_layer)
we do dynamically the conv2d

for conv_layer_number, neuron in enumerate(conv_list):
   print(conv_layer_number)
Remove prints

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479393411 :
           kernel_size=convolution_kernel_size,
       )(body)
   if dropout_rate is not None:
       body = tf.keras.layers.Dropout(dropout_rate)(body)
   body = tf.keras.layers.BatchNormalization()(body)
   body = tf.keras.layers.MaxPool2D(pool_size=pool_size)(body)
for layer_number, neuron in enumerate(neuron_list):
   body = tf.keras.layers.Dense(neuron, kernel_regularizer=regularization, activation="relu")(body)
   if dropout_rate is not None:
       body = tf.keras.layers.Dropout(dropout_rate)(body)
we flatten
⬇️ Suggested change

we flatten

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479393624 :
we flatten

body = tf.keras.layers.Flatten()(body)

create the model

classifier = tf.keras.layers.Dense(
   output_units, activation=last_activation, kernel_regularizer=regularization, name="classifier"
)(body)

create the model

model = tf.keras.Model(inputs=input_layer, outputs=classifier, name="the_mlp_model")

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])

return model
+# now let's prepare two mega function one for classification and one for regression

⬇️ Suggested change

-# now let's prepare two mega function one for classification and one for regression

On eis_toolkit/prediction/model_performance_estimation.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479395062 :

Remove this, we have cross-validation implementation already

On tests/prediction/cnn_classification_and_probability_test.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479396489 :

Testing needs to be more extensive. Look at mlp_test.py for examples.

In eis_toolkit/exceptions.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479397400 :

+class CNNException(Exception):

"""Exception throws when something is invalid in the cnn."""

+class CNNRunningParameterException(Exception):

"""Exception throws when running parameters are wrong."""

+class InvalidCrossValidationSelected(Exception):

"""Exception throws when running parameters are wrong."""

+class InvalidNumberOfSplit(Exception):

"""Exception throws when running parameters are wrong."""

I don't think we need any of these exception types to be added. The ones already should suffice. Please take a look at what other prediction functions are currently using in EIS Toolkit.

— Reply to this email directly, view it on GitHub https://github.com/GispoCoding/eis_toolkit/pull/251#pullrequestreview-1864541715, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3KUFN3J77XQHIADC5G55LYSHUMTAVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQNRUGU2DCNZRGU . You are receiving this because you authored the thread.Message ID: @.***>

Feb 06 '24 08:02 zelioluca

Niko there is something strange with the branch! Of course I did pull before start to work, but now I got rejected because seems that my branch is behind

On Tue, 6 Feb 2024 at 10:43, Niko Aarnio @.***> wrote:

@.**** requested changes on this pull request.

Hey,

I reviewed this if you (Luca/Javad) are still up to continue working on it.

So the biggest issue still is that EIS Toolkit style is not followed as rigorously as needed. Please be careful to use clear and informative names, avoiding typos, capitalizing when needed etc.

Regarding the "additional" stuff found in this PR: You should remove OHE and the cross-validation functions at least since they are defined already in EIS Toolkit. We do not have our own wrapper for normalizing, so that one can be done if you want to – just put it elsewhere and adhere to style with that too.

The first thing you should do if you continue to work on this is to merge master branch. After that you can utilize the OHE and CV tools, and take a look at current contents of prediction directory. By studying how we decided to structure and implement MLP and other modeling modules, you will get a better idea what we require for this PR.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479387526 :

+def normalize_the_data(scaler_agent: sklearn.preprocessing, data: np.ndarray):
"""

Normalize multi-dimensional data using a specified scaler.

This function applies a scaling technique to each feature in the data, transforming the data to a specified range or distribution. It is particularly useful for normalizing image data or similarly structured multi-dimensional data.

Parameters:
   - scaler_agent: An instance of a preprocessing scaler from sklearn.preprocessing (e.g., MinMaxScaler, StandardScaler) that will be used to apply normalization.
   - data: A NumPy ndarray containing the data to be normalized. The data is expected to be in a multi-dimensional format, typically including dimensions for sample size, height, width, and channels (e.g., for image data).
Returns:
   - normalized_data: The input data normalized according to the scaler_agent. The output data maintains the original shape of the input data.
"""

number_of_sample, h, w, c = data.shape

temp = scaler_agent.transform(data.reshape(-1, data.shape[-1]))

normalized_data = temp.reshape(number_of_sample, h, w, c)

return normalized_data
If you want to implement a normalizing function, please put it into its own file and under transformations module. Also, we don't want users to have to deal with any sklearn classes if possible, so don't put any such stuff into parameters. Let's keep it as functional style and modular as possible.

Please also add return type annotation, change the one letter variable names to something more descriptive and convert the docstring to match EIS Toolkit conventions/style. I would also consider renaming this function to "normalize".

I should point out still that it's not mandatory to implement this function as part of this PR.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479387914 :

+def make_one_hot_encoding(labels: np.ndarray):
"""
Convert categorical labels to a one-hot encoded format.

This function transforms a list of numerical or categorical labels into a one-hot encoded matrix, which is a binary matrix representing the categorization of each label. One-hot encoding is commonly used in machine learning to handle categorical features.

Parameters:
   - labels: A NumPy ndarray containing the labels to be encoded. Each label corresponds to a category or class.
Returns:
   - label_encoded: A matrix where each row corresponds to a one-hot encoded representation of the input labels.
Raises:
   - InvalidArgumentTypeException: Raised if the input 'labels' is None, indicating that no labels were provided for encoding.
"""

if labels is None:
   raise InvalidArgumentTypeException
to categorical

enc = OneHotEncoder(handle_unknown="ignore")

train and valid set

temp = np.reshape(labels, (-1, 1))

label_encoded = enc.fit_transform(temp).toarray()

return label_encoded
We have our own one-hot encoding function already, so remove this.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479390618 :
   - InvalidArgumentTypeException: Raised if the input 'labels' is None, indicating that no labels were provided for encoding.
"""

if labels is None:
   raise InvalidArgumentTypeException
to categorical

enc = OneHotEncoder(handle_unknown="ignore")

train and valid set

temp = np.reshape(labels, (-1, 1))

label_encoded = enc.fit_transform(temp).toarray()

return label_encoded
@.*** +def do_the_cnn(

This is not a good name for a function. And mark it as private (by prefexing it with "_") as users are not meant to import it.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479391037 :

+def do_the_cnn(
input_shape_for_cnn: Union[tuple[int, int, int], tuple[int, int]],

convolution_kernel_size: tuple[int, int],

conv_list: list[int] = [8, 16],

pool_size: int = 2,

neuron_list: list[int] = [16],

dropout_rate: Union[None, float] = None,

last_activation: Literal["softmax", "sigmoid"] = "softmax",

regularization: Union[tf.keras.regularizers.L1, tf.keras.regularizers.L2, tf.keras.regularizers.L1L2, None] = None,

data_augmentation: bool = False,

optimizer: str = "Adam",

loss=Union[tf.keras.losses.BinaryCrossentropy, tf.keras.losses.CategoricalCrossentropy],

output_units=2, +) -> tf.keras.Model:

"""
Do an instance of CNN.
Indentation ⬇️ Suggested change
Do an instance of CNN.
Do an instance of CNN.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479391902 :

Parameters:
  input_shape_for_cnn: shape of the input.
  convolution_kernel_size: size of the kernel (usually is the last number of the input tuple)
  pool_size: size of the pooling layer
  conv_list: list of unit for the conv layers.
  regularization: Type of regularization to insert into the CNN or MLP.
  data_augmentation: if you want data augmentation or not (Random rotation is implemented).
  optimizer: select one optimizer for the MLP.
  loss: the loss function used to calculate accuracy.
  neuron_list: List of unit or neuron used to build the network.
  dropout_rate: if you want to use dropout add a number as floating point.
  output_units: number of output classes.
  last_activation: usually you should use softmax or sigmoid.
Descriptions should start with a capital letter and end with period.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479392742 :
  data_augmentation: if you want data augmentation or not (Random rotation is implemented).
  optimizer: select one optimizer for the MLP.
  loss: the loss function used to calculate accuracy.
  neuron_list: List of unit or neuron used to build the network.
  dropout_rate: if you want to use dropout add a number as floating point.
  output_units: number of output classes.
  last_activation: usually you should use softmax or sigmoid.
Return:
   return the compiled model.
Raises:
   InvalidArgumentTypeException: one of the arguments is invalid.
   CNNException: raised when the input is not valid
"""

if regression and binary we can not uses more than 1
No need for such a comment; rather, type out a message inside the exception. Look at other EIS Toolkit implementations for examples.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479393226 :

check that the input is not null
if input_shape_for_cnn is None:
   raise CNNException
if len(neuron_list) <= 0 or dropout_rate <= 0:
   raise InvalidArgumentTypeException
generate the input

input_layer = tf.keras.Input(shape=input_shape_for_cnn)

if data_augmentation is not False:
   input_layer = tf.keras.layers.RandomRotation((-0.2, 0.5))(input_layer)
we do dynamically the conv2d

for conv_layer_number, neuron in enumerate(conv_list):
   print(conv_layer_number)
Remove prints

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479393411 :
           kernel_size=convolution_kernel_size,
       )(body)
   if dropout_rate is not None:
       body = tf.keras.layers.Dropout(dropout_rate)(body)
   body = tf.keras.layers.BatchNormalization()(body)
   body = tf.keras.layers.MaxPool2D(pool_size=pool_size)(body)
for layer_number, neuron in enumerate(neuron_list):
   body = tf.keras.layers.Dense(neuron, kernel_regularizer=regularization, activation="relu")(body)
   if dropout_rate is not None:
       body = tf.keras.layers.Dropout(dropout_rate)(body)
we flatten
⬇️ Suggested change

we flatten

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479393624 :
we flatten

body = tf.keras.layers.Flatten()(body)

create the model

classifier = tf.keras.layers.Dense(
   output_units, activation=last_activation, kernel_regularizer=regularization, name="classifier"
)(body)

create the model

model = tf.keras.Model(inputs=input_layer, outputs=classifier, name="the_mlp_model")

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])

return model
+# now let's prepare two mega function one for classification and one for regression

⬇️ Suggested change

-# now let's prepare two mega function one for classification and one for regression

On eis_toolkit/prediction/model_performance_estimation.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479395062 :

Remove this, we have cross-validation implementation already

On tests/prediction/cnn_classification_and_probability_test.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479396489 :

Testing needs to be more extensive. Look at mlp_test.py for examples.

In eis_toolkit/exceptions.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479397400 :

+class CNNException(Exception):

"""Exception throws when something is invalid in the cnn."""

+class CNNRunningParameterException(Exception):

"""Exception throws when running parameters are wrong."""

+class InvalidCrossValidationSelected(Exception):

"""Exception throws when running parameters are wrong."""

+class InvalidNumberOfSplit(Exception):

"""Exception throws when running parameters are wrong."""

I don't think we need any of these exception types to be added. The ones already should suffice. Please take a look at what other prediction functions are currently using in EIS Toolkit.

— Reply to this email directly, view it on GitHub https://github.com/GispoCoding/eis_toolkit/pull/251#pullrequestreview-1864541715, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3KUFN3J77XQHIADC5G55LYSHUMTAVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQNRUGU2DCNZRGU . You are receiving this because you authored the thread.Message ID: @.***>

Feb 07 '24 15:02 zelioluca

nothing nothing I fixed it! Tomorrow I do the docs and tests are we are good to go!

On Wed, 7 Feb 2024 at 17:37, luca zelioli @.***> wrote:

Niko there is something strange with the branch! Of course I did pull before start to work, but now I got rejected because seems that my branch is behind

On Tue, 6 Feb 2024 at 10:43, Niko Aarnio @.***> wrote:
@.**** requested changes on this pull request.

Hey,

I reviewed this if you (Luca/Javad) are still up to continue working on it.

So the biggest issue still is that EIS Toolkit style is not followed as rigorously as needed. Please be careful to use clear and informative names, avoiding typos, capitalizing when needed etc.

Regarding the "additional" stuff found in this PR: You should remove OHE and the cross-validation functions at least since they are defined already in EIS Toolkit. We do not have our own wrapper for normalizing, so that one can be done if you want to – just put it elsewhere and adhere to style with that too.

The first thing you should do if you continue to work on this is to merge master branch. After that you can utilize the OHE and CV tools, and take a look at current contents of prediction directory. By studying how we decided to structure and implement MLP and other modeling modules, you will get a better idea what we require for this PR.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479387526 :

+def normalize_the_data(scaler_agent: sklearn.preprocessing, data: np.ndarray):
"""

Normalize multi-dimensional data using a specified scaler.

This function applies a scaling technique to each feature in the data, transforming the data to a specified range or distribution. It is particularly useful for normalizing image data or similarly structured multi-dimensional data.

Parameters:
   - scaler_agent: An instance of a preprocessing scaler from sklearn.preprocessing (e.g., MinMaxScaler, StandardScaler) that will be used to apply normalization.
   - data: A NumPy ndarray containing the data to be normalized. The data is expected to be in a multi-dimensional format, typically including dimensions for sample size, height, width, and channels (e.g., for image data).
Returns:
   - normalized_data: The input data normalized according to the scaler_agent. The output data maintains the original shape of the input data.
"""

number_of_sample, h, w, c = data.shape

temp = scaler_agent.transform(data.reshape(-1, data.shape[-1]))

normalized_data = temp.reshape(number_of_sample, h, w, c)

return normalized_data
If you want to implement a normalizing function, please put it into its own file and under transformations module. Also, we don't want users to have to deal with any sklearn classes if possible, so don't put any such stuff into parameters. Let's keep it as functional style and modular as possible.

Please also add return type annotation, change the one letter variable names to something more descriptive and convert the docstring to match EIS Toolkit conventions/style. I would also consider renaming this function to "normalize".

I should point out still that it's not mandatory to implement this function as part of this PR.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479387914 :

+def make_one_hot_encoding(labels: np.ndarray):
"""
Convert categorical labels to a one-hot encoded format.

This function transforms a list of numerical or categorical labels into a one-hot encoded matrix, which is a binary matrix representing the categorization of each label. One-hot encoding is commonly used in machine learning to handle categorical features.

Parameters:
   - labels: A NumPy ndarray containing the labels to be encoded. Each label corresponds to a category or class.
Returns:
   - label_encoded: A matrix where each row corresponds to a one-hot encoded representation of the input labels.
Raises:
   - InvalidArgumentTypeException: Raised if the input 'labels' is None, indicating that no labels were provided for encoding.
"""

if labels is None:
   raise InvalidArgumentTypeException
to categorical

enc = OneHotEncoder(handle_unknown="ignore")

train and valid set

temp = np.reshape(labels, (-1, 1))

label_encoded = enc.fit_transform(temp).toarray()

return label_encoded
We have our own one-hot encoding function already, so remove this.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479390618 :
   - InvalidArgumentTypeException: Raised if the input 'labels' is None, indicating that no labels were provided for encoding.
"""

if labels is None:
   raise InvalidArgumentTypeException
to categorical

enc = OneHotEncoder(handle_unknown="ignore")

train and valid set

temp = np.reshape(labels, (-1, 1))

label_encoded = enc.fit_transform(temp).toarray()

return label_encoded
@.*** +def do_the_cnn(

This is not a good name for a function. And mark it as private (by prefexing it with "_") as users are not meant to import it.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479391037 :

+def do_the_cnn(
input_shape_for_cnn: Union[tuple[int, int, int], tuple[int, int]],

convolution_kernel_size: tuple[int, int],

conv_list: list[int] = [8, 16],

pool_size: int = 2,

neuron_list: list[int] = [16],

dropout_rate: Union[None, float] = None,

last_activation: Literal["softmax", "sigmoid"] = "softmax",

regularization: Union[tf.keras.regularizers.L1, tf.keras.regularizers.L2, tf.keras.regularizers.L1L2, None] = None,

data_augmentation: bool = False,

optimizer: str = "Adam",

loss=Union[tf.keras.losses.BinaryCrossentropy, tf.keras.losses.CategoricalCrossentropy],

output_units=2, +) -> tf.keras.Model:

"""
Do an instance of CNN.
Indentation ⬇️ Suggested change
Do an instance of CNN.
Do an instance of CNN.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479391902 :

Parameters:
  input_shape_for_cnn: shape of the input.
  convolution_kernel_size: size of the kernel (usually is the last number of the input tuple)
  pool_size: size of the pooling layer
  conv_list: list of unit for the conv layers.
  regularization: Type of regularization to insert into the CNN or MLP.
  data_augmentation: if you want data augmentation or not (Random rotation is implemented).
  optimizer: select one optimizer for the MLP.
  loss: the loss function used to calculate accuracy.
  neuron_list: List of unit or neuron used to build the network.
  dropout_rate: if you want to use dropout add a number as floating point.
  output_units: number of output classes.
  last_activation: usually you should use softmax or sigmoid.
Descriptions should start with a capital letter and end with period.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479392742 :
  data_augmentation: if you want data augmentation or not (Random rotation is implemented).
  optimizer: select one optimizer for the MLP.
  loss: the loss function used to calculate accuracy.
  neuron_list: List of unit or neuron used to build the network.
  dropout_rate: if you want to use dropout add a number as floating point.
  output_units: number of output classes.
  last_activation: usually you should use softmax or sigmoid.
Return:
   return the compiled model.
Raises:
   InvalidArgumentTypeException: one of the arguments is invalid.
   CNNException: raised when the input is not valid
"""

if regression and binary we can not uses more than 1
No need for such a comment; rather, type out a message inside the exception. Look at other EIS Toolkit implementations for examples.

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479393226 :

check that the input is not null
if input_shape_for_cnn is None:
   raise CNNException
if len(neuron_list) <= 0 or dropout_rate <= 0:
   raise InvalidArgumentTypeException
generate the input

input_layer = tf.keras.Input(shape=input_shape_for_cnn)

if data_augmentation is not False:
   input_layer = tf.keras.layers.RandomRotation((-0.2, 0.5))(input_layer)
we do dynamically the conv2d

for conv_layer_number, neuron in enumerate(conv_list):
   print(conv_layer_number)
Remove prints

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479393411 :
           kernel_size=convolution_kernel_size,
       )(body)
   if dropout_rate is not None:
       body = tf.keras.layers.Dropout(dropout_rate)(body)
   body = tf.keras.layers.BatchNormalization()(body)
   body = tf.keras.layers.MaxPool2D(pool_size=pool_size)(body)
for layer_number, neuron in enumerate(neuron_list):
   body = tf.keras.layers.Dense(neuron, kernel_regularizer=regularization, activation="relu")(body)
   if dropout_rate is not None:
       body = tf.keras.layers.Dropout(dropout_rate)(body)
we flatten
⬇️ Suggested change

we flatten

In eis_toolkit/prediction/cnn_classification_and_probability.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479393624 :
we flatten

body = tf.keras.layers.Flatten()(body)

create the model

classifier = tf.keras.layers.Dense(
   output_units, activation=last_activation, kernel_regularizer=regularization, name="classifier"
)(body)

create the model

model = tf.keras.Model(inputs=input_layer, outputs=classifier, name="the_mlp_model")

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])

return model
+# now let's prepare two mega function one for classification and one for regression

⬇️ Suggested change

-# now let's prepare two mega function one for classification and one for regression

On eis_toolkit/prediction/model_performance_estimation.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479395062 :

Remove this, we have cross-validation implementation already

On tests/prediction/cnn_classification_and_probability_test.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479396489 :

Testing needs to be more extensive. Look at mlp_test.py for examples.

In eis_toolkit/exceptions.py https://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1479397400 :

+class CNNException(Exception):

"""Exception throws when something is invalid in the cnn."""

+class CNNRunningParameterException(Exception):

"""Exception throws when running parameters are wrong."""

+class InvalidCrossValidationSelected(Exception):

"""Exception throws when running parameters are wrong."""

+class InvalidNumberOfSplit(Exception):

"""Exception throws when running parameters are wrong."""

I don't think we need any of these exception types to be added. The ones already should suffice. Please take a look at what other prediction functions are currently using in EIS Toolkit.

— Reply to this email directly, view it on GitHub https://github.com/GispoCoding/eis_toolkit/pull/251#pullrequestreview-1864541715, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3KUFN3J77XQHIADC5G55LYSHUMTAVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQNRUGU2DCNZRGU . You are receiving this because you authored the thread.Message ID: @.***>

Feb 07 '24 15:02 zelioluca

Hi Niko, I did couple of test! So I did all you requested, but I did not found the OHE function, so I put one small into the tests. I m ready for the review! There is some example how to write docs? I do also that one and we are ok :-)

Feb 08 '24 09:02 zelioluca

I Finished the review now I do testing according MLP

Feb 12 '24 12:02 zelioluca

Okay, so is it ready for another review or are you still working on something?

Feb 12 '24 13:02 nmaarnio

Hello man! I m doing some tests and after i m ready! A thing i m getting wired beartype stuff from numpy type i have to fix also that!

On Mon, 12 Feb 2024, 15:08 Niko Aarnio, @.***> wrote:

Okay, so is it ready for another review or are you still working on something?

— Reply to this email directly, view it on GitHub https://github.com/GispoCoding/eis_toolkit/pull/251#issuecomment-1938649635, or unsubscribe https://github.com/notifications/unsubscribe-auth/AF3KUFKNEUD5GXJFPBNTLS3YTIH47AVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZYGY2DSNRTGU . You are receiving this because you authored the thread.Message ID: @.***>

Feb 12 '24 13:02 zelioluca

If you have time you can check the code of course 😁

Sent from Outlook for Androidhttps://aka.ms/AAb9ysg

From: Niko Aarnio @.> Sent: Monday, February 12, 2024 3:08:31 PM To: GispoCoding/eis_toolkit @.> Cc: Luca Zelioli @.>; Author @.> Subject: Re: [GispoCoding/eis_toolkit] Add cnn (PR #251)

Okay, so is it ready for another review or are you still working on something?

— Reply to this email directly, view it on GitHubhttps://github.com/GispoCoding/eis_toolkit/pull/251#issuecomment-1938649635, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AF3KUFKNEUD5GXJFPBNTLS3YTIH47AVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZYGY2DSNRTGU. You are receiving this because you authored the thread.Message ID: @.***>

Feb 12 '24 13:02 zelioluca

noniiiiiiiiiiiiiiiiiiiiiiii I am ready man !

Feb 12 '24 13:02 zelioluca

I have a mega idea

Feb 13 '24 10:02 zelioluca

I mean we can adapt the first cnn for both regression and classification. Only few params has to be changed. I already implemented it! Now I push if you are ok I can removed the not used anymore cnn

Feb 13 '24 10:02 zelioluca

Nice thanks 5 min and i will be there

Sent from Outlook for Androidhttps://aka.ms/AAb9ysg

From: Niko Aarnio @.> Sent: Tuesday, February 13, 2024 1:24:06 PM To: GispoCoding/eis_toolkit @.> Cc: Luca Zelioli @.>; Author @.> Subject: Re: [GispoCoding/eis_toolkit] Add cnn (PR #251)

@nmaarnio commented on this pull request.

In tests/prediction/cnn_classification_and_probability_test.pyhttps://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1487670359:

make cv

selected_cv = performance_model_estimation(cross_validation_type="SKFOLD", number_of_split=5)

I mean a function called _get_cross_validator from module machine_learning_general.py. This is the one that has been merged and used so far in other machine learning training tools.

— Reply to this email directly, view it on GitHubhttps://github.com/GispoCoding/eis_toolkit/pull/251#discussion_r1487670359, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AF3KUFPOJIRQTEBKPEPIHO3YTNENNAVCNFSM6AAAAABAI6FLTCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTQNZXGY4TENRSGQ. You are receiving this because you authored the thread.Message ID: @.***>

Feb 13 '24 11:02 zelioluca

I saw here in master eis_toolkit/eis_toolkit/[prediction(https://github.com/GispoCoding/eis_toolkit/tree/master/eis_toolkit/prediction) /machine_learning_general.py but not in my files I mean the files in my branch!

Feb 13 '24 11:02 zelioluca

anyway I take the same function you suggested :-) and it works!

Feb 13 '24 11:02 zelioluca

eis_toolkit eis_toolkit copied to clipboard

Add cnn

Sent from Outlook for Android https://aka.ms/AAb9ysg

I should point out still that it's not mandatory to implement this function as part of this PR.

to categorical

train and valid set

We have our own one-hot encoding function already, so remove this.

to categorical

train and valid set

This is not a good name for a function. And mark it as private (by prefexing it with "_") as users are not meant to import it.

Descriptions should start with a capital letter and end with period.

if regression and binary we can not uses more than 1

No need for such a comment; rather, type out a message inside the exception. Look at other EIS Toolkit implementations for examples.

check that the input is not null

generate the input

we do dynamically the conv2d

Remove prints

we flatten

we flatten

we flatten

create the model

create the model

Remove this, we have cross-validation implementation already

Testing needs to be more extensive. Look at mlp_test.py for examples.

I should point out still that it's not mandatory to implement this function as part of this PR.

to categorical

train and valid set

We have our own one-hot encoding function already, so remove this.

to categorical

train and valid set

This is not a good name for a function. And mark it as private (by prefexing it with "_") as users are not meant to import it.

Descriptions should start with a capital letter and end with period.

if regression and binary we can not uses more than 1

No need for such a comment; rather, type out a message inside the exception. Look at other EIS Toolkit implementations for examples.

check that the input is not null

generate the input

we do dynamically the conv2d

Remove prints

we flatten

we flatten

we flatten

create the model

create the model

Remove this, we have cross-validation implementation already

Testing needs to be more extensive. Look at mlp_test.py for examples.

I should point out still that it's not mandatory to implement this function as part of this PR.

to categorical

train and valid set

We have our own one-hot encoding function already, so remove this.

to categorical

train and valid set

This is not a good name for a function. And mark it as private (by prefexing it with "_") as users are not meant to import it.

Descriptions should start with a capital letter and end with period.

if regression and binary we can not uses more than 1

No need for such a comment; rather, type out a message inside the exception. Look at other EIS Toolkit implementations for examples.

check that the input is not null

generate the input

we do dynamically the conv2d

Remove prints

we flatten

we flatten

we flatten

create the model

create the model

Remove this, we have cross-validation implementation already

Testing needs to be more extensive. Look at mlp_test.py for examples.

make cv

eis_toolkit
eis_toolkit copied to clipboard