
How to train an ECG Arrhythmia Classifier to support more classes

Open mhfan opened this issue 1 year ago • 1 comment

Training always fails when I set these params to more than 4 classes, e.g.:

"num_classes": 15,
"class_map": {
   "0": 0, "1": 1, "2": 2, "3": 3, "4": 4, "5": 5, "6": 6,
   "7": 7, "8": 8, "9": 9, "10": 10, "11": 11, "12": 12, "13": 13, "127": 14
},
"class_names": [
   "NSR", "SBrad", "STach", "SArrh", "SVArr", "SVT", "VTach",
   "AFib", "AFlut", "VFib", "VFlut", "BigU", "TrigU", "Pace", "Noise"
],

mhfan avatar Oct 10 '24 09:10 mhfan

Thanks for raising this issue. Do you have the error output?

I think the problem you are running into is that some datasets don't have sufficient samples for each class. For example, the LSAD dataset only contains about 10 of the available classes, and a couple of those classes have only a single example. When training, heartkit splits the patients into training and test sets using sklearn.model_selection.train_test_split with stratify set, which requires at least two samples of each class.
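The failure is easy to reproduce outside heartkit. A minimal sketch with hypothetical labels: stratified splitting raises a ValueError as soon as any class has fewer than two samples, which is exactly the situation with the rarest LSAD classes.

```python
# Stratified splitting fails when any class has fewer than 2 samples.
from sklearn.model_selection import train_test_split

patients = list(range(6))
labels = [0, 0, 1, 1, 2, 3]  # classes 2 and 3 each have a single sample

try:
    train_test_split(patients, test_size=0.5, stratify=labels)
except ValueError as e:
    print(e)  # sklearn reports the least populated class has too few members
```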

I will work on filtering out classes with fewer than a minimum threshold of samples (e.g. 2). I'll also print the class distribution and display a warning when insufficient class samples are present. The class distribution is also helpful when you want to weight classes. The following snippet should work for 7 classes if you use the LSAD dataset.

    "num_classes": 7,
    "class_map": {
        "0": 0, "1": 1, "2": 2, "3": 3, "5": 4, "7": 5, "8": 6
    },
    "class_names": [
        "NSR", "SBrad", "STach", "SArrh", "VTach", "AFib", "AFlut"
    ],

    "samples_per_patient": [25, 5, 25, 39, 80, 45, 10],
    "val_samples_per_patient": [25, 5, 25, 39, 80, 45, 10],
    "test_samples_per_patient": [25, 5, 25, 39, 80, 45, 10],
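For reference, the filtering described above could look something like this rough sketch (names like `filter_rare_classes` and `min_samples` are illustrative, not heartkit's actual API): drop classes below the threshold and print the distribution so imbalances are visible before training.

```python
# Illustrative sketch: drop classes with fewer than `min_samples` examples
# and report the class distribution before training.
from collections import Counter

def filter_rare_classes(labels, min_samples=2):
    dist = Counter(labels)
    print("Class distribution:", dict(dist))
    keep = {c for c, n in dist.items() if n >= min_samples}
    dropped = sorted(set(dist) - keep)
    if dropped:
        print(f"WARNING: dropping classes with < {min_samples} samples: {dropped}")
    return [y for y in labels if y in keep]

filtered = filter_rare_classes([0, 0, 1, 1, 2, 7])  # classes 2 and 7 are dropped
```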

apage224 avatar Oct 10 '24 15:10 apage224

Hi there! It sounds like you're having an issue with increasing the number of classes in your ECG Arrhythmia classifier. Here are a few things you could check to help resolve this:

  • Model Architecture: Ensure your model’s output layer has the correct number of neurons (matching num_classes=15). For example, if you’re using a dense layer for classification, it should look something like this: model.add(Dense(15, activation='softmax'))

  • Loss Function: If you're using a classification loss like categorical_crossentropy, ensure your labels are one-hot encoded. If you're using sparse_categorical_crossentropy, your labels should be integers.

  • Data Imbalance: More classes can increase data imbalance. Double-check that each class has enough data for training. If some classes are underrepresented, consider using techniques like oversampling or class weights.

  • Batch Size/Memory: Increasing the number of classes may require more memory, especially if you're training on large datasets. You might want to try reducing the batch size to avoid memory issues.

If you've already checked these and the issue persists, share more details, and I’d be happy to help further!

soheilkooklan avatar Oct 12 '24 21:10 soheilkooklan