
CinC/Physionet PCG/ECG challenge 2016

Open breznak opened this issue 9 years ago • 0 comments

CinC challenge

https://physionet.org/challenge/2016/

A prestigious challenge/conference with nice data!

:fire: UPDATE: game's still ON! :guitar:

Looking for hackers to help me set something up, if it's feasible. Then there will be the whole summer to tune the app.

Blocked by: Add encoders #22

Plan of attack

  • [ ] audio
    • [x] for now use wav2vect from Matlab
    • [ ] implement wavEncoder - IN PROGRESS #26
    • [ ] evaluate whether the WAVEncoder (which uses scipy internally) behaves the same as Matlab's
    • [ ] try Cochlea encoder
    • [ ] implement sound encoders for nupic.audio #22
  • [ ] training
    • records are Normal/Anomaly/Unknown
    • [ ] aggregate all NORMAL records to a 2 column file (reset, PCG)
      • [x] how radical should the subsampling be (nupic is too slow to process the whole dataset)? Only down to 1000 Hz (from 2000 Hz), because of the sampling theorem (Fs >= 2*F)
    • [ ] commit the training data files (because the preprocessing takes a long time)
    • [ ] train a HTM model + serialize it
    • [ ] try param swarming
  • [ ] evaluation
    • [ ] load the model, disable learning
    • [ ] 2 tasks description.py? Or another way to train/load/evaluate a model on the datasets
    • [ ] compute average anomaly score for all datapoints of a record
    • [ ] implement the anomaly metric in nupic
    • [ ] create a model (for nupic?) that does this classification based on avg. anomaly scores?
    • [ ] threshold to Normal/Anomaly/Unknown
  • [ ] submission
    • [ ] modify examples sample2016*
    • [ ] nupic is installed, so setup will just source a virtualenv
    • [ ] each evaluation in next will call Matlab (wav2csv), Python (writes anomaly scores to CSV), then Matlab again (loads the anomaly scores and decides the classification)
    • this is problematic, better go full-python if possible!
  • [ ] improvements:
    • [ ] try bag (multi model) voting
      • [ ] model trained on full normal data
    • [ ] model on FHS parts
    • [ ] model on anomalous data
    • [ ] model pretrained on ECG data from other sources! https://github.com/breznak/nupic.biodat
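
A minimal sketch of the aggregation step from the training part of the plan, assuming plain Python lists of samples per record. The names `halve_rate` and `write_training_rows` and the toy data are hypothetical, not from the repo. Averaging adjacent pairs is only a crude stand-in for a proper anti-aliasing filter (something like `scipy.signal.decimate` would be a better fit), and the extra header rows nupic may expect in its input files are not shown.

```python
import csv
import io


def halve_rate(samples):
    """Crude 2x downsample (2000 Hz -> 1000 Hz): average each adjacent pair.

    Assumption: pair-averaging stands in for a real low-pass filter.
    A trailing unpaired sample is dropped.
    """
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]


def write_training_rows(records, out):
    """Write (reset, PCG) rows; reset is 1 on the first sample of each record."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["reset", "PCG"])
    for samples in records:
        for i, value in enumerate(halve_rate(samples)):
            writer.writerow([1 if i == 0 else 0, value])


# Hypothetical toy data standing in for two NORMAL recordings.
records = [[0.0, 2.0, 4.0, 6.0], [1.0, 1.0, 3.0, 5.0, 9.0]]
buf = io.StringIO()
write_training_rows(records, buf)
print(buf.getvalue())
```

The reset column marks where one recording ends and the next begins, so the model does not learn a transition across record boundaries.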

Working plan to get some validation results ASAP:

  • [ ] training data
    • will train only on Normal data and select (FHS) subsequences of it
    • data extracted from Matlab; @breznak will do that
  • [ ] train HTM model
    • on the provided data
    • just one HTM model (with an RDSE encoder? what are the best settings? probably no time to swarm)
    • able to serialize the model and load to run on eval. data (learning off)
      • the approach with OPF is not working reliably; can someone post code to do that? (@rhyolight or someone..?)
  • [ ] write simple classification function: classify(anScores[])
    • should decide classification from the anomaly scores for the whole sequence/sample
    • can be something like the average: Normal iff < 0.4; Unknown iff in [0.4, 0.7]; Anomaly iff > 0.7; ETA ~10 mins
  • [ ] score
    • process validation data (@breznak will commit a file)
    • classify & compute score -> submit! :pray:
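
The classification step above could look something like this. It is a sketch only: it takes the anomaly scores as a plain list of floats and does not call nupic. The name `classify` follows the plan; the `low` and `high` parameters and the handling of an empty record are my assumptions, and the 0.4 / 0.7 thresholds are the untuned guesses from the plan.

```python
def classify(an_scores, low=0.4, high=0.7):
    """Label one record from the anomaly scores of all its datapoints.

    Average below `low` -> "Normal", above `high` -> "Anomaly",
    anything in [low, high] -> "Unknown".
    Assumption: a record with no scores is reported as "Unknown".
    """
    if not an_scores:
        return "Unknown"
    avg = sum(an_scores) / float(len(an_scores))
    if avg < low:
        return "Normal"
    if avg > high:
        return "Anomaly"
    return "Unknown"


# Hypothetical scores for three records.
print(classify([0.1, 0.2, 0.3]))
print(classify([0.5, 0.6]))
print(classify([0.9, 0.8]))
```

Keeping the thresholds as parameters makes it easy to tune them on the validation data later.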

breznak · May 12 '16 17:05