
predict.py node / probability accumulation : Usage clarifications

Open juliencarponcy opened this issue 1 year ago • 1 comments

Hello,

As previously mentioned in the timeflux main repo issues, I'm trying to use the nodes exemplified in the c-VEP speller. Beyond the classifier, I'm struggling a bit to understand how to use the different arguments properly, and how the predict node itself works (speller/CVEP/speller/nodes/predict.py).

Particular questions :

  • Digging into the inner workings of this node, I believe I must change the "trigger" of the "epoch" node in my pipelines, so as to have one per cycle instead of one per trial (of several cycles), as I have had until now.

  • One confusing aspect for me is the "buffer-related arguments" of 3 nodes: epoch (param: buffer), classification (param: buffer_size), and predict (params: min_buffer_size and max_buffer_size). I understand that the first two are meant to buffer signal that is normally unnecessary, in case the transmission of events is delayed, whereas the predict buffers relate to the minimal/sufficient number of repetitions needed to broadcast a probability/classification. However, the practical consequences of these 2 arguments are not yet straightforward to me.

  • Finally, the necessity of the "reset events" (on the i_reset port) and the way to implement them are not clear to me either. I believe I'm supposed to emit an event on this port at the end of a trial, but the format that these reset events must take is not crystal clear. I can see that it is possibly a dataframe with a "label" column containing "reset_{source}_accumulation". One ambiguity for me: it could be a reset of the accumulated 'calibration' data, or a reset of all the cycles after each trial is over during the test phase (predict_proba).

I hope the uncertainties I raised are understandable as I explained them, and I am looking forward to clarifications on these interesting nodes, which are core to the advantages timeflux offers for live time-series ML pipelines.

Best,

Julien

juliencarponcy, Jul 11 '24 10:07

Hi,

Apologies for answering only now.

The predict node accumulates probabilities from the classification engine and emits a final prediction when enough confidence is reached.

  • I haven't touched the code in a while, but I believe events are sent for each cycle in the current interface. That said, I'm not sure what you're trying to achieve.

  • You're right in your interpretation of the buffers for the first two nodes. They allow data to be captured even if the event triggers are delayed. Regarding min_buffer_size and max_buffer_size in predict, the behavior is a bit different. This is a circular buffer that lets you fine-tune the decision model. min_buffer_size is the minimum number of predictions to accumulate from the classification engine before even attempting to emit a decision. For example, you may want to wait a little before making a decision in case the subject's gaze has been captured by an adjacent stim. max_buffer_size, on the other hand, discards past predictions in case the subject has been distracted in previous trials. Otherwise it would take too long to emit a decision, given that early predictions might not be relevant. There is a balance to be found, depending on your EEG headset, the subject, environmental conditions, etc.

  • You can play with and adjust these parameters from the web interface. The s key will bring up a configuration menu that will send a reset event to the prediction engine. The reset event is not strictly required for the classification and prediction engines to work; it just allows you to change parameters on the fly. See: https://github.com/timeflux/demos/blob/225837acd7aac6319bca2f2c60e7a4bf1817381b/speller/CVEP/speller/gui/assets/js/app.js#L58-L76
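
As a rough illustration of the min_buffer_size / max_buffer_size trade-off described above, here is a minimal Python sketch. It is not the code of predict.py: the class name ProbabilityAccumulator, the threshold parameter, and the decision rule (mean probability over the buffer) are assumptions made for this example. Only the two buffer parameters and the idea of a reset come from the discussion.

```python
from collections import deque


class ProbabilityAccumulator:
    """Toy accumulator (hypothetical, not the actual predict.py logic)."""

    def __init__(self, n_targets, min_buffer_size, max_buffer_size, threshold):
        if not 1 <= min_buffer_size <= max_buffer_size:
            raise ValueError("need 1 <= min_buffer_size <= max_buffer_size")
        self.n_targets = n_targets
        self.min_buffer_size = min_buffer_size
        self.threshold = threshold  # assumed confidence threshold
        # A deque with maxlen acts as a circular buffer: once full,
        # appending a new prediction silently drops the oldest one.
        self.buffer = deque(maxlen=max_buffer_size)

    def reset(self):
        """Forget all accumulated predictions (e.g. on a reset event)."""
        self.buffer.clear()

    def update(self, probabilities):
        """Add one per-target probability vector; return a target index or None."""
        if len(probabilities) != self.n_targets:
            raise ValueError("unexpected number of targets")
        self.buffer.append(list(probabilities))
        # Do not even attempt a decision before min_buffer_size predictions.
        if len(self.buffer) < self.min_buffer_size:
            return None
        # Assumed decision rule: mean probability per target over the buffer.
        means = [sum(col) / len(self.buffer) for col in zip(*self.buffer)]
        best = max(range(self.n_targets), key=means.__getitem__)
        if means[best] >= self.threshold:
            self.reset()  # start fresh for the next decision
            return best
        return None


# The first prediction mimics a gaze captured by an adjacent stim (target 1).
stream = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]]

narrow = ProbabilityAccumulator(3, min_buffer_size=2, max_buffer_size=2, threshold=0.7)
wide = ProbabilityAccumulator(3, min_buffer_size=2, max_buffer_size=3, threshold=0.7)

narrow_decisions = [narrow.update(p) for p in stream]
wide_decisions = [wide.update(p) for p in stream]
print(narrow_decisions, wide_decisions)
```

In this toy run, the narrow buffer drops the misleading first prediction once it is full and decides on target 0 at the third prediction, while the wider buffer still carries that early prediction and keeps waiting.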

mesca, Jul 16 '24 17:07