predict.py node / probability accumulation : Usage clarifications

Open juliencarponcy opened this issue 1 year ago • 1 comments

Hello,

As previously mentioned in the timeflux main repo issues, I'm trying to use the nodes exemplified in the c-VEP speller. Beyond the classifier, I'm struggling a bit to understand how to use the different arguments properly, and perhaps also how the node itself works (speller/CVEP/speller/nodes/predict.py).

Specific questions:

Digging into this node's inner workings, I believe I must change the "trigger" of the "epoch" node in my pipelines, so as to have one per cycle instead of one per trial (of several cycles) as I had until now.
One confusing aspect for me is the buffer-related arguments of three nodes: epoch (param: buffer), classification (param: buffer_size), and predict (params: min_buffer_size and max_buffer_size). I understand that the first two buffer signal that would normally be unnecessary, in case event transmission is delayed, whereas the predict buffers relate to the minimal/sufficient number of repetitions needed to broadcast a probability/classification. However, the practical consequences of these two arguments are not yet straightforward to me.
Finally, the necessity of the "reset events" (on the i_reset port) and the way to implement them are not clear to me either. I believe I'm supposed to emit an event on this port at the end of a trial, but the format these reset events must take is not crystal clear. I can see that it is possibly a dataframe with a "label" column holding "reset_{source}_accumulation". One ambiguity for me: it could be a reset of the accumulated 'calibration' data, or a reset of all the cycles after each trial is over during the test phase (predict_proba).

I hope the uncertainties I raised are understandable as I explained them, and I am looking forward to clarifications on these interesting nodes, which are core to the superiority offered by timeflux for live time-series ML pipelines.

Best,

Julien

Jul 11 '24 10:07 juliencarponcy

Hi,

Apologies for answering only now.

The predict node accumulates probabilities from the classification engine and emits a final prediction when enough confidence is reached.

I haven't touched the code in a while, but I believe events are sent for each cycle in the current interface. That said, I'm not sure what you're trying to achieve.
You're right in your interpretation of the buffers for the first two nodes. They allow data to be captured even if the event triggers are delayed. Regarding min_buffer_size and max_buffer_size in predict, the behavior is a bit different. This is a circular buffer that lets you fine-tune the decision model. min_buffer_size is the minimum number of predictions to accumulate from the classification engine before even attempting to emit a decision. For example, you may want to wait a little before making a decision in case the subject's gaze has been captured by an adjacent stim. max_buffer_size, on the other hand, discards past predictions in case the subject has been distracted in previous trials; otherwise it would take too much time to emit a decision, given that early predictions might not be relevant. There is a balance to be found, depending on your EEG headset, the subject, environmental conditions, etc.
You can play with and adjust these parameters from the web interface. The s key will bring up a configuration menu that will send a reset event to the prediction engine. This reset is not strictly required for the classification and prediction engines to work; it just allows you to change parameters on the fly. See: https://github.com/timeflux/demos/blob/225837acd7aac6319bca2f2c60e7a4bf1817381b/speller/CVEP/speller/gui/assets/js/app.js#L58-L76
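
To make the role of min_buffer_size and max_buffer_size more concrete, here is a minimal Python sketch of a circular accumulation buffer. It is not the code from predict.py: the class name, the threshold parameter, and the "mean probability above a threshold" decision rule are assumptions made purely for illustration, and the actual node may decide differently.

```python
from collections import deque

import numpy as np


class ProbabilityAccumulator:
    """Hypothetical illustration of the buffer logic; not taken from predict.py."""

    def __init__(self, min_buffer_size=3, max_buffer_size=6, threshold=0.75):
        self.min_buffer_size = min_buffer_size
        self.threshold = threshold  # assumed decision criterion
        # A deque with maxlen acts as a circular buffer:
        # once full, appending drops the oldest prediction.
        self.buffer = deque(maxlen=max_buffer_size)

    def update(self, probabilities):
        """Add one per-cycle probability vector; return a target index or None."""
        self.buffer.append(np.asarray(probabilities, dtype=float))
        if len(self.buffer) < self.min_buffer_size:
            return None  # too few predictions to even attempt a decision
        mean = np.mean(list(self.buffer), axis=0)
        best = int(np.argmax(mean))
        if mean[best] >= self.threshold:
            self.buffer.clear()  # start fresh after a decision
            return best
        return None

    def reset(self):
        """Drop all accumulated predictions, e.g. after parameters change."""
        self.buffer.clear()


acc = ProbabilityAccumulator(min_buffer_size=2, max_buffer_size=3, threshold=0.7)
first = acc.update([0.9, 0.1])   # only one prediction so far: no decision
second = acc.update([0.8, 0.2])  # two predictions, mean for target 0 is 0.85
```

In this sketch, a larger min_buffer_size delays the first possible decision, while a smaller max_buffer_size makes the accumulator forget early (possibly distracted) cycles sooner.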

Jul 16 '24 17:07 mesca