
Features and model for audio only

Open haijing1995 opened this issue 2 years ago • 6 comments

Hello, the audio-only results in the docs seem great. Could you tell me which features you used and how the model was constructed?

haijing1995 avatar Nov 30 '22 04:11 haijing1995

Hey, I'm sorry, which results are you referring to?

RicherMans avatar Nov 30 '22 06:11 RicherMans

> Hey, I'm sorry, which results are you referring to?

The audio-only results for the LSTM and TCN in /docs/report.md.

haijing1995 avatar Nov 30 '22 10:11 haijing1995

Hey, thanks for noting these results; they were part of the paper during the development process.

The "HighOrder" features are just the standard mean, median, second-order, third-order, max, and min statistics extracted from a mel spectrogram.

By the way, I don't think these results are all that "good"; we obtained notably better results with self-supervised learning, such as in this paper.
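As a rough illustration, such pooled statistics can be computed per mel band and concatenated into one fixed-size vector. This is a minimal numpy sketch; the exact moment definitions, normalization, and feature order in the repo may differ, and `high_order_stats` is a hypothetical name:

```python
import numpy as np

def high_order_stats(mel: np.ndarray) -> np.ndarray:
    """Pool a (n_mels, n_frames) mel spectrogram over time into a fixed-size
    vector of per-band statistics: mean, median, 2nd and 3rd central moments,
    max, and min."""
    mean = mel.mean(axis=1)
    median = np.median(mel, axis=1)
    second = ((mel - mean[:, None]) ** 2).mean(axis=1)  # variance (2nd moment)
    third = ((mel - mean[:, None]) ** 3).mean(axis=1)   # 3rd central moment
    return np.concatenate(
        [mean, median, second, third, mel.max(axis=1), mel.min(axis=1)]
    )

# Any-length input pools to the same dimensionality: 64 bands * 6 stats = 384
mel = np.random.rand(64, 501)
feat = high_order_stats(mel)
print(feat.shape)  # (384,)
```

Because the pooling is over the time axis, utterances of any duration map to the same 384-dimensional vector, which is what makes these features convenient for fixed-size classifiers.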

RicherMans avatar Nov 30 '22 10:11 RicherMans

> Hey, thanks for noting these results; they were part of the paper during the development process.
>
> The "HighOrder" features are just the standard mean, median, second-order, third-order, max, and min statistics extracted from a mel spectrogram.
>
> By the way, I don't think these results are all that "good"; we obtained notably better results with self-supervised learning, such as in this paper.

Thanks for your reply. I have a few more questions:

  1. Each answer from each participant has a different duration, so the extracted features (e.g., mel spectrograms) also differ in length.
  2. Different participants gave different numbers of responses. To be able to train in batches, how do you unify these two dimensions (not the learned x-feature in the paper you mentioned)?

haijing1995 avatar Nov 30 '22 11:11 haijing1995

> Each answer from each participant has a different duration, so the extracted features (e.g., mel spectrograms) also differ in length.

We used a batch size of 1 for training, so no padding was needed.
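To make the point concrete: with a batch size of 1, each "batch" is exactly one participant's feature matrix, so sequences of different lengths never have to be padded to a common shape. A toy sketch with a plain generator (the repo's actual data loading is an assumption here, not shown):

```python
import numpy as np

def batches(features, labels, batch_size=1):
    """Yield one (1, T_i, D) tensor per participant; no padding required."""
    assert batch_size == 1, "variable-length sequences: one sample per batch"
    for x, y in zip(features, labels):
        yield x[None, ...], y  # add a leading batch axis

# Three utterances of different lengths, all with 40-dim frame features
feats = [np.random.randn(t, 40) for t in (120, 75, 300)]
for x, y in batches(feats, [0, 1, 0]):
    print(x.shape)  # (1, 120, 40), then (1, 75, 40), then (1, 300, 40)
```

The trade-off is slower training (no batched parallelism), but it avoids both padding artifacts and truncation on highly variable-length recordings.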

> Different participants gave different numbers of responses. To be able to train in batches, how do you unify these two dimensions (not the learned x-feature in the paper you mentioned)?

We really did train with a batch size of 1 for most papers, since, as you mention, the length differences between samples are substantial. However, as a note from us: the dataset is very small by common scientific standards, which leads to very large variance between experiments, so do not expect to run our experiments a single time and obtain the same result. On this dataset, the random seed has a far larger impact than most "optimization" methods.
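The seed-sensitivity point suggests a simple evaluation protocol: repeat each experiment under several seeds and report mean ± standard deviation instead of a single score. A toy sketch, where `run_experiment` is a stand-in for a full train/evaluate cycle (the noise model is illustrative, not measured):

```python
import random
import statistics

def run_experiment(seed: int) -> float:
    """Stand-in for one full train/evaluate run; replace with real training.
    Simulates seed-dependent scatter around a base score."""
    rng = random.Random(seed)
    return 0.6 + rng.uniform(-0.1, 0.1)

scores = [run_experiment(seed) for seed in range(5)]
print(f"F1: {statistics.mean(scores):.3f} +/- {statistics.stdev(scores):.3f}")
```

Reporting the spread across seeds makes it much easier to tell a genuine improvement from run-to-run noise on a small dataset like this one.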

RicherMans avatar Nov 30 '22 14:11 RicherMans

Thanks a lot for your help; I will try it.

haijing1995 avatar Dec 01 '22 01:12 haijing1995