Cannot obtain SSL result on simulated audio signal

Open baynaa7 opened this issue 5 years ago • 2 comments

Hello, I'm testing ODAS sound source localization (SSL) with a simulated audio signal. First, I saved the SSL result in a JSON file (for example, potential_ssl.json). Using a Python script, I read the JSON file and converted the x, y, z and e values to elevation and azimuth in degrees, using the code in odas_web/graph.js. When I plot the angles, I can't obtain the expected SSL result for azimuth (expected azimuths: 90 and -135 degrees). Here is the ReSpeaker array with polar coordinates: (image: micArra)
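For reference, the conversion step described above can be sketched roughly as follows. This is only a sketch under assumptions: it assumes potential_ssl.json is a sequence of concatenated JSON objects, each with a `timeStamp` and a `src` list whose entries carry `x`, `y`, `z` and `E`, and it uses the usual `atan2(y, x)` convention for azimuth, which may not match odas_web/graph.js exactly. The names `iter_frames`, `to_angles` and `potential_angles` are made up for illustration.

```python
import json
import math


def iter_frames(text):
    """Yield each JSON object from a string of concatenated JSON objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        frame, pos = decoder.raw_decode(text, pos)
        yield frame


def to_angles(x, y, z):
    """Return (azimuth, elevation) in degrees for a direction vector.

    Assumed convention: azimuth is measured from +x towards +y,
    elevation from the x-y plane towards +z.
    """
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


def potential_angles(text, min_energy=0.0):
    """Collect (timeStamp, azimuth, elevation, E) for sources above min_energy."""
    rows = []
    for frame in iter_frames(text):
        for src in frame.get("src", []):
            if src["E"] > min_energy:
                az, el = to_angles(src["x"], src["y"], src["z"])
                rows.append((frame.get("timeStamp"), az, el, src["E"]))
    return rows
```

With this convention a source on the +y axis comes out at 90 degrees and one at (-1, -1, 0) at -135 degrees, so comparing the printed values against the plot is a quick way to rule out a sign or axis swap in the plotting script.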

I think there must be something wrong with the SSL result. Could you please help me obtain accurate azimuth angles? Here is my configuration file: `# Configuration file for ReSpeaker circular sound card

version = "2.1";

# Raw

raw: {

fS = 16000;
hopSize = 512;
nBits = 32;
nChannels = 7; 

# Input with raw signal from file
interface: {
    type = "file";
    path = "input.raw";
};

}

# Mapping

mapping: {

map: (1, 2, 3, 4, 5, 6, 7);

}

# General

general: {

epsilon = 1E-20;

size: 
{
    hopSize = 512;
    frameSize = 1024;
};

samplerate:
{
    mu = 16000;
    sigma2 = 0.01;
};

speedofsound:
{
    mu = 343.0;
    sigma2 = 25.0;
};

mics = (
    
    # Microphone 1
    { 
        mu = ( +0.0000, +0.0000, +0.0000 ); 
        sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
        direction = ( +0.000, +0.000, +1.000 );
        angle = ( 80.0, 90.0 );
    },

    # Microphone 2
    { 
        mu = ( -0.0160, +0.0277, +0.0000 ); 
        sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
        direction = ( +0.000, +0.000, +1.000 );
        angle = ( 80.0, 90.0 );
    },

    # Microphone 3
    { 
        mu = ( -0.0320, +0.0000, +0.0000 ); 
        sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
        direction = ( +0.000, +0.000, +1.000 );
        angle = ( 80.0, 90.0 );
    },

    # Microphone 4
    { 
        mu = ( -0.0160, -0.0277, +0.0000 ); 
        sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
        direction = ( +0.000, +0.000, +1.000 );
        angle = ( 80.0, 90.0 );
    },

    # Microphone 5
    { 
        mu = ( +0.0160, -0.0277, +0.0000 ); 
        sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
        direction = ( +0.000, +0.000, +1.000 );
        angle = ( 80.0, 90.0 );        
    },

    # Microphone 6
    { 
        mu = ( +0.0320, +0.0000, +0.0000 ); 
        sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
        direction = ( +0.000, +0.000, +1.000 );
        angle = ( 80.0, 90.0 );        
    },

    # Microphone 7
    { 
        mu = ( +0.0160, +0.0277, +0.0000 ); 
        sigma2 = ( +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000, +0.000 );
        direction = ( +0.000, +0.000, +1.000 );
        angle = ( 80.0, 90.0 );
    }
    
);

# Spatial filters to include only a range of direction if required
# (may be useful to remove false detections from the floor, or
# limit the space search to a restricted region)
spatialfilters = (

    {

        direction = ( +0.000, +0.000, +1.000 );
        angle = (80.0, 90.0);

    }

);  

nThetas = 181;
gainMin = 0.25;

};

# Stationary noise estimation

sne: {

b = 3;
alphaS = 0.1;
L = 150;
delta = 3.0;
alphaD = 0.1;

}

# Sound Source Localization

ssl: {

nPots = 2;
nMatches = 10;
probMin = 0.5;
nRefinedLevels = 1;
interpRate = 4;

# Number of scans: level is the resolution of the sphere
# and delta is the size of the maximum sliding window
# (delta = -1 means the size is automatically computed)
scans = (
    { level = 2; delta = -1; },
    { level = 4; delta = -1; }
);

# Output to export potential sources
potential: {
  
    format = "json";
    interface: {
        type = "file";
        path = "potential_ssl.json";
    };
};

};

# Sound Source Tracking

sst: {

# Mode is either "kalman" or "particle"

mode = "particle";

# Add is either "static" or "dynamic"

add = "dynamic";

# Parameters used by both the Kalman and particle filter

active = (
    { weight = 1.0; mu = 0.3; sigma2 = 0.0025 }
);

inactive = (
    { weight = 1.0; mu = 0.15; sigma2 = 0.0025 }
);

sigmaR2_prob = 0.0025;
sigmaR2_active = 0.0225;
sigmaR2_target = 0.0025;
Pfalse = 0.1;
Pnew = 0.1;
Ptrack = 0.8;

theta_new = 0.9;
N_prob = 5;
theta_prob = 0.8;
N_inactive = ( 150, 200, 250, 250 );
theta_inactive = 0.9;

# Parameters used by the Kalman filter only

kalman: {

    sigmaQ = 0.001;
    
};

# Parameters used by the particle filter only

particle: {

    nParticles = 1000;
    st_alpha = 2.0;
    st_beta = 0.04;
    st_ratio = 0.5;
    ve_alpha = 0.05;
    ve_beta = 0.2;
    ve_ratio = 0.3;
    ac_alpha = 0.5;
    ac_beta = 0.2;
    ac_ratio = 0.2;
    Nmin = 0.7;

};

target: ();

# Output to export tracked sources

tracked: {

    format = "json";

    interface: {
        type = "file";
        path = "tracks.txt";
    };

};

    

}

sss: {

# Mode is either "dds", "dgss" or "dmvdr"

mode_sep = "dds";
mode_pf = "ms";

gain_sep = 1.0;
gain_pf = 10.0;

dds: {

};

dgss: {

    mu = 0.01;
    lambda = 0.5;

};

dmvdr: {

};

ms: {

    alphaPmin = 0.07;
    eta = 0.5;
    alphaZ = 0.8;        
    thetaWin = 0.3;
    alphaWin = 0.3;
    maxAbsenceProb = 0.9;
    Gmin = 0.01;
    winSizeLocal = 3;
    winSizeGlobal = 23;
    winSizeFrame = 256;

};

ss: {

    Gmin = 0.01;
    Gmid = 0.9;
    Gslope = 10.0;

}

 separated: {

    fS = 16000;
    hopSize = 128;
    nBits = 32;        

    interface: {
        type = "file";
        path = "separated.raw";
    };        

};

postfiltered: {

    fS = 16000;
    hopSize = 128;
    nBits = 32;        
    gain = 10.0;

    interface: {
        type = "file";
        path = "postfiltered.raw";
    };        

};

}

classify: {

frameSize = 1024;
winSize = 3;
tauMin = 32;
tauMax = 200;
deltaTauMax = 7;
alpha = 0.3;
gamma = 0.05;
phiMin = 0.15;
r0 = 0.2;    

category: {

    format = "undefined";

    interface: {
        type = "blackhole";
    }

}

} ` I uploaded the input.raw and input.wav files. The signal was simulated in MATLAB with the ReSpeaker microphone geometry, with two people speaking from azimuths of 90 and -135 degrees. The 7 channels were saved in a WAV file, which was then converted to a raw file using sox.
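One thing that may be worth checking after the sox conversion is whether the size of input.raw is consistent with the `raw` section of the config (`nChannels = 7`, `nBits = 32`, `hopSize = 512`, `fS = 16000`). With those values each multichannel sample takes 7 × 4 = 28 bytes, so a file that was written with a different bit depth would show up as leftover bytes or as a duration that does not match the WAV. A minimal sketch, where `raw_layout` is a hypothetical helper and the default arguments are simply copied from the config above:

```python
import os


def raw_layout(path, n_channels=7, n_bits=32, hop_size=512, sample_rate=16000):
    """Summarize how a raw file divides up under the configured format.

    Only the file size is inspected; the sample values are not read.
    """
    bytes_per_frame = n_channels * (n_bits // 8)  # one sample for every channel
    size = os.path.getsize(path)
    n_frames, leftover = divmod(size, bytes_per_frame)
    return {
        "samples_per_channel": n_frames,
        "leftover_bytes": leftover,         # non-zero suggests a format mismatch
        "hops": n_frames // hop_size,       # complete hops available
        "duration_s": n_frames / sample_rate,
    }
```

If `duration_s` comes out at about half the length of input.wav, for example, that would hint that the raw file holds 16-bit rather than 32-bit samples. This only checks sizes; it says nothing about sample encoding or channel order.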

inputSignal.zip

Also, I attached the corresponding azimuth angle results with the input waveform:

(image: ssl_true) SSL ground truth for 2 sources

(image: ssl_real) SSL result from odas

Thank you in advance. Baynaa.

baynaa7, Oct 16 '19 15:10

Hi. Have you solved the issue?

hltkts123, Mar 04 '21 08:03

No ...

baynaa7, Apr 22 '21 09:04