
evaluate on VOT benchmark

gongbudaizhe opened this issue 7 years ago · 4 comments

Hi Luca,

Recently I have been trying to reproduce your results on the VOT2016 benchmark. Using the provided result files (https://www.robots.ox.ac.uk/%7Eluca/stuff/siam-fc_results/vot16.zip), the computed EAO is 0.2905.

However, when I evaluate the pretrained color model (https://www.robots.ox.ac.uk/%7Eluca/stuff/siam-fc_nets/2016-08-17.net.mat) with 3 search scales using the VOT benchmark toolkit myself, I only get an EAO of 0.2247.

Below is the evaluation script. Is there anything I am missing?

function sfc_vot
    % *************************************************************
    % VOT: Always call exit command at the end to terminate Matlab!
    % *************************************************************
    cleanup = onCleanup(@() exit() );

    % *************************************************************
    % VOT: Set random seed to a different value every time.
    % *************************************************************
    RandStream.setGlobalStream(RandStream('mt19937ar', 'Seed', sum(clock)));

    % *************************************************************
    % SFC: Set tracking parameters
    % *************************************************************
    p.numScale = 3;
    p.scaleStep = 1.0375;
    p.scalePenalty = 0.9745;
    p.scaleLR = 0.59; % damping factor for scale update
    p.responseUp = 16; % upsampling the small 17x17 response helps with the accuracy
    p.windowing = 'cosine'; % to penalize large displacements
    p.wInfluence = 0.176; % windowing influence (in convex sum)
    p.net = '2016-08-17.net.mat';
    %% execution, visualization, benchmark
    p.gpus = 1;
    p.fout = -1;
    %% Params from the network architecture, have to be consistent with the training
    p.exemplarSize = 127;  % input z size
    p.instanceSize = 255;  % input x size (search region)
    p.scoreSize = 17;
    p.totalStride = 8;
    p.contextAmount = 0.5; % context amount for the exemplar
    p.subMean = false;
    %% SiamFC prefix and ids
    p.prefix_z = 'a_'; % used to identify the layers of the exemplar
    p.prefix_x = 'b_'; % used to identify the layers of the instance
    p.prefix_join = 'xcorr';
    p.prefix_adj = 'adjust';
    p.id_feat_z = 'a_feat';
    p.id_score = 'score';
% -------------------------------------------------------------------------------------------------

    startup;

    % Get environment-specific default paths.
    p = env_paths_tracking(p);
    % Load ImageNet Video statistics
    if exist(p.stats_path,'file')
        stats = load(p.stats_path);
    else
        warning('No stats found at %s', p.stats_path);
        stats = [];
    end
    % Load two copies of the pre-trained network
    net_z = load_pretrained([p.net_base_path p.net], p.gpus);
    net_x = load_pretrained([p.net_base_path p.net], []);

    % Divide the net in 2
    % exemplar branch (used only once per video) computes features for the target
    remove_layers_from_prefix(net_z, p.prefix_x);
    remove_layers_from_prefix(net_z, p.prefix_join);
    remove_layers_from_prefix(net_z, p.prefix_adj);
    % instance branch computes features for search region x and cross-correlates with z features
    remove_layers_from_prefix(net_x, p.prefix_z);
    zFeatId = net_z.getVarIndex(p.id_feat_z);
    scoreId = net_x.getVarIndex(p.id_score);
    
    % **********************************
    % VOT: Get initialization data
    % **********************************
    [handle, first_image, region] = vot('rectangle');

    % If the provided region is a polygon ...
    if numel(region) > 4
      x1 = round(min(region(1:2:end)));
      x2 = round(max(region(1:2:end)));
      y1 = round(min(region(2:2:end)));
      y2 = round(max(region(2:2:end)));
      region = round([x1, y1, x2 - x1, y2 - y1]);
    else
      region = round([round(region(1)), round(region(2)), ... 
                      round(region(1) + region(3)) - round(region(1)), ...
                      round(region(2) + region(4)) - round(region(2))]);
    end

    irect = region; % semicolon added to suppress console output
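    % convert to the tracker's internal convention: targetPosition is the
    % target centre as [row, col] (y, x); targetSize is [height, width]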
    targetPosition = [irect(2) + (1 + irect(4)) / 2 irect(1) + (1 + irect(3)) / 2];
    targetSize = [irect(4) irect(3)];

    startFrame = 1;
    % get the first frame of the video
    im = gpuArray(single(imread(first_image)));
    % if grayscale repeat one channel to match filters size
    if(size(im, 3)==1)
        im = repmat(im, [1 1 3]);
    end
    % get avg for padding
    avgChans = gather([mean(mean(im(:,:,1))) mean(mean(im(:,:,2))) mean(mean(im(:,:,3)))]);

    wc_z = targetSize(2) + p.contextAmount*sum(targetSize);
    hc_z = targetSize(1) + p.contextAmount*sum(targetSize);
    s_z = sqrt(wc_z*hc_z);
    scale_z = p.exemplarSize / s_z;
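    % s_z is the side of the square exemplar region (target plus context);
    % resizing it to exemplarSize gives the exemplar scale factor scale_z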
    % initialize the exemplar
    [z_crop, ~] = get_subwindow_tracking(im, targetPosition, [p.exemplarSize p.exemplarSize], [round(s_z) round(s_z)], avgChans);
    if p.subMean
        z_crop = bsxfun(@minus, z_crop, reshape(stats.z.rgbMean, [1 1 3]));
    end
    d_search = (p.instanceSize - p.exemplarSize)/2;
    pad = d_search/scale_z;
    s_x = s_z + 2*pad;
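    % s_x is the search-region side in original image coordinates: the
    % exemplar region plus the margin that maps to
    % (instanceSize - exemplarSize)/2 pixels per side at the exemplar scale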
    % arbitrary scale saturation
    min_s_x = 0.2*s_x;
    max_s_x = 5*s_x;

    switch p.windowing
        case 'cosine'
            window = single(hann(p.scoreSize*p.responseUp) * hann(p.scoreSize*p.responseUp)');
        case 'uniform'
            window = single(ones(p.scoreSize*p.responseUp, p.scoreSize*p.responseUp));
    end
    % make the window sum 1
    window = window / sum(window(:));
    scales = (p.scaleStep .^ ((ceil(p.numScale/2)-p.numScale) : floor(p.numScale/2)));
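    % e.g. with numScale = 3 and scaleStep = 1.0375 the exponents are -1:1,
    % i.e. scales ~= [0.9639 1.0000 1.0375]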
    % evaluate the offline-trained network for exemplar z features
    net_z.eval({'exemplar', z_crop});
    z_features = net_z.vars(zFeatId).value;
    z_features = repmat(z_features, [1 1 1 p.numScale]);
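    % exemplar features are replicated along the 4th (batch) dimension so
    % that all numScale search crops are scored in a single forward pass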

    % start tracking
    i = startFrame;
    while true
        % **********************************
        % VOT: Get next frame
        % **********************************
        [handle, image] = handle.frame(handle);

        if isempty(image)
          break;
        end

        if i>startFrame
            % load new frame on GPU
            im = gpuArray(single(imread(image)));
            % if grayscale repeat one channel to match filters size
            if(size(im, 3)==1)
              im = repmat(im, [1 1 3]);
            end
            scaledInstance = s_x .* scales;
            scaledTarget = [targetSize(1) .* scales; targetSize(2) .* scales];
            % extract scaled crops for search region x at previous target position
            x_crops = make_scale_pyramid(im, targetPosition, scaledInstance, p.instanceSize, avgChans, stats, p);
            % evaluate the offline-trained network for exemplar x features
            [newTargetPosition, newScale] = tracker_eval(net_x, round(s_x), scoreId, z_features, x_crops, targetPosition, window, p);
            targetPosition = gather(newTargetPosition);
            % scale damping and saturation
            s_x = max(min_s_x, min(max_s_x, (1-p.scaleLR)*s_x + p.scaleLR*scaledInstance(newScale)));
            targetSize = (1-p.scaleLR)*targetSize + p.scaleLR*[scaledTarget(1,newScale) scaledTarget(2,newScale)];
        else
            % at the first frame output position and size passed as input (ground truth)
        end
        i = i + 1;
        rectPosition = [targetPosition([2,1]) - targetSize([2,1])/2, targetSize([2,1])];
        % output bbox in the original frame coordinates
        oTargetPosition = targetPosition; % .* frameSize ./ newFrameSize;
        oTargetSize = targetSize; % .* frameSize ./ newFrameSize;
        region  = [oTargetPosition([2,1]) - oTargetSize([2,1])/2, oTargetSize([2,1])];

        % **********************************
        % VOT: Report position for frame
        % **********************************
        handle = handle.report(handle, region);
    end
      
    % **********************************
    % VOT: Output the results
    % **********************************
    handle.quit(handle);
end

gongbudaizhe avatar Oct 08 '17 15:10 gongbudaizhe

Hi, thanks for pointing it out. At first look your script seems fine, but to be honest I have some trouble remembering the exact setup, as more than a year has passed. Are you sure you are using the batch norms in eval mode? Their behaviour changes between modes and might affect results significantly.

Unfortunately, I have noticed that results can change significantly between two different commits of the VOT toolkit. MatConvNet and CUDA versions might also be involved. Did you try OTB? That's usually much more stable. Does your code match the results on the website there?
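For reference, a minimal way to check this with MatConvNet's DagNN wrapper (a sketch, assuming net_z and net_x are the dagnn.DagNN objects returned by load_pretrained):

    % 'test' mode makes batch-norm layers use their stored moments instead
    % of per-batch statistics (and bypasses dropout); evaluating in
    % 'normal' mode can shift the response map noticeably
    net_z.mode = 'test';
    net_x.mode = 'test';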

bertinetto avatar Oct 08 '17 17:10 bertinetto

@gongbudaizhe @bertinetto I ran into the problem "Tracker has not passed the TraX support test." when evaluating on VOT. May I ask for a solution?

KengChiLiu avatar Nov 05 '17 13:11 KengChiLiu

Initializing workspace ...
Verifying native components ...
Testing TraX protocol support for tracker SiamFC.
Tracker execution interrupted: Unable to establish connection.
TraX support not detected.
Error using tracker_load (line 127)
Tracker has not passed the TraX support test.

Error in run_experiments (line 8)
tracker = tracker_load('SiamFC');
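This failure typically means the toolkit could not launch or connect to the tracker at all (wrong paths in the tracker description, a Matlab startup error, or missing TraX components) rather than a bug in the tracker itself. One thing worth checking is the tracker description file. A sketch of a typical tracker_SiamFC.m for the Matlab toolkit, assuming the standard vot-toolkit helper generate_matlab_command; both paths are placeholders that must be adapted to your local checkouts:

    % tracker_SiamFC.m -- VOT workspace tracker description (sketch;
    % the paths below are placeholders, not actual locations)
    tracker_label = 'SiamFC';
    tracker_command = generate_matlab_command('sfc_vot', ...
        {'/path/to/siamese-fc', '/path/to/vot-toolkit'});
    tracker_interpreter = 'matlab';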

KengChiLiu avatar Nov 06 '17 08:11 KengChiLiu

Hi @KengChiLiu, have you solved this problem? I have encountered the same one.

zlj199502 avatar Jan 23 '18 07:01 zlj199502