
Nested cross-validation

Open ziqianwang9 opened this issue 5 years ago • 3 comments

Dear Lin, thanks for providing this useful toolbox. I'm trying to use it in a paper, and I ran into a problem with a reviewer: he suggested that I use nested cross-validation. Here is the script I used for my study:

clear all;
load median20190923.mat

%leave-one-out cross-validation
w = zeros(size(data_all));% weight
h = waitbar(0,'please wait..');

for i = 1:size(data_all,1)
    waitbar(i/size(data_all,1),h,[num2str(i),'/',num2str(size(data_all,1))])
    new_DATA = data_all;
    new_label  = label;
    test_data   = data_all(i,:); new_DATA(i,:) = []; train_data = new_DATA;
    test_label   = label(i,:);new_label(i,:) = [];train_label = new_label;
    
%  Data Normalization
    [train_data,PS] = mapminmax(train_data',0,1);
    test_data          = mapminmax('apply',test_data',PS);
    train_data = train_data';
    test_data   = test_data';
    
    % RFE feature selection
    step = 1;
    ftRank = SVMRFE(train_label,train_data, step,'-t 0');
    IX = ftRank(1:ceil(length(ftRank)*0.4));
    
    [bestacc,bestc] = SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1);
    cmd = ['-t 0 ', ' -c ',num2str(bestc),' -w1 2 -w-1 1'];
    
    model = svmtrain(train_label,train_data(:,IX),cmd);
    w(i,IX)   = model.SVs'*model.sv_coef; 
    [predicted_label, accuracy, deci] = svmpredict(test_label,test_data(:,IX),model);
    acc(i,1) = accuracy(1);
    deci_value(i,1) = deci;
%     clear  test_data  train_data test_label train_label model IX k
end
w_msk = double(sum(w~=0,1)==size(w,1));
w = mean(w,1).*w_msk;
acc_final = mean(acc);
disp(['accuracy - ',num2str(acc_final)]);

% ROC
[X,Y,T,AUC] = perfcurve(label,deci_value,1);
figure;plot(X,Y);hold on;plot(X,X,'-');
xlabel('False positive rate'); ylabel('True positive rate');

for i=1:length(X)
    Cut_off(i,1) = (1-X(i))*Y(i);
end
[~,maxind] = max(Cut_off);
Specificity = 1-X(maxind);
Sensitivity = Y(maxind);
disp(['Specificity= ', num2str(Specificity)]);
disp(['Sensitivity= ', num2str(Sensitivity)]);

fprintf('Permutation test ......\n');
Nsloop = 5000;
auc_rand = zeros(Nsloop,1);
for i=1:Nsloop
    perm_idx = randperm(length(label));        % random permutation index
    deci_value_rand = deci_value(perm_idx);    % break the label/score pairing
    [~,~,~,auc_rand(i)] = perfcurve(label,deci_value_rand,1);
end
p_auc = (length(find((auc_rand > AUC)))+1)/(Nsloop+1);
disp(['Pvalue= ', num2str(p_auc)]);

Here, what I used is leave-one-out cross-validation. But the reviewer suggests that I use nested cross-validation (e.g. Varoquaux et al., NeuroImage, 2017) and K-fold. Since I am not familiar with nested cross-validation: is it possible to perform it based on your libsvm? If so, could you please give me some clue how to achieve this?

Best, Ziqian

ziqianwang9 avatar Feb 20 '20 15:02 ziqianwang9

To implement CV in MATLAB, what you need to do is:

  • randomly permute data by randperm()

  • use a for loop to get each validation fold

num_per_fold = ceil(num_data/num_fold);
for i = 1 : num_fold
    range = (i-1)*num_per_fold + 1 : min(num_data, i*num_per_fold);

  • then use this "range" to extract the validation fold. The training fold can be obtained in a similar way

  • then do training/prediction, and aggregate results to get CV accuracy

  • for nested CV I think you mean 2-level CV. You can use a 2-level for loop for that
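The steps above can be sketched in MATLAB roughly as follows (a minimal skeleton, not a definitive implementation: it assumes a sample-per-row matrix `data`, a label column vector `label`, and libsvm's `svmtrain`/`svmpredict` on the path; the `-c 1` value is a placeholder):

```matlab
% K-fold CV skeleton following the steps above (a sketch).
num_fold = 5;
num_data = size(data,1);
perm = randperm(num_data);                 % random permutation of samples
num_per_fold = ceil(num_data/num_fold);
acc = zeros(num_fold,1);
for i = 1:num_fold
    range = (i-1)*num_per_fold + 1 : min(num_data, i*num_per_fold);
    val_idx   = perm(range);               % validation fold
    train_idx = setdiff(perm, val_idx);    % remaining folds for training
    model = svmtrain(label(train_idx), data(train_idx,:), '-t 0 -c 1');
    [~, a, ~] = svmpredict(label(val_idx), data(val_idx,:), model);
    acc(i) = a(1);                         % accuracy on this fold
end
cv_acc = mean(acc);                        % aggregate CV accuracy
```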


cjlin1 avatar Feb 20 '20 21:02 cjlin1

Thank you for your reply. To my knowledge, nested CV is not simply 2-level CV. This figure illustrates what nested CV is:

Nested CV has an inner CV loop nested in an outer CV loop. The inner loop is responsible for model selection/hyperparameter tuning (playing the role of a validation set), while the outer loop is for error estimation (the test set).

My question is: how does '[bestacc,bestc] = SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1)' perform hyperparameter tuning? Does it use a similar method? If not, can we combine nested CV with SVMcgForClass_NoDisplay_linear? Any response would be helpful.
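(For reference, an inner grid search over C can also be written directly with libsvm's built-in '-v k' cross-validation option, which returns the k-fold CV accuracy for classification. A minimal sketch, assuming `train_label`/`train_data` are one outer-loop training fold; the grid range mirrors the -10..10 range used above:)

```matlab
% Inner-loop grid search over C using libsvm's '-v' CV option (a sketch).
best_acc = -inf; best_c = 1;
for log2c = -10:10
    c = 2^log2c;
    % with '-v 5', svmtrain returns the 5-fold CV accuracy as a scalar
    cv_acc = svmtrain(train_label, train_data, sprintf('-t 0 -v 5 -c %g', c));
    if cv_acc > best_acc
        best_acc = cv_acc; best_c = c;
    end
end
% retrain on the whole training fold with the selected C
model = svmtrain(train_label, train_data, sprintf('-t 0 -c %g', best_c));
```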

Best, Ziqian


ziqianwang9 avatar Feb 28 '20 10:02 ziqianwang9

Dear Lin, I found that this nested CV adds a grid search in every iteration of the inner loop. If it is 5-fold, it computes 5 best-C values, then takes the arithmetic/geometric/power mean. Here is also a description (translated from Chinese): the idea has two loops: (1) the outer loop is ordinary cross-validation; (2) the inner loop is a sub-optimization problem that uses grid search to find the optimal parameters of the model for the current sub-problem. Grid search traverses a finite set of points in parameter space (each point corresponds to one parameter setting); each setting yields one model performance, and the best-performing model is selected.

Cross-validation with K folds ultimately yields K sets of model parameters; if the model is stable, these parameter sets should be similar.

I don't know if this is the state of the art, but it should be a good way to solve the problem of information 'leaking'. Could we manage to implement it with your wonderful libsvm toolbox?
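(Putting the two loops together, a nested CV over this kind of data might look like the sketch below. This is only an illustrative skeleton under the assumptions of the thread: `data_all` has one sample per row, `label` is a column vector, and the inner tuning uses libsvm's '-v' option. Note that any preprocessing such as normalization or RFE feature selection would also have to be refit inside each outer fold to avoid leaking.)

```matlab
% Nested CV sketch: outer K-fold for error estimation,
% inner grid search over C for hyperparameter tuning.
K = 5;
n = size(data_all,1);
perm = randperm(n);
npf = ceil(n/K);
acc = zeros(K,1);
for k = 1:K
    te = perm((k-1)*npf+1 : min(n,k*npf));   % outer test fold
    tr = setdiff(perm, te);                  % outer training fold
    % inner loop: select C by 5-fold CV on the training fold only
    best_acc = -inf; best_c = 1;
    for log2c = -10:10
        cva = svmtrain(label(tr), data_all(tr,:), ...
            sprintf('-t 0 -v 5 -c %g', 2^log2c));
        if cva > best_acc, best_acc = cva; best_c = 2^log2c; end
    end
    % retrain with the selected C, evaluate once on the outer test fold
    model = svmtrain(label(tr), data_all(tr,:), sprintf('-t 0 -c %g', best_c));
    [~, a, ~] = svmpredict(label(te), data_all(te,:), model);
    acc(k) = a(1);
end
disp(['nested CV accuracy = ', num2str(mean(acc))]);
```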

Best, Ziqian


ziqianwang9 avatar Mar 02 '20 11:03 ziqianwang9