
Nested cross-validation

Open ziqianwang9 opened this issue 5 years ago • 3 comments

Dear Lin, thanks for providing this useful toolbox. I'm trying to use it in a paper, and I ran into a problem with a reviewer: he suggested that I use nested cross-validation. Here is the script I used for my study:

clear all;
load median20190923.mat

%leave-one-out cross-validation
w = zeros(size(data_all));% weight
h = waitbar(0,'please wait..');

for i = 1:size(data_all,1)
    waitbar(i/size(data_all,1),h,[num2str(i),'/',num2str(size(data_all,1))])
    new_DATA = data_all;
    new_label  = label;
    test_data   = data_all(i,:); new_DATA(i,:) = []; train_data = new_DATA;
    test_label   = label(i,:);new_label(i,:) = [];train_label = new_label;
    
%  Data Normalization
    [train_data,PS] = mapminmax(train_data',0,1);
    test_data          = mapminmax('apply',test_data',PS);
    train_data = train_data';
    test_data   = test_data';
    
    % RFE feature selection
    step = 1;
    ftRank = SVMRFE(train_label,train_data, step,'-t 0');
    IX = ftRank(1:ceil(length(ftRank)*0.4));
    
    [bestacc,bestc] = SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1);
    cmd = ['-t 0 ', ' -c ',num2str(bestc),' -w1 2 -w-1 1'];
    
    model = svmtrain(train_label,train_data(:,IX),cmd);
    w(i,IX)   = model.SVs'*model.sv_coef; 
    [predicted_label, accuracy, deci] = svmpredict(test_label,test_data(:,IX),model);
    acc(i,1) = accuracy(1);
    deci_value(i,1) = deci;
%     clear  test_data  train_data test_label train_label model IX k
end
w_msk = double(sum(w~=0,1)==size(w,1));
w = mean(w,1).*w_msk;
acc_final = mean(acc);
disp(['accuracy - ',num2str(acc_final)]);

% ROC
[X,Y,T,AUC] = perfcurve(label,deci_value,1);
figure;plot(X,Y);hold on;plot(X,X,'-');
xlabel('False positive rate'); ylabel('True positive rate');

for i=1:length(X)
    Cut_off(i,1) = (1-X(i))*Y(i);
end
[~,maxind] = max(Cut_off);
Specificity = 1-X(maxind);
Sensitivity = Y(maxind);
disp(['Specificity= ', num2str(Specificity)]);
disp(['Sensitivity= ', num2str(Sensitivity)]);

fprintf('Permutation test ......\n');
Nsloop = 5000;
auc_rand = zeros(Nsloop,1);
for i=1:Nsloop
    perm_idx = randperm(length(label));        % random permutation index
    deci_value_rand = deci_value(perm_idx);    % break the label/score pairing
    [~,~,~,auc_rand(i)] = perfcurve(label,deci_value_rand,1);
end
p_auc = (length(find((auc_rand > AUC)))+1)/(Nsloop+1);
disp(['Pvalue= ', num2str(p_auc)]);

Here, what I used is leave-one-out cross-validation. But the reviewer suggests that I use nested cross-validation (e.g. Varoquaux et al., NeuroImage, 2017) and K-fold. Since I am not familiar with nested cross-validation: is it possible to perform it based on your libsvm? If so, could you please give me some clue how to achieve this?

Best, Ziqian

ziqianwang9 avatar Feb 20 '20 15:02 ziqianwang9

To implement CV in MATLAB, what you need to do is:

  • randomly permute data by randperm()

  • use a for loop to get each validation fold

num_per_fold = ceil(num_data/num_fold);
for i = 1 : num_fold
    range = (i-1)*num_per_fold + 1 : min(num_data, i*num_per_fold);

  • then use this "range" to extract the validation fold. The training fold can be obtained in a similar way

  • then do training/prediction, and aggregate results to get CV accuracy

  • for nested CV I think you mean 2-level CV. You can use a 2-level for loop for that
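The steps above can be sketched in MATLAB roughly as follows (a minimal skeleton, not a definitive implementation: it assumes a sample-per-row matrix `data`, a label column vector `label`, and libsvm's `svmtrain`/`svmpredict` on the path; the `-c 1` value is a placeholder):

```matlab
% K-fold CV skeleton following the steps above (a sketch).
num_fold = 5;
num_data = size(data,1);
perm = randperm(num_data);                 % random permutation of samples
num_per_fold = ceil(num_data/num_fold);
acc = zeros(num_fold,1);
for i = 1:num_fold
    range = (i-1)*num_per_fold + 1 : min(num_data, i*num_per_fold);
    val_idx   = perm(range);               % validation fold
    train_idx = setdiff(perm, val_idx);    % remaining folds for training
    model = svmtrain(label(train_idx), data(train_idx,:), '-t 0 -c 1');
    [~, a, ~] = svmpredict(label(val_idx), data(val_idx,:), model);
    acc(i) = a(1);                         % accuracy on this fold
end
cv_acc = mean(acc);                        % aggregate CV accuracy
```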


cjlin1 avatar Feb 20 '20 21:02 cjlin1

Thank you for your reply. To my knowledge, nested CV is not simply 2-level CV. This figure illustrates what nested CV is:

Nested CV has an inner CV loop nested in an outer CV loop. The inner loop is responsible for model selection/hyperparameter tuning (playing the role of a validation set), while the outer loop is for error estimation (the test set).

My question is: how does '[bestacc,bestc] = SVMcgForClass_NoDisplay_linear(train_label,train_data(:,IX),-10,10,5,0.1)' perform hyperparameter tuning? Does it use a similar method? If not, can we combine nested CV with SVMcgForClass_NoDisplay_linear? Any response would be helpful.
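(For reference, an inner grid search over C can also be written directly with libsvm's built-in '-v k' cross-validation option, which returns the k-fold CV accuracy for classification. A minimal sketch, assuming `train_label`/`train_data` are one outer-loop training fold; the grid range mirrors the -10..10 range used above:)

```matlab
% Inner-loop grid search over C using libsvm's '-v' CV option (a sketch).
best_acc = -inf; best_c = 1;
for log2c = -10:10
    c = 2^log2c;
    % with '-v 5', svmtrain returns the 5-fold CV accuracy as a scalar
    cv_acc = svmtrain(train_label, train_data, sprintf('-t 0 -v 5 -c %g', c));
    if cv_acc > best_acc
        best_acc = cv_acc; best_c = c;
    end
end
% retrain on the whole training fold with the selected C
model = svmtrain(train_label, train_data, sprintf('-t 0 -c %g', best_c));
```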

Best, Ziqian


ziqianwang9 avatar Feb 28 '20 10:02 ziqianwang9

Dear Lin, I found that this nested CV adds a grid search in every iteration of the inner loop. If it is 5-fold, it computes 5 best-C values, then takes the arithmetic/geometric/power mean. Here is also a description (translated from Chinese): the idea has two loops: (1) the outer loop is ordinary cross-validation; (2) the inner loop is a sub-optimization problem that uses grid search to find the optimal parameters of the model for the current sub-problem. Grid search traverses a finite set of points in parameter space (each point corresponds to one parameter setting); each setting yields one model performance, and the best-performing model is selected.

Cross-validation with K folds ultimately yields K sets of model parameters; if the model is stable, these parameter sets should be similar.

I don't know if this is the state of the art, but it should be a good way to solve the problem of information 'leaking'. Could we manage to implement it with your wonderful libsvm toolbox?
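(Putting the two loops together, a nested CV over this kind of data might look like the sketch below. This is only an illustrative skeleton under the assumptions of the thread: `data_all` has one sample per row, `label` is a column vector, and the inner tuning uses libsvm's '-v' option. Note that any preprocessing such as normalization or RFE feature selection would also have to be refit inside each outer fold to avoid leaking.)

```matlab
% Nested CV sketch: outer K-fold for error estimation,
% inner grid search over C for hyperparameter tuning.
K = 5;
n = size(data_all,1);
perm = randperm(n);
npf = ceil(n/K);
acc = zeros(K,1);
for k = 1:K
    te = perm((k-1)*npf+1 : min(n,k*npf));   % outer test fold
    tr = setdiff(perm, te);                  % outer training fold
    % inner loop: select C by 5-fold CV on the training fold only
    best_acc = -inf; best_c = 1;
    for log2c = -10:10
        cva = svmtrain(label(tr), data_all(tr,:), ...
            sprintf('-t 0 -v 5 -c %g', 2^log2c));
        if cva > best_acc, best_acc = cva; best_c = 2^log2c; end
    end
    % retrain with the selected C, evaluate once on the outer test fold
    model = svmtrain(label(tr), data_all(tr,:), sprintf('-t 0 -c %g', best_c));
    [~, a, ~] = svmpredict(label(te), data_all(te,:), model);
    acc(k) = a(1);
end
disp(['nested CV accuracy = ', num2str(mean(acc))]);
```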

Best, Ziqian


ziqianwang9 avatar Mar 02 '20 11:03 ziqianwang9