FCMA feature selection excludes the best-performing voxel

Open peetal opened this issue 4 years ago • 2 comments

Hi,

I found a potential issue in the FCMA feature selection step that may exclude the best-performing voxel when selecting the top k voxels.

At the end of fcma_voxel_selection_cv.py:

with open(file_str + 'result_list.txt', 'w') as fp:
    for idx, tuple in enumerate(results):
        fp.write(str(tuple[0]) + ' ' + str(tuple[1]) + '\n')

        # Store the score for each voxel
        score[tuple[0]] = tuple[1]
        seq[tuple[0]] = idx

results is an iterable of tuples: tuple[0] is the voxel ID, which indexes the voxel, and tuple[1] is that voxel's score. The tuples are sorted by score, so the best-performing voxel comes first and gets idx = 0 when enumerated. As a result, seq[tuple[0]] = idx assigns the best-performing voxel a rank of 0.

Then, when fslmaths is used to select the top k voxels, as in make_top_voxel_mask.sh:

for file in ${input_dir}/*_seq.nii.gz
do	
	# Preprocess the file name
	fbase=$(basename "$file")
	pref="${fbase%%.*}"
	
	# Create the voxel mask
	fslmaths $file -uthr $voxel_number -bin ${output_dir}/${pref}_top${voxel_number}.nii.gz

done

-uthr applies an upper threshold set by voxel_number: every voxel whose value is above the threshold is set to 0. For example, if k = 3000, -uthr keeps the voxels ranked 0 to 3000, which covers the top-ranked voxels and also all non-brain voxels, since those have the value 0 as well. Then -bin binarizes the file into a mask, excluding every voxel whose value is 0: the non-brain voxels and the best-performing voxel, whose rank is 0. As a result, I believe FCMA feature selection excludes the best-performing voxel.
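To make the effect concrete, here is a minimal NumPy sketch that mimics the two fslmaths steps on a toy array. It assumes -uthr zeroes values above the threshold and -bin keeps values greater than zero; the helper uthr_bin, the voxel IDs, and the scores are made up for illustration and are not part of BrainIAK or FSL.

```python
import numpy as np


def uthr_bin(seq, k):
    """Mimic `fslmaths seq -uthr k -bin` on an array (assumed semantics)."""
    kept = np.where(seq <= k, seq, 0)   # -uthr: zero every value above k
    return (kept > 0).astype(np.uint8)  # -bin: 1 where the value is > 0


# Hypothetical (voxel ID, score) pairs, already sorted best-first.
results = [(10, 0.9), (11, 0.8), (12, 0.7), (13, 0.6), (14, 0.5)]

# 20 "voxels"; anything not in results stays 0, like non-brain voxels.
seq = np.zeros(20, dtype=np.int32)
for idx, (voxel_id, _score) in enumerate(results):
    seq[voxel_id] = idx  # 0-based rank, as in the current script

mask = uthr_bin(seq, 3)
selected = np.flatnonzero(mask).tolist()
print(selected)  # [11, 12, 13]
```

Voxel 10 has the highest score but is missing from the mask, because its rank of 0 is indistinguishable from the background.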

If I am correct about this issue, the fix is simple: add 1 to idx so that the ranks start at 1:

with open(file_str + 'result_list.txt', 'w') as fp:
    for idx, tuple in enumerate(results):
        fp.write(str(tuple[0]) + ' ' + str(tuple[1]) + '\n')

        # Store the score for each voxel
        score[tuple[0]] = tuple[1]
        seq[tuple[0]] = idx + 1
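As a quick sanity check of the proposed change, the sketch below compares 0-based and 1-based ranks on toy data. The helpers build_seq and top_k_mask are hypothetical names, and top_k_mask only approximates `fslmaths -uthr k -bin` under the assumption that it keeps values in the range (0, k].

```python
import numpy as np


def build_seq(results, n_voxels, first_rank):
    """Write each voxel's rank into a flat array, starting at first_rank."""
    seq = np.zeros(n_voxels, dtype=np.int32)
    for rank, (voxel_id, _score) in enumerate(results, start=first_rank):
        seq[voxel_id] = rank
    return seq


def top_k_mask(seq, k):
    """Assumed equivalent of `fslmaths -uthr k -bin`: keep 0 < value <= k."""
    return ((seq > 0) & (seq <= k)).astype(np.uint8)


# Hypothetical (voxel ID, score) pairs, already sorted best-first.
results = [(7, 0.91), (2, 0.85), (5, 0.80), (0, 0.62)]

zero_based = np.flatnonzero(top_k_mask(build_seq(results, 10, 0), k=2)).tolist()
one_based = np.flatnonzero(top_k_mask(build_seq(results, 10, 1), k=2)).tolist()

print(zero_based)  # [2, 5]: best voxel (ID 7) is dropped
print(one_based)   # [2, 7]: the two best voxels are kept
```

With ranks starting at 1, the mask for k = 2 contains exactly the two highest-scoring voxels, and background voxels are still excluded because they remain 0.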

Please let me know whether this makes sense, or whether I have misunderstood the script and this is not actually an issue. Thank you all very much!

peetal • Oct 29 '20 23:10

@yidawang are you able to look at this?

CameronTEllis • Nov 02 '20 19:11

The description makes sense to me. I can't remember all the details of how fslmaths works, but if it behaves as described above, I am fine with the proposed fix of switching to 1-based ranks. Please submit a PR to fix it. Thanks!

yidawang • Nov 02 '20 19:11