Evaluation for custom data
Good morning,
I'm trying to run some of my own data and I'm now at the evaluation stage. I'm calculating a confusion matrix, and the numbers seem a bit strange, so I just want to make sure I'm doing it correctly, or else figure out what I'm doing wrong. Here is a snippet of my code.
import os

import h5py
import numpy as np

# data_folder (where my predicted and test h5 files live) and NUM_CLASSES
# are defined earlier in my script
pred_list = [pred for pred in os.listdir(data_folder)
             if pred.split(".")[0].split("_")[-1] == 'pred']

acc, tot = 0, 0
result = np.zeros((NUM_CLASSES, NUM_CLASSES), dtype=int)  # result is the confusion matrix
max_ind = 0  # Track the maximum full-cloud index

for pred in pred_list:
    # Open predicted h5 file
    data = h5py.File(os.path.join(data_folder, pred), 'r')
    # Open the corresponding test h5 file
    f = '_'.join(pred.split('_')[:-1]) + '.h5'
    data_test = h5py.File(os.path.join(data_folder, f), 'r')

    # Read from the predicted h5 file
    labels_seg = data['label_seg'][...].astype(np.int64)
    indices = data['indices_split_to_full'][...].astype(np.int64)
    if indices.max() > max_ind:
        max_ind = indices.max()
    confidence = data['confidence'][...].astype(np.float32)
    data_num = data['data_num'][...].astype(np.int64)

    # Read from the test h5 file
    t_labels_seg = data_test['label_seg'][...].astype(np.int64)
    t_indices = data_test['indices_split_to_full'][...].astype(np.int64)
    t_data_num = data_test['data_num'][...].astype(np.int64)

    # Loop through the blocks and accumulate accuracy and the confusion matrix
    for i in range(labels_seg.shape[0]):
        test = t_labels_seg[i][:t_data_num[i]]
        predicted = labels_seg[i][:data_num[i]]
        test_ind = t_indices[i][:t_data_num[i]]
        ind = indices[i][:data_num[i]]
        # Just a sanity check to ensure the indices match
        if not np.array_equal(test_ind, ind):
            print("Indices don't match!")
        tot += test.shape[0]
        dif = test == predicted
        acc += dif.sum()
        # Accumulate the confusion matrix
        for j in range(len(predicted)):
            result[test[j]][predicted[j]] += 1

    data.close()
    data_test.close()
Now, when I look at tot, max_ind, and np.sum(result), I would have expected tot == max_ind == np.sum(result); however, tot > max_ind. Is this to be expected? max_ind is the maximum point index in the test set, so I'm not sure how the total number of points tested can be greater than that unless there are repeats.
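For reference, the manual count above uses the same convention as scikit-learn's confusion_matrix (rows are true labels, columns are predictions), so it can be sanity-checked on a toy example like this (just a sketch, if you have sklearn available; the labels are made up):

import numpy as np
from sklearn.metrics import confusion_matrix

NUM_CLASSES = 3  # toy value just for this check
test = np.array([0, 1, 2, 1])       # true labels for one block
predicted = np.array([0, 2, 2, 1])  # predictions for the same block
# Rows are true labels, columns are predictions, matching
# result[test[j]][predicted[j]] += 1 above
print(confusion_matrix(test, predicted, labels=np.arange(NUM_CLASSES)))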
Not sure if this helps you at all, and I would like to know if you figure this out, but here is a snippet of the article:
"Each testing point cloud is sampled multiple times to make sure all the points are evaluated at least r (r = 10 in our experiments) times at testing time"
Can this maybe explain why there are repeats?
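If that's what's going on, one quick way to check would be to compare your tot against the number of unique entries in indices_split_to_full; a rough sketch, reusing pred_list and data_folder from your snippet:

import os

import h5py
import numpy as np

all_inds = []
for pred in pred_list:  # pred_list / data_folder as in your snippet
    with h5py.File(os.path.join(data_folder, pred), 'r') as data:
        indices = data['indices_split_to_full'][...].astype(np.int64)
        data_num = data['data_num'][...].astype(np.int64)
        for i in range(indices.shape[0]):
            all_inds.append(indices[i][:data_num[i]])

all_inds = np.concatenate(all_inds)
print('total evaluated:', all_inds.size)           # should equal your tot
print('unique points:', np.unique(all_inds).size)  # smaller => repeats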
I am having a similar issue; have you made any progress on this?
Good morning everyone!
Thank you for the comments. Nicolai Mogensen, I did not recall that part of the article, but it does explain why there are repeats. It seems I will have to use indices_split_to_full instead of data_num. I went back and worked on some training code, so I stepped away from this for a while, but I will keep working on it and post more if I figure it out. Thanks!
Please let me know if you are able to map back to the original data with "indices_split_to_full". I am finding that the normalized blocks stored in "data" are correct, but when I try to map them back to the original data in the merge using "indices_split_to_full", the blocks do not make sense.
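For reference, this is roughly what I am trying to do; raw_points here is a hypothetical (N, 3) array holding the original, un-normalized cloud:

import h5py
import numpy as np

# raw_points: hypothetical (N, 3) array with the original, un-normalized cloud
with h5py.File('/path/to/h5-files/file_pred.h5', 'r') as data:
    indices = data['indices_split_to_full'][...].astype(np.int64)
    data_num = data['data_num'][...].astype(np.int64)
    label_seg = data['label_seg'][...].astype(np.int64)

i = 0  # pick one block
idx = indices[i][:data_num[i]]          # full-cloud indices for this block
block_xyz = raw_points[idx]             # gather original coordinates by index
block_lbl = label_seg[i][:data_num[i]]  # predicted labels for those points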
Sorry for the late response. Here's something I've come up with that may work (may need to check my work, though):
# Import modules
import h5py
import numpy as np

# Load one predicted h5 file (just as an example)
pred_file = '/path/to/h5-files/file_pred.h5'
data = h5py.File(pred_file, 'r')
img = data['data'][...]  # block point data
data_num = data['data_num'][...]
indices = data['indices_split_to_full'][...]
label_seg = data['label_seg'][...]
confidence = data['confidence'][...]
data.close()

max_ind = np.max(indices)  # Highest full-cloud index that appears
# Initialize to -1 since a label of '0' is an actual label;
# max_ind + 1 slots so the 0-based indices fit without shifting
label_flat = -1 * np.ones(max_ind + 1, dtype=np.int32)
label_flat[indices.flatten()] = label_seg.flatten()
I'm not sure how to loop through the rest of the files, but I think this is right? I'm also not sure how to fold in the confidence.
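Building on that, here's a rough sketch of how the loop over the rest of the pred files might go, keeping whichever prediction is most confident when a point shows up more than once. Untested, and it assumes confidence has the same per-point shape as label_seg:

import os

import h5py
import numpy as np

data_folder = '/path/to/h5-files'  # same folder as before
pred_list = [f for f in os.listdir(data_folder)
             if f.split('.')[0].split('_')[-1] == 'pred']

# First pass: find the largest full-cloud index across all files
max_ind = 0
for pred in pred_list:
    with h5py.File(os.path.join(data_folder, pred), 'r') as data:
        max_ind = max(max_ind, int(data['indices_split_to_full'][...].max()))

label_flat = -1 * np.ones(max_ind + 1, dtype=np.int32)  # -1 = never evaluated
conf_flat = np.zeros(max_ind + 1, dtype=np.float32)     # best confidence so far

# Second pass: keep the most confident prediction for each point
for pred in pred_list:
    with h5py.File(os.path.join(data_folder, pred), 'r') as data:
        label_seg = data['label_seg'][...].astype(np.int32)
        indices = data['indices_split_to_full'][...].astype(np.int64)
        confidence = data['confidence'][...].astype(np.float32)
        data_num = data['data_num'][...].astype(np.int64)
    for i in range(label_seg.shape[0]):
        idx = indices[i][:data_num[i]]
        lbl = label_seg[i][:data_num[i]]
        cnf = confidence[i][:data_num[i]]
        # On repeats, only overwrite when the new confidence is at least as high
        better = cnf >= conf_flat[idx]
        label_flat[idx[better]] = lbl[better]
        conf_flat[idx[better]] = cnf[better]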