GeoChat How to calculate the metrics acc@0.5, acc@0.25, ROUGE and METEOR score in table 7, 8, 9?

Hi authors, nice work. When I run the evaluation code, the output is a JSON file. My questions: how do you calculate the metrics in Tables 7, 8, and 9? Could you provide the code for computing them?

Thank you

ywsun

python geochat/eval/batch_geochat_grounding.py \
    --model-path /path/to/model \
    --question-file path/to/jsonl/file \
    --answer-file path/to/output/jsonl/file \
    --image_folder path/to/image/folder/

python geochat/eval/batch_geochat_referring.py \
    --model-path /path/to/model \
    --question-file path/to/jsonl/file \
    --answer-file path/to/output/jsonl/file \
    --image_folder path/to/image/folder/

Apr 03 '24 04:04 simba0626

I wrote a function for calculating the metrics of table 8, and it looks good.

import json

from tqdm import tqdm


def evaluation_metrics(data_path):
    # Each non-empty line of the answer file is one JSON record.
    with open(data_path, "r") as fp:
        base = [json.loads(line) for line in fp if line.strip()]

    correct = 0
    incorrect = 0
    comp_correct = 0
    comp_incorrect = 0
    pre_correct = 0
    pre_incorrect = 0
    ru_correct = 0
    ru_incorrect = 0
    for answers in tqdm(base):
        gt = answers["gt"].lower()
        type_ = answers["type"]
        answer = answers["answer"].replace(" ", "").lower().replace(".", "")
        if gt == answer:
            correct = correct + 1
            if type_ == "comp":
                comp_correct = comp_correct + 1
            if type_ == "presence":
                pre_correct = pre_correct + 1
            if type_ == "rural_urban":
                ru_correct = ru_correct + 1
        else:
            incorrect = incorrect + 1
            if type_ == "comp":
                comp_incorrect = comp_incorrect + 1
            if type_ == "presence":
                pre_incorrect = pre_incorrect + 1
            if type_ == "rural_urban":
                ru_incorrect = ru_incorrect + 1

    print("presence_correct:", pre_correct)
    print("presence_incorrect:", pre_incorrect)
    print("presence_Total:", pre_correct + pre_incorrect)
    print("presence_Acc:", (pre_correct / (pre_correct + pre_incorrect)))
    print("-" * 100)
    print("comparison_correct:", comp_correct)
    print("comparison_incorrect:", comp_incorrect)
    print("comparison_Total:", comp_correct + comp_incorrect)
    print("comparison_Acc:", (comp_correct / (comp_correct + comp_incorrect)))
    print("-" * 100)
    if ru_correct + ru_incorrect != 0:
        print("rural_urban_correct:", ru_correct)
        print("rural_urban_incorrect:", ru_incorrect)
        print("rural_urban_Total:", ru_correct + ru_incorrect)
        print("rural_urban_Acc:", (ru_correct / (ru_correct + ru_incorrect)))
        print("-" * 100)
    print("total_correct:", correct)
    print("total_incorrect:", incorrect)
    print("total_Total:", correct + incorrect)
    print("total_Acc:", correct / (correct + incorrect))

I am also waiting for the metric calculation function of Table 7 and Table 9.

Apr 07 '24 08:04 Hoteryoung

I am currently facing this issue. Have you implemented the metric calculations in other tables?

Jun 02 '24 14:06 Yting68

Not yet

Jun 02 '24 14:06 Hoteryoung

I wrote a script for the visual grounding evaluation in Table 7. I used the bounding box package BboxToolkit (https://github.com/jbwang1997/BboxToolkit/blob/master/USAGE.md) to compute the IoU. I think the script is correct, but I can't reproduce the result in the paper, and I don't know what's wrong. The bbox_and_angle_to_polygon function is copied from geochat_demo.py.

import math
import os

import numpy as np
import BboxToolkit as bt
from PIL import Image
from tqdm import tqdm


def bbox_and_angle_to_polygon(x1, y1, x2, y2, a):
    # Center of the axis-aligned box
    x_ctr = (x1 + x2) / 2
    y_ctr = (y1 + y2) / 2

    # Width and height
    w = abs(x2 - x1)
    h = abs(y2 - y1)

    # Rotation angle in radians
    angle_rad = math.radians(a)

    # Rotate the four corners around the center
    cos_a = math.cos(angle_rad)
    sin_a = math.sin(angle_rad)

    x1_rot = cos_a * (-w / 2) - sin_a * (-h / 2) + x_ctr
    y1_rot = sin_a * (-w / 2) + cos_a * (-h / 2) + y_ctr

    x2_rot = cos_a * (w / 2) - sin_a * (-h / 2) + x_ctr
    y2_rot = sin_a * (w / 2) + cos_a * (-h / 2) + y_ctr

    x3_rot = cos_a * (w / 2) - sin_a * (h / 2) + x_ctr
    y3_rot = sin_a * (w / 2) + cos_a * (h / 2) + y_ctr

    x4_rot = cos_a * (-w / 2) - sin_a * (h / 2) + x_ctr
    y4_rot = sin_a * (-w / 2) + cos_a * (h / 2) + y_ctr

    # Return the polygon as a flat array of 8 coordinates
    polygon_coords = np.array((x1_rot, y1_rot, x2_rot, y2_rot, x3_rot, y3_rot, x4_rot, y4_rot))

    return polygon_coords


# Read the answer file output by `GeoChat/geochat/eval/batch_geochat_referring.py`, and save it as a list `geochat_predict`.
# `scale`, `correct` and `total_cnt` must also be defined before this loop; this post does not show them.
images_dir = '../Dataset/GeoChat/referring_images'
for predict in tqdm(geochat_predict):
    answer = predict['answer']
    answer = answer.replace("<unk>", "").replace(" ", "").strip()
    image_path = os.path.join(images_dir, predict['image_id'] + '.png')
    image = Image.open(image_path)
    width, height = image.size
    size_type = predict['type']
    gt_bboxes = predict['ground_truth']       # list
    predict_boxes = extract_bboxes(answer)    # list
    for i in range(len(gt_bboxes)):
        # convert coordinates to float and flatten [4,2] -> [8]
        poly = np.array(gt_bboxes[i]).astype(np.float32).reshape(-1)
        gt_obb = bt.poly2obb(poly).reshape(1, 5)                # convert to [cx, cy, w, h, theta]
        try:
            pred_bbox = predict_boxes[i]
            # rescale the predicted corners from model coordinates to pixels
            pred_bbox[0] = pred_bbox[0] / scale * width
            pred_bbox[1] = pred_bbox[1] / scale * height
            pred_bbox[2] = pred_bbox[2] / scale * width
            pred_bbox[3] = pred_bbox[3] / scale * height
            pred_poly = bbox_and_angle_to_polygon(*pred_bbox)
            pred_obb = bt.poly2obb(pred_poly).reshape(1, 5)     # convert to [cx, cy, w, h, theta]
            iou_score = bt.geometry.bbox_overlaps(pred_obb, gt_obb)[0][0]   # calculate OBB IoU with BboxToolkit
            if iou_score >= 0.5:
                correct += 1
        except Exception:
            # skip missing or malformed predictions (they are not counted as correct)
            continue

dataset = 'GeoChat Bench referring'
print(f"Evaluating {dataset} ...")
print(f'Precision @ 0.5: {correct / total_cnt} \n')

In the end I got acc@0.5 = 0.22744. My test data came from GeoChat Bench referring.jsonl, which has 7593 test samples. This does not match the IoU result presented in the paper, and I don't know how to reproduce it.
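
One way to narrow down a mismatch like this is to cross-check the IoU values with an independent implementation. Below is a minimal sketch, not the paper's evaluation code: it computes the IoU of two convex quadrilaterals with Sutherland-Hodgman clipping in pure Python, plus accuracy at a threshold. The names `polygon_area`, `clip_convex`, `polygon_iou` and `accuracy_at` are hypothetical helpers introduced here, and the inputs are assumed to be convex polygons given as lists of (x, y) points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(poly: Sequence[Point]) -> float:
    """Signed area (shoelace formula); positive for counter-clockwise order."""
    pts = list(poly)
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        area += x1 * y2 - x2 * y1
    return area / 2.0


def clip_convex(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: clip `subject` by a convex, counter-clockwise `clipper`."""

    def inside(p: Point, a: Point, b: Point) -> bool:
        # True if p lies on or to the left of the directed edge a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
        # Intersection of line p-q with line a-b (only called when they cross).
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        denom = dx1 * dy2 - dy1 * dx2
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / denom
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        inputs, output = output, []
        prev = inputs[-1]
        for cur in inputs:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def polygon_iou(poly_a: Sequence[Point], poly_b: Sequence[Point]) -> float:
    """IoU of two convex polygons given in either winding order."""
    a = list(poly_a) if polygon_area(poly_a) > 0 else list(poly_a)[::-1]
    b = list(poly_b) if polygon_area(poly_b) > 0 else list(poly_b)[::-1]
    inter = clip_convex(a, b)
    inter_area = abs(polygon_area(inter)) if len(inter) >= 3 else 0.0
    union = abs(polygon_area(a)) + abs(polygon_area(b)) - inter_area
    return inter_area / union if union > 0 else 0.0


def accuracy_at(ious: Sequence[float], threshold: float) -> float:
    """Fraction of IoU values at or above `threshold` (e.g. 0.5 or 0.25)."""
    return sum(iou >= threshold for iou in ious) / len(ious) if ious else 0.0


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)]
print(polygon_iou(square, shifted))  # overlap 0.5, union 1.5
```

If the values from this sketch agree with BboxToolkit on a handful of samples, the gap is more likely to come from the coordinate scaling, the angle convention, or how the denominator is counted than from the IoU routine itself.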

Jul 10 '24 14:07 YizhuoQ

How do you implement the extract_bboxes function? Thank you.

Jul 16 '24 06:07 simba0626

I implemented the extract_bboxes function as follows:

import re


def extract_bboxes(output):
    """
    Extract bounding box coordinates from the given string using regular expressions.
    :param output: String containing bounding box coordinates in the format {<bx_left><by_top><bx_right><by_bottom>|<θ>}
    :return: List of bounding boxes, each in the format [bx_left, by_top, bx_right, by_bottom, θ]
    """
    # The braces and the pipe are escaped so that the final number and the pipe
    # symbol match literally (an unescaped "|" is alternation in a regex).
    pattern = r'\{<(\d+)><(\d+)><(\d+)><(\d+)>\|<(\d+)>\}'
    matches = re.findall(pattern, output)
    bboxes = []
    for match in matches:
        # Convert every matched coordinate to int (not float, since the
        # coordinates are integers) and append the box to the list.
        bbox = [int(coord) for coord in match]
        bboxes.append(bbox)
    return bboxes
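
In Python's `re`, an unescaped `|` is alternation, so a pattern for this box format has to write the pipe as `\|`. The sketch below shows the difference on a made-up answer string; `sample`, `unescaped` and `escaped` are names introduced here for illustration only, and the string is not real model output.

```python
import re

# Hypothetical model output with two boxes, used only for illustration.
sample = "{<10><20><30><40>|<90>}{<5><6><7><8>|<0>}"

# Unescaped pipe: the pattern becomes "four numbers" OR "one number",
# so each box is split into two partial matches with empty groups.
unescaped = r'{<(\d+)><(\d+)><(\d+)><(\d+)>|<(\d+)>}'
loose = re.findall(unescaped, sample)
print(loose[0])  # ('10', '20', '30', '40', '')

# Escaped pipe and braces: each box is matched as a whole, with five groups.
escaped = r'\{<(\d+)><(\d+)><(\d+)><(\d+)>\|<(\d+)>\}'
strict = re.findall(escaped, sample)
boxes = [[int(v) for v in match] for match in strict]
print(boxes)  # [[10, 20, 30, 40, 90], [5, 6, 7, 8, 0]]
```

With the unescaped pattern, calling `int` on the empty groups raises `ValueError`. If a surrounding `try`/`except` swallows that error, the prediction is silently dropped and the reported accuracy goes down.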

Jul 23 '24 07:07 YizhuoQ
