
The pose from the automatic data generation pipeline using BlenderProc seems wrong

Open ArghyaChatterjee opened this issue 1 year ago • 8 comments

Hello,

I was trying your script generate_dataset.py to get object labels, but the poses seem wrong. When I extract the pose information from the hdf5 files, it does not match the camera image visually. Also, when there are 2 or more objects, the script just copies the pose of the same object to all of them.

Here are some of the examples:

0_0

[
   {
       "label": 1,
       "bounding_box": [
           528.3033683747154,
           350.111358523303,
           86.45304245455236,
           71.3704478882533
       ],
       "position": [
           -0.193613428473197,
           -0.052296279258011724,
           -0.4074290554587555
       ],
       "orientation": [
           0.5622279602809314,
           0.13505489272540747,
           -0.36329581345026724,
           0.7305313467324867
       ]
   }
]

0_1

[
    {
        "label": 1,
        "bounding_box": [
            531.3388914344279,
            272.5900441560709,
            252.6906517004163,
            238.51034663058556
        ],
        "position": [
            0.1431660092025578,
            -0.4546434238147105,
            0.09740131947546071
        ],
        "orientation": [
            -0.05251490352928403,
            -0.6426946663379124,
            0.7524901406111899,
            -0.13395648192999454
        ]
    }
]

0_2

[
    {
        "label": 1,
        "bounding_box": [
            730.3734709220116,
            268.085564880805,
            144.53625670904307,
            127.3197449537106
        ],
        "position": [
            0.027961062796976854,
            -0.3261060630684295,
            -0.5597348095212347
        ],
        "orientation": [
            0.012805106757996057,
            -0.33755419184921287,
            0.7005704945872097,
            0.6285651747589692
        ]
    },
    {
        "label": 1,
        "bounding_box": [
            555.8597708120552,
            293.03720094006644,
            69.6788034281102,
            64.1948683022967
        ],
        "position": [
            0.027961062796976854,
            -0.3261060630684295,
            -0.5597348095212347
        ],
        "orientation": [
            0.012805106757996057,
            -0.33755419184921287,
            0.7005704945872097,
            0.6285651747589692
        ]
    }
]

0_3

[
    {
        "label": 1,
        "bounding_box": [
            540.2956812767641,
            308.14795013300306,
            94.39906679984529,
            81.73959076527353
        ],
        "position": [
            0.2766396533819444,
            -0.11347862847977491,
            -0.5087633642552065
        ],
        "orientation": [
            0.33919253952584655,
            0.6655386080760665,
            -0.5156966917404101,
            0.41959945712330887
        ]
    },
    {
        "label": 1,
        "bounding_box": [
            822.3425366613939,
            0.0,
            112.54641103476649,
            84.1987549308912
        ],
        "position": [
            0.2766396533819444,
            -0.11347862847977491,
            -0.5087633642552065
        ],
        "orientation": [
            0.33919253952584655,
            0.6655386080760665,
            -0.5156966917404101,
            0.41959945712330887
        ]
    }
]

0_4

[
    {
        "label": 1,
        "bounding_box": [
            527.3310379686568,
            373.4647057753818,
            149.43002948172102,
            145.06583052221805
        ],
        "position": [
            0.028645462561879054,
            -0.485550404620021,
            -0.04302564275903398
        ],
        "orientation": [
            -0.007617044123544663,
            -0.4140594877416287,
            0.7246017694630462,
            0.5508620489205827
        ]
    }
]

0_5

[
    {
        "label": 1,
        "bounding_box": [
            605.7788841147415,
            274.39278030464175,
            105.53297545444514,
            88.75992049067071
        ],
        "position": [
            -0.03505483908816753,
            0.24564300443680498,
            -0.3837801309847557
        ],
        "orientation": [
            0.16523323474034954,
            0.21681893600297233,
            -0.28533776915120673,
            0.9188415993105624
        ]
    }
]

I did add a post-processing code block to convert cTo to a position and orientation (quaternion) when exporting to JSON, but that should not change anything: cTo is a 4x4 homogeneous transformation matrix whose last column gives the position (x, y, z) in meters.

import os
from pathlib import Path
import h5py
from scipy.spatial.transform import Rotation as R
import json
import numpy as np

def convert_cTo_to_pose(cTo):
    """Convert cTo matrix to position and quaternion orientation."""
    rotation_matrix = cTo[:3, :3]
    translation_vector = cTo[:3, 3]
    r = R.from_matrix(rotation_matrix)
    quaternion = r.as_quat()  # In the form of [x, y, z, w]
    return translation_vector.tolist(), quaternion.tolist()

def process_hdf5_files_to_json(folder_path):
    """Process all hdf5 files in the specified folder to extract object poses and save as JSON."""
    hdf5_folder = Path(folder_path)
    output_folder = hdf5_folder / "pose_json"
    output_folder.mkdir(exist_ok=True)

    for hdf5_file in hdf5_folder.glob("*.hdf5"):
        try:
            with h5py.File(hdf5_file, 'r') as f:
                # Attempt to access 'object_data'. If it fails, KeyError is raised and caught.
                object_data_serialized = f['object_data'][()]
                object_data = json.loads(object_data_serialized.decode('utf-8'))
                pose_data = []
                for obj in object_data:
                    if 'cTo' in obj:
                        cTo_matrix = np.array(obj['cTo'])
                        position, quaternion = convert_cTo_to_pose(cTo_matrix)
                        pose_info = {
                            "label": obj.get('class', 'unknown'),  # Assuming 'class' is the object label
                            "bounding_box": obj.get('bounding_box', []),  # Assuming this format for bounding box
                            "position": position,
                            "orientation": quaternion
                        }
                        pose_data.append(pose_info)
                pose_filename = output_folder / (hdf5_file.stem + "_pose.json")
                with open(pose_filename, 'w') as pose_file:
                    json.dump(pose_data, pose_file, indent=4)
        except KeyError:
            print(f"Warning: {hdf5_file.name} does not contain 'object_data'. Skipping.")

# Example usage
folder_path = 'path to folder'
process_hdf5_files_to_json(folder_path)

print("Processing complete. Pose JSON files have been generated in the 'pose_json' subfolder.")

ArghyaChatterjee avatar Feb 19 '24 18:02 ArghyaChatterjee

Hi @ArghyaChatterjee,

I believe I know where the issue lies concerning the wrong pose frame. I'll investigate further to see what the problem is with the repeating pose and propose a fix soon.

Cheers for the report, Sam

SamFlt avatar Feb 20 '24 12:02 SamFlt

Hi,

The issue should be fixed! You will, however, have to regenerate your data with the generate_dataset script.

Please tell me if you have any issues.

Thanks

SamFlt avatar Feb 20 '24 15:02 SamFlt

Have you pushed your update to the repo? I don't see any changes there.

ArghyaChatterjee avatar Feb 20 '24 18:02 ArghyaChatterjee

Also, another question: are you sampling the camera pose or the object pose? I saw this one: vpMath_regular_points_on_sphere

Also, the animation in this readme file looks like you are doing camera sampling rather than object pose sampling. Is that true? https://github.com/lagadic/visp/tree/master/script

I want the camera to be fixed in the world and to rotate the object about a certain axis (say, so we never see the back of a door handle). How can I do that?

ArghyaChatterjee avatar Feb 20 '24 18:02 ArghyaChatterjee

Also, is the ViSP coordinate frame the same as the OpenCV frame?

ArghyaChatterjee avatar Feb 20 '24 19:02 ArghyaChatterjee

Hi,

The changes have been done on another branch, for which there is a pull request: https://github.com/lagadic/visp/pull/1338

Also, another question: are you sampling the camera pose or the object pose? I saw this one:

The sampling you are referring to (in the readme of the script directory) is for another script. The dataset generation script uses a mix of both: we sample random object poses (with or without physics simulation) in a scene, then sample a set of camera poses looking at certain objects. This process is repeated for multiple scenes (you can change the number of scenes, objects, etc. in the .json configuration file). For more info on how to use the program, see the corresponding tutorial.
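
To give a rough idea, the flow looks something like the sketch below. The BlenderProc calls, paths, and values are illustrative, not the exact code of generate_dataset.py.

import blenderproc as bproc
import numpy as np

bproc.init()
# Load the target objects (placeholder path).
objs = bproc.loader.load_obj("path/to/object.obj")

# 1. Sample random object poses inside the scene.
def sample_pose(obj):
    obj.set_location(np.random.uniform([-0.5, -0.5, 0.0], [0.5, 0.5, 0.5]))
    obj.set_rotation_euler(bproc.sampler.uniformSO3())

bproc.object.sample_poses(objs, sample_pose_func=sample_pose)

# 2. Sample camera poses that look at the objects of interest.
poi = bproc.object.compute_poi(objs)
for _ in range(10):
    location = bproc.sampler.shell(center=poi, radius_min=0.5, radius_max=1.5,
                                   elevation_min=-30, elevation_max=60)
    rotation = bproc.camera.rotation_from_forward_vec(poi - location)
    bproc.camera.add_camera_pose(bproc.math.build_transformation_mat(location, rotation))

# 3. Render and write the samples; steps 1-3 are repeated for each scene.
data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)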

I want the camera to be fixed in the world and to rotate the object about a certain axis (say, so we never see the back of a door handle). How can I do that?

This should be possible by modifying the script dataset/generate_scene.py, in particular the function create_target_objects of the generator class.

See the BlenderProc tutorials for more on what you can do with it.
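
For example (a sketch under assumed names and paths, not the actual generate_scene.py code), you could register one fixed camera pose and only keyframe a limited rotation of the object about its vertical axis, so the back of the handle is never visible:

import blenderproc as bproc
import numpy as np

bproc.init()
handle = bproc.loader.load_obj("path/to/door_handle.obj")[0]  # placeholder path

# One fixed camera in the world frame, placed in front of the object and looking at it.
cam_location = np.array([0.0, -1.0, 0.2])
cam_rotation = bproc.camera.rotation_from_forward_vec(handle.get_location() - cam_location)
cam_pose = bproc.math.build_transformation_mat(cam_location, cam_rotation)

# Keyframe a limited yaw of the object (here +/- 60 degrees around Z) with the same
# camera pose at every frame, so only the front of the handle is ever seen.
for frame, yaw in enumerate(np.random.uniform(-np.pi / 3, np.pi / 3, size=10)):
    handle.set_rotation_euler([0.0, 0.0, yaw], frame=frame)
    bproc.camera.add_camera_pose(cam_pose, frame=frame)

data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)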

Also, is the ViSP coordinate frame the same as the OpenCV frame?

Yes. Both use the same camera frame convention: x to the right, y down, and z pointing forward along the optical axis.

Sam

SamFlt avatar Feb 20 '24 22:02 SamFlt

For some reason, the 10th image generated in each folder (there are folders like 0, 1, 2, 3, ..., each containing images 0 to 10) is missing the target object. Is there a particular reason for that? Is there a way to get rid of it? I am getting an error sign for each iteration.

ArghyaChatterjee avatar Feb 21 '24 16:02 ArghyaChatterjee

Yes, this is normal. You can disable this behaviour by setting empty_images_per_scene to 0 in the .json configuration file.
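
The entry to change looks like this (only this key is shown; the rest of the configuration file stays as it is, and the exact surrounding structure may differ):

"empty_images_per_scene": 0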

SamFlt avatar Feb 21 '24 16:02 SamFlt