LL3DA
New object
Excuse me, I have a question: is the box prediction (ov-det) invalid for objects outside the 17 defined categories? I found that objects are filtered here (captioner.py, line 394). If there are new categories, will they all be classified as "others", making them impossible to predict?
If there is a new object, such as a fruit, how can I predict its position?
The last class of sem_cls_logits is the “no object” class. Open-vocabulary detection is designed to extend a model’s ability to localize and recognize objects beyond a closed, pre-defined category set.
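As a minimal sketch of what a trailing “no object” class means in practice, the snippet below applies a softmax over per-query logits and splits off the final column. The array shape and the helper name `split_no_object` are assumptions for illustration, not code from the LL3DA repository.

```python
import numpy as np

def split_no_object(sem_cls_logits):
    """Split per-query logits into class probabilities and objectness.

    sem_cls_logits: array of shape (num_queries, num_classes + 1); the last
    column is assumed to be the "no object" class.
    """
    # Numerically stable softmax over the class axis.
    shifted = sem_cls_logits - sem_cls_logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    class_probs = probs[:, :-1]      # probabilities of the real categories
    objectness = 1.0 - probs[:, -1]  # probability that the query is an object
    return class_probs, objectness

# Toy logits: two queries, two real classes plus "no object".
logits = np.array([[2.0, 0.0, 0.0],   # leans towards class 0
                   [0.0, 0.0, 5.0]])  # mostly "no object"
class_probs, objectness = split_no_object(logits)
```

Queries with low objectness would typically be discarded before any category-specific filtering is applied.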
Is it possible to locate only objects from a pre-defined set of classes in ov-det, so that descriptions are generated for those objects only? Wouldn't it be possible to locate a new object without generating a description for it? If I retrain the detector, can I add object categories, such as a fruit category?
You can just filter and re-label the categories in the generated texts.
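One possible reading of this suggestion, sketched under assumptions: take the text the model generates for each object and map keywords in it onto your own label set. The function `relabel` and the `label_map` dictionary are hypothetical and not part of LL3DA.

```python
def relabel(generated_text, label_map, default="others"):
    """Return the category of the first known keyword found in the text."""
    text = generated_text.lower()
    for keyword, category in label_map.items():
        if keyword in text:
            return category
    return default

# Hypothetical mapping from words in a generated caption to target labels.
label_map = {"banana": "fruit", "apple": "fruit", "chair": "chair"}

fruit_label = relabel("A yellow banana lying on the table.", label_map)
other_label = relabel("A lamp next to the bed.", label_map)
```

Plain substring matching is crude (for example, "pineapple" contains "apple"), so a real pipeline would likely need word-level matching.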
I'm sorry, I didn't understand. For example, if I want to detect the location of a banana in a new scene, can the model output the banana's location like the one in the template?
So when I filter, the bananas will be treated as the "others" category. If I re-label the categories, for example 18: banana, would that be right?
Should I also set self.num_semcls = 19?
If you are looking for a grounding model, you can design input text instructions like “locate the banana”.
The generated answer gives the center of the box and its length, width and height, so how do I visualize the box?
How can I reconstruct the 3D box?
Please refer to https://github.com/ch3cook-fdu/3d-pc-box-viz for more visualization functions
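For reference, reconstructing a box from a center and three side lengths can be sketched as below. This assumes the box is axis-aligned and that the three sizes are given in x, y, z order; `box_corners` and `EDGES` are illustrative names, not functions from the linked repository.

```python
import numpy as np

def box_corners(center, size):
    """Return the 8 corners of an axis-aligned box as an array of shape (8, 3)."""
    center = np.asarray(center, dtype=float)
    half = np.asarray(size, dtype=float) / 2.0
    # Every +/-1 sign combination for the x, y, z half-extents.
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1)
                      for sy in (-1, 1)
                      for sz in (-1, 1)], dtype=float)
    return center + signs * half

# Two corners share an edge when their sign patterns differ in exactly one axis,
# which corresponds to corner indices differing in exactly one bit.
EDGES = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if bin(i ^ j).count("1") == 1]

corners = box_corners(center=[1.0, 2.0, 0.5], size=[2.0, 1.0, 1.0])
```

The corner array together with the 12 index pairs in `EDGES` is enough to draw a wireframe in most 3D viewers.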
It's good to see the results of your work. Can you explain in detail how to decode the 3D box? I tried to decode it but failed. Looking forward to your reply.
The code for decoding box coordinates can be found in https://github.com/Open3DA/LL3DA/blob/main/eval_utils/evaluate_ovdet.py#L163-L201. Please refer to https://github.com/ch3cook-fdu/Vote2Cap-DETR/issues/11 for visualization.
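As a rough illustration of what such decoding can involve, the sketch below parses boxes from generated text and maps quantized integers back to scene coordinates. The text format (`<obj>cx,cy,cz,w,h,l</obj>`), the bin count, and the scaling by scene extent are all assumptions made for this example; the linked evaluate_ovdet.py is the authoritative source for the actual format.

```python
import re
import numpy as np

# ASSUMED format, for illustration only: six comma-separated integers between
# <obj> and </obj>, each quantized to [0, num_bins] over the scene extent.
BOX_PATTERN = re.compile(r"<obj>\s*(\d+(?:\s*,\s*\d+){5})\s*</obj>")

def decode_boxes(text, scene_min, scene_max, num_bins=255):
    """Parse boxes from text and map them back to scene coordinates.

    Returns an array of shape (num_boxes, 6): center xyz followed by size xyz.
    """
    scene_min = np.asarray(scene_min, dtype=float)
    extent = np.asarray(scene_max, dtype=float) - scene_min
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        values = np.array([int(v) for v in match.group(1).split(",")], dtype=float)
        center = values[:3] / num_bins * extent + scene_min  # de-quantize position
        size = values[3:] / num_bins * extent                # de-quantize size
        boxes.append(np.concatenate([center, size]))
    return np.array(boxes).reshape(-1, 6)

# Hypothetical generated answer and scene bounds.
decoded = decode_boxes("The banana is at <obj>0,255,51,255,0,102</obj>.",
                       scene_min=[0, 0, 0], scene_max=[5, 5, 5])
```

If decoding fails, it is worth checking whether the scene bounds used here match the ones used when the boxes were encoded.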