multimodal-maestro

Issue with masks_to_marks mapping

Open • dokooh opened this issue 7 months ago • 9 comments

Search before asking

  • [X] I have searched the Multimodal Maestro issues and found no similar bug report.

Bug

Hi,

First and foremost, thanks for your great work so far.

I was testing the code with your Google Colab tutorial. Mark creation (SAM), visualization, and refinement all work smoothly. The prompt call with marks to GPT-4 also works without any issue, and I get a response back.

In the step where the relevant marks are extracted and visualized, the call to masks_to_marks raises the error shown below.

In case it helps: with the example I used, I expect a large output (20-30 marks).
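To check whether the response actually references the 20-30 marks expected, one can count the mark IDs it mentions before building any masks. This is a minimal sketch, assuming the GPT-4 response refers to marks as bracketed numbers such as `[3]`; `extract_mark_ids` is a hypothetical helper, not part of the maestro API, and maestro's own parsing may differ.

```python
import re
from typing import List


def extract_mark_ids(text: str) -> List[int]:
    """Hypothetical helper: collect unique mark IDs written as [N] in a response.

    Assumes marks appear as bracketed integers; this is not maestro's API.
    """
    ids: List[int] = []
    for match in re.findall(r"\[(\d+)\]", text):
        mark_id = int(match)
        if mark_id not in ids:  # keep first-occurrence order, drop duplicates
            ids.append(mark_id)
    return ids


response = "The cup is [3], the plate is [12], and the cup again [3]."
mark_ids = extract_mark_ids(response)
print(mark_ids)  # [3, 12]
print(len(mark_ids))  # 2
```

If this returns an empty list for the real response, there are no relevant masks to stack, which is worth ruling out before looking at masks_to_marks itself.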

Environment

0.1.0rc1, Google Colab (T4 VM)

Minimal Reproducible Example

```python
import numpy as np
import maestro

masks = maestro.extract_relevant_masks(text=response, detections=refined_marks)
masks = np.array([mask for mask in masks.values()])  # stack per-mark masks; expected shape (n, H, W)
detections = maestro.masks_to_marks(masks)
```
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-61-a9e5dd9e84f7> in <cell line: 3>()
      1 masks = maestro.extract_relevant_masks(text=response, detections=marked_image)
      2 masks = np.array([mask for mask in masks.values()])
----> 3 detections = maestro.masks_to_marks(masks)

3 frames
/usr/local/lib/python3.10/dist-packages/supervision/detection/core.py in _validate_mask(mask, n)
     27     )
     28     if not is_valid:
---> 29         raise ValueError("mask must be 3d np.ndarray with (n, H, W) shape")
     30 
     31 

ValueError: mask must be 3d np.ndarray with (n, H, W) shape
```
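The error message says the mask must be a 3D array of shape (n, H, W). One plausible way to hit it (an assumption, not confirmed in this report) is when no relevant masks are extracted: `np.array` over an empty list has shape `(0,)`, which is 1-D. The sketch below uses a hypothetical `stack_masks` helper (not part of maestro or supervision) that always returns an (n, H, W) array and rejects masks whose shape does not match the image. Whether `masks_to_marks` then accepts an empty (0, H, W) array is not verified here.

```python
from typing import Dict, Tuple

import numpy as np


def stack_masks(masks: Dict[int, np.ndarray], image_hw: Tuple[int, int]) -> np.ndarray:
    """Hypothetical helper: stack per-mark boolean masks into an (n, H, W) array.

    Returns an empty (0, H, W) array when no masks were extracted, instead of
    the 1-D shape (0,) that np.array([]) produces.
    """
    expected = tuple(image_hw)
    for mark_id, mask in masks.items():
        # every mask must be a 2D (H, W) array matching the image size
        if mask.shape != expected:
            raise ValueError(f"mark {mark_id}: mask shape {mask.shape} != {expected}")
    if not masks:
        return np.empty((0, *expected), dtype=bool)
    return np.stack(list(masks.values()), axis=0)


# np.array over an empty list collapses to a 1-D array, not (0, H, W)
print(np.array([mask for mask in {}.values()]).shape)  # (0,)

h, w = 4, 5
extracted = {3: np.zeros((h, w), dtype=bool), 12: np.ones((h, w), dtype=bool)}
print(stack_masks(extracted, (h, w)).shape)  # (2, 4, 5)
print(stack_masks({}, (h, w)).shape)  # (0, 4, 5)
```

Printing `masks.shape` right before the `masks_to_marks` call in the notebook would show whether the input is 1-D, 3-D, or something else.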

Additional

No response

Are you willing to submit a PR?

  • [ ] Yes I'd like to help by submitting a PR!

dokooh, Dec 04 '23 09:12