
[Bug]: Custom user prompt is not recognized with attachment

Open ruihangdu opened this issue 11 months ago • 3 comments

What happened?

▶ fabric --model google/gemini-flash-1.5-8b -a "$HOME/Downloads/whisper_transcription.m4a" "Please transcribe the attached audio file into an SRT file" --dry-run
Dry run: Would send the following request:
User:


Options:
Model: google/gemini-flash-1.5-8b
Temperature: 0.700000
TopP: 0.900000
PresencePenalty: 0.000000
FrequencyPenalty: 0.000000
empty response

I expected the User prompt to be "Please transcribe the attached audio file into an SRT file", but it appears to be empty.

For comparison, this is the output when I run without the attachment flag:

▶ fabric --model google/gemini-flash-1.5-8b "Please transcribe the attached audio file into an SRT file" --dry-run
Dry run: Would send the following request:
User:
Please transcribe the attached audio file into an SRT file

Options:
Model: google/gemini-flash-1.5-8b
Temperature: 0.700000
TopP: 0.900000
PresencePenalty: 0.000000
FrequencyPenalty: 0.000000
empty response

Is this a bug?

Version check

  • [X] Yes I was.

Relevant log output

No response

Relevant screenshots (optional)

No response

ruihangdu commented Dec 24 '24

Can you change the order of the options? I'm curious to learn whether that makes a difference.

mattjoyce commented Dec 24 '24

I tried two other permutations and neither worked:

▶ fabric --model google/gemini-flash-1.5-8b "Please transcribe the attached audio file into an SRT file" -a "$HOME/Downloads/whisper_transcription.m4a" --dry-run
Dry run: Would send the following request:
User:


Options:
Model: google/gemini-flash-1.5-8b
Temperature: 0.700000
TopP: 0.900000
PresencePenalty: 0.000000
FrequencyPenalty: 0.000000

~
▶ fabric "Please transcribe the attached audio file into an SRT file" --model google/gemini-flash-1.5-8b -a "$HOME/Downloads/whisper_transcription.m4a" --dry-run
Dry run: Would send the following request:
User:


Options:
Model: google/gemini-flash-1.5-8b
Temperature: 0.700000
TopP: 0.900000
PresencePenalty: 0.000000
FrequencyPenalty: 0.000000
empty response

ruihangdu commented Dec 26 '24

Can you try the following:

"Please transcribe the attached audio file into an SRT file" | fabric --model google/gemini-flash-1.5-8b -a "$HOME/Downloads/whisper_transcription.m4a" --dry-run

Also, what version of fabric are you on? (fabric --version)

mattjoyce commented Dec 29 '24

I may be experiencing a similar issue when describing the contents of an image, and I wonder if there is a way to debug this, since I've seen examples of image attachments working. I'm not sure whether I'm hitting issues specific to an AI provider: Ollama running locally, ChatGPT (currently out of quota), or Perplexity.

If you know of any resources or can point me in the right direction to figure out why these examples fail, I'd appreciate it (I'm assuming it may be an install issue I had on a Mac).

Ollama:

> echo "Describe the items in this image" | fabric -sp raw_query -m llama3.2-vision:latest -a https://files.readme.io/5218773ad836fa7d8e95548f49c6718fe345dda95e7e01c8c806ecf02323a831-jumpclient_diagram-2023.png

I'm happy to help! However, I don't see a specific question or topic you'd like to discuss. Could you please provide more context or clarify what you're looking for? I'll do my best to assist you!

Perplexity:

> echo "Describe the items in this image" | fabric -m sonar-pro -a https://files.readme.io/5218773ad836fa7d8e95548f49c6718fe345dda95e7e01c8c806ecf02323a831-jumpclient_diagram-2023.png     

perplexity API request failed: Message content was empty

kirkhw commented Aug 27 '25

@kirkhw what version of fabric are you running? How did you install it? What OS?

ksylvan commented Aug 27 '25

@ksylvan thank you for responding. I'm running fabric v1.4.274 on macOS Sequoia 15.5 on an Apple M3 chip. I performed the package-manager install with brew install fabric-ai (as I did not have Go installed at that time) and created the alias fabric='fabric-ai'.

Only yesterday I thought perhaps I should install Go (brew install go), with the following environment variables set, to see if I was missing some functionality. There was no change.

# Golang environment variables
export GOROOT=$(brew --prefix go)/libexec
export GOPATH=$HOME/go
export PATH=$GOPATH/bin:$GOROOT/bin:$HOME/.local/bin:$PATH

My next step was to:

  1. find a good example of an Ollama API call passing the raw image content (a rough sketch of what I mean follows below), and
  2. enable debug in fabric to uncover what is being sent to Ollama.
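
As a starting point for step 1, a raw Ollama /api/generate call that inlines the image as base64 might look roughly like this (a sketch only; the default localhost port is assumed, and the image path and model name are placeholders):

# Base64-encode the image and pass it in the "images" array of /api/generate
IMG_B64=$(base64 -i /path/to/image.jpg)
curl -s http://localhost:11434/api/generate -d "{
  \"model\": \"llama3.2-vision:latest\",
  \"prompt\": \"Describe to me the objects in this image.\",
  \"images\": [\"$IMG_B64\"],
  \"stream\": false
}"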

kirkhw commented Aug 27 '25

Since you're on macOS, you can also easily install the zsh completions.
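
For the completions, this part isn't fabric-specific, but the generic Homebrew approach is to add something like the following to ~/.zshrc so zsh picks up completion functions that Homebrew formulas install (assuming a standard Homebrew setup):

# Add Homebrew's site-functions directory to fpath before initializing completion
if type brew &>/dev/null; then
  FPATH="$(brew --prefix)/share/zsh/site-functions:${FPATH}"
  autoload -Uz compinit
  compinit
fi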

You can "brew update && brew upgrade" to get the latest updates easily.

I'll look into the issue soon and see if I can reproduce it.

ksylvan commented Aug 27 '25

Thanks @ksylvan. I performed the update and upgrade commands and am now on version v1.4.297.

I attempted the following command with trace-level debug; the dry-run results are below. Could it be that I'm actually dealing with a problem with patterns rather than with the image attachment?

> echo "Describe to me the objects in this image." | fabric -p raw_query -m llama3.2-vision:latest -a "/Users/MYUSER/Downloads/IMG_5113.jpg" --debug=3 --dry-run 


DEBUG: Mapped long flag pattern to yaml tag pattern
DEBUG: Mapped short flag p to yaml tag pattern
DEBUG: Mapped long flag temperature to yaml tag temperature
DEBUG: Mapped short flag t to yaml tag temperature
DEBUG: Mapped long flag topp to yaml tag topp
DEBUG: Mapped short flag T to yaml tag topp
DEBUG: Mapped long flag stream to yaml tag stream
DEBUG: Mapped short flag s to yaml tag stream
DEBUG: Mapped long flag presencepenalty to yaml tag presencepenalty
DEBUG: Mapped short flag P to yaml tag presencepenalty
DEBUG: Mapped long flag raw to yaml tag raw
DEBUG: Mapped short flag r to yaml tag raw
DEBUG: Mapped long flag frequencypenalty to yaml tag frequencypenalty
DEBUG: Mapped short flag F to yaml tag frequencypenalty
DEBUG: Mapped long flag model to yaml tag model
DEBUG: Mapped short flag m to yaml tag model
DEBUG: Mapped long flag vendor to yaml tag vendor
DEBUG: Mapped short flag V to yaml tag vendor
DEBUG: Mapped long flag modelContextLength to yaml tag modelContextLength
DEBUG: Mapped long flag yt-dlp-args to yaml tag ytDlpArgs
DEBUG: Mapped long flag seed to yaml tag seed
DEBUG: Mapped short flag e to yaml tag seed
DEBUG: Mapped long flag suppress-think to yaml tag suppressThink
DEBUG: Mapped long flag think-start-tag to yaml tag thinkStartTag
DEBUG: Mapped long flag think-end-tag to yaml tag thinkEndTag
DEBUG: Mapped long flag disable-responses-api to yaml tag disableResponsesAPI
DEBUG: Mapped long flag transcribe-file to yaml tag transcribeFile
DEBUG: Mapped long flag transcribe-model to yaml tag transcribeModel
DEBUG: Mapped long flag split-media-file to yaml tag splitMediaFile
DEBUG: Mapped long flag voice to yaml tag voice
DEBUG: Mapped long flag notification to yaml tag notification
DEBUG: Mapped long flag notification-command to yaml tag notificationCommand
DEBUG: Mapped long flag thinking to yaml tag thinking
DEBUG: CLI flag used: p (yaml: pattern)
DEBUG: CLI flag used: m (yaml: model)
DEBUG: Starting template processing
DEBUG: Starting template processing
DEBUG: Template processing complete
Dry run: Would send the following request:

System:
# IDENTITY

You are a universal AI that yields the best possible result given the input.

# GOAL

- Fully digest the input.

- Deeply contemplate the input and what it means and what the sender likely wanted you to do with it.

# OUTPUT

- Output the best possible output based on your understanding of what was likely wanted.

user:
  - Type: text
    Text: Describe to me the objects in this image.
  - Type: image_url
    Image URL: data:image/png;base64,iVBORw0KGgoAAAANSUhE***SHORTENED****gg==

Options:
Model: llama3.2-vision:latest
Temperature: 0.700000
TopP: 0.900000
PresencePenalty: 0.000000
FrequencyPenalty: 0.000000

Dry run: Fake response sent by DryRun plugin

Actual run results:

I'm happy to help! However, I don't see a specific question or topic you'd like to discuss. Could you please provide more context or clarify what you're looking for assistance with? I'll do my best to provide a helpful response.

kirkhw commented Aug 27 '25

@ksylvan Should we continue to work under this issue? I have the following Ollama trace log entries so we can compare a successful API call made via curl with one made via Fabric. My question never makes it to Ollama in the Fabric case.

Question: What is in this picture?

CURL/Postman (SUCCESS):

time=2025-08-29T13:14:17.759-04:00 level=DEBUG source=server.go:729 msg="completion request" images=1 prompt=48 format=""
time=2025-08-29T13:14:17.759-04:00 level=TRACE source=server.go:730 msg="completion request" prompt="[INST] [img-0]\n\nWhat is in this picture? [/INST]"

Fabric call (FAILURE): echo "What is in this picture?" | fabric -m llama3.2-vision:latest -a /Users/MYUSER/Downloads/IMG_5113.jpg

time=2025-08-29T13:21:59.169-04:00 level=DEBUG source=server.go:729 msg="completion request" images=0 prompt=99 format=""
time=2025-08-29T13:21:59.169-04:00 level=TRACE source=server.go:730 msg="completion request" prompt="<|start_header_id|>user<|end_header_id|>\n\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
time=2025-08-29T13:21:59.170-04:00 level=TRACE source=bytepairencoding.go:205 msg=encoded string="<|start_header_id|>user<|end_header_id|>\n\n<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n" ids="[128006 882 128007 271 128009 128006 78191 128007 271]"
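
For reference, a raw Ollama request that passes the image alongside the question (the kind of call the successful curl case above corresponds to) might look roughly like this; this is just a sketch, with host, port, model name, and image path as placeholders rather than the exact request I sent:

# Sketch: /api/chat with the base64-encoded image attached to the user message
IMG_B64=$(base64 -i /path/to/IMG_5113.jpg)
curl -s http://localhost:11434/api/chat -d "{
  \"model\": \"llama3.2-vision:latest\",
  \"messages\": [
    {\"role\": \"user\", \"content\": \"What is in this picture?\", \"images\": [\"$IMG_B64\"]}
  ],
  \"stream\": false
}"

By contrast, the Fabric trace above shows images=0 and an empty user turn in the rendered prompt.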

kirkhw commented Aug 29 '25

Hmmm... Thanks @kirkhw - That's very useful.

I think it's okay to keep collaborating here, yes.

ksylvan commented Aug 29 '25

On the other hand, @kirkhw, maybe this is more appropriate for an Ollama-focused issue.

@ruihangdu for the original issue, the new code handles attachments properly:

kayvan@dharma ~/zoom $ echo 'THIS IS NOT A VALID M4A FILE' > test.m4a
kayvan@dharma ~/zoom $ fabric --model google/gemini-flash-1.5-8b -a ./test.m4a "Please transcribe the attached audio file into an SRT file" --dry-run 
Dry run: Would send the following request:

System:


IMPORTANT: First, execute the instructions provided in this prompt using the user's input. Second, ensure your entire final response, including any section headers or titles generated as part of executing the instructions, is written ONLY in the en-US language.

user:
  - Type: text
    Text: Please transcribe the attached audio file into an SRT file
  - Type: image_url
    Image URL: data:text/plain; charset=utf-8;base64,VEhJUyBJUyBOT1QgQSBWQUxJRCBNNEEgRklMRQo=

Options:
Model: google/gemini-flash-1.5-8b
Temperature: 0.000000
TopP: 0.000000
PresencePenalty: 0.000000
FrequencyPenalty: 0.000000

Dry run: Fake response sent by DryRun plugin

ksylvan commented Aug 29 '25

@kirkhw Can you create a BUG issue with the details that you've found, including the SUCCESS and FAILURE cases you put up above?

ksylvan commented Aug 29 '25

Closing the original issue, as attachments work correctly with the models that support them:

kayvan@dharma ~/zoom $ fabric --model google/gemini-flash-1.5 -a ~/Downloads/000-P8XQELOuxrE.jpeg "Please describe this image."
                                
The image is a close-up portrait of a middle-aged woman with shoulder-length dark brown, wavy hair. She is smiling warmly at the camera.  Her skin tone is medium, and she has brown eyes. She's wearing a simple, dark-colored, textured sweater and small, round, light-colored earrings. The background is blurred, showing warm-toned out-of-focus lights, suggesting an indoor setting, possibly a restaurant or office. The overall impression is one of approachability and friendliness.

ksylvan commented Aug 29 '25