Good LLM workflows

In class today, we discussed some good LLM workflows. This github issue will serve as a repository of good/useful LLM workflows for our pagila problem set. In order to encourage you all to experiment with these workflows, I will offer 1-2 points of extra credit for anyone who suggests a particularly useful workflow.


Recall that my bashrc contains the following lines for setting up the claude and groq commands:

function llm_blue() {
    printf "\033[94m"   # switch the terminal text color to light blue
    command llm "$@"    # run the real llm command, passing all arguments through
    printf "\033[0m"    # reset the terminal color
}
alias groq='llm_blue -s "keep your response short, between 5-20 lines" -m groq/llama-3.3-70b-specdec'
alias claude='llm_blue -s "keep your response short, between 5-20 lines" -m claude-3-5-sonnet-latest'

Notice in particular that I have updated my groq command to use the "speculative decoding" model, which is 10x faster than the already fast groq models, and that I also have an alias for claude. (Using claude will require an API key from Anthropic and, IIRC, a $5 minimum payment.)
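
If you are setting claude up for the first time, here is a minimal sketch of the one-time setup. The plugin name and key commands below are my best understanding of the llm CLI, so double-check against the llm docs if they have changed:

$ llm install llm-anthropic   # plugin that provides the claude-3-5-sonnet-latest model
$ llm keys set anthropic      # paste in the API key from your Anthropic account

After that, something like groq 'explain LEFT JOIN vs INNER JOIN' (or the claude equivalent) should print its answer in blue.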


Two good example workflows that we discussed in class are shown below.

To attempt to one-shot solve a problem:

$ claude <<EOF
Given the following schema:

$(cat pagila/pagila-schema.sql)

Answer the following SQL question:

$(cat sql/01.sql)
EOF
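
If the answer looks right, you can save a copy while still seeing it on screen. A small sketch: answer.sql is just a hypothetical scratch file, and you should verify by eye that the model returned bare SQL with no surrounding prose:

$ claude <<EOF | tee answer.sql
Given the following schema:

$(cat pagila/pagila-schema.sql)

Answer the following SQL question:

$(cat sql/01.sql)
EOF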

If the prompt above produces a query whose output is formatted differently than expected, you can show the model the required format with a prompt like:

$ claude <<EOF
Given the following schema:

$(cat pagila/pagila-schema.sql)

Answer the following SQL question:

$(cat sql/18.sql)

The expected output should be in the format of:

$(head expected/18.out)
EOF
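
One detail that makes these prompts work: the heredoc delimiter EOF is unquoted, so bash expands the $(...) command substitutions and inlines the files before anything is sent to the model. Quoting the delimiter suppresses that expansion:

$ claude <<'EOF'
$(cat sql/01.sql)
EOF

Here claude receives the literal string $(cat sql/01.sql) instead of the file contents, which is almost never what you want.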

To claim your extra credit, simply reply to this post with a prompt structure and a short explanation of a suggested use case.

mikeizbicki avatar Mar 06 '25 19:03 mikeizbicki

I am attaching the prompt I have been using to help me debug, as well as a portion of a sample output.

GPT is an alias for llm -m o1-mini, and I am using -c (continue the most recent conversation) to take advantage of the prompts suggested by Izbicki, which already include the schema.
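
In bashrc terms the alias is just:

alias GPT='llm -m o1-mini'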

$ GPT -c <<EOF
Below is the actual output produced by running the SQL query from sql/01.sql:

$(docker compose exec -T pg psql -U postgres -f sql/01.sql)

Below is the expected output as provided in expected/01.out:

$(cat expected/01.out)

Please compare the actual output with the expected output. Identify and list all differences between them and suggest possible reasons for these differences. 

EOF
### **1. Number of Rows**

- **Actual Output:** **521 rows**
- **Expected Output:** **659 rows**

**Difference:** The actual output contains **138 fewer rows** than the expected output.

---

### **2. Columns**

- **Both Outputs:**
  - `film_id`
  - `title`

**Difference:** **No difference** in the number of columns; both outputs include the same two columns.

---

### **3. Specific Data Discrepancies**

#### **a. Missing Films in Actual Output**

Several `film_id` and `title` pairs present in the expected output are **absent** in the actual output. Here are some notable examples:

| film_id | Expected Title            | Actual Title          |
|---------|---------------------------|-----------------------|
| 6       | AGENT TRUMAN              | **Missing**           |
| 9       | ALABAMA DEVIL             | **Missing**           |
| 33      | APOLLO TEEN               | **Missing**           |
| 35      | ARACHNOPHOBIA ROLLERCOASTER | **Missing**         |
| 37      | ARIZONA BANG              | **Missing**           |
| ...     | ...                       | ...                   |
| 999     | ZOOLANDER FICTION         | **Present**           |
| 1000    | ZORRO ARK                 | **Present**           |

*Note: The table above includes only a subset of the missing films for brevity.*

#### **b. Additional Films in Actual Output**

Upon inspection, the **actual output does not contain any films** that are **not present** in the expected output. All films in the actual output appear to be either matching or subsets of the expected films, with no extras observed.

---

### **4. Potential Reasons for Differences**

The discrepancies between the actual and expected outputs can be attributed to several factors:
...

mvalsania avatar Mar 09 '25 22:03 mvalsania

And if you are extra lazy, you can store the prompt in a run_debug.sh file and call it with ./run_debug.sh N, where N is the number of the exercise you care about.

#!/bin/bash
# Usage: ./run_debug.sh <number>
# Example: ./run_debug.sh 10

if [ -z "$1" ]; then
  echo "Usage: $0 <number>"
  exit 1
fi

# zero-pad the exercise number so that "1" becomes "01", matching the
# sql/01.sql naming convention; 10# forces base 10 so "08" is not parsed as octal
num=$(printf '%02d' "$((10#$1))")

llm -m o1-mini -c <<EOF
Below is the actual output produced by running the SQL query from sql/${num}.sql:

$(docker compose exec -T pg psql -U postgres -f sql/${num}.sql)

Below is the expected output as provided in expected/${num}.out:

$(cat expected/${num}.out)

Please compare the actual output with the expected output. Identify and list all differences between them and suggest possible reasons for these differences.
EOF
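
You can also loop the script over every exercise to triage which ones still fail. A sketch, assuming there are 20 problems (adjust the range to match your problem set); seq -w zero-pads the numbers so they match the sql/01.sql naming:

$ for n in $(seq -w 1 20); do echo "=== exercise $n ==="; ./run_debug.sh "$n"; done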

mvalsania avatar Mar 09 '25 22:03 mvalsania

@mvalsania I like the debugging use-case and the script to automate it, so I've added +2 ec in sakai :)

I'll also point out that the diff command provides similar output in a machine-readable format. It is worth learning to read diff output directly, and I suspect adding the diff output to the LLM prompt could improve the LLM's ability to give advice in weird edge cases.
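
For example, here is a sketch of what that combined prompt might look like (same container setup as the scripts above; diff's process substitution avoids a temporary file, and by diff's conventions < lines come from the actual output while > lines come from the expected output):

$ claude <<EOF
Below is the diff between my actual output (lines starting with <) and the
expected output (lines starting with >) for sql/01.sql:

$(diff <(docker compose exec -T pg psql -U postgres -f sql/01.sql) expected/01.out)

Explain what is wrong with my query and suggest a fix.
EOF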

mikeizbicki avatar Mar 10 '25 17:03 mikeizbicki