
Random switch between OCR and interpretation?

Open · sfranky opened this issue 1 year ago · 2 comments

I'm under the impression that, when I run the provided JS code, the output is either the OCR result or an interpretation of what is in the image. Is there a way to tell it specifically which one to do? For example, running node ocr.js twice, I got an interpretation the first time and the OCR transcription the second time:

λ node ocr.js
The image displays a schedule for a Greek television channel, likely Olympic Channel Greece or another sports-focused channel. The schedule is presented in both English and Greek, indicating that it may be intended for international audiences.

**Schedule Breakdown**

*   **EPT 1**
    *   10:30-12:00 - Aerobics with Mikko Omaiko
        *   This segment appears to feature an aerobics program hosted by Mikko Omaiko.
    *   12:00-14:00 - Second episode of "The Island"
        *   The schedule lists a second episode of "The Island," but the title and genre are unclear due to the lack of context.
    *   14:00-15:10 - Tennis match between B' Group players
        *   This segment features a tennis match involving players from the B' Group, although specific details about the teams or players are not provided.
    *   15:10-16:10 - Windsurfing competition (A) round 9/10
        *   The schedule lists a windsurfing competition with an "A" designation and rounds 9 and 10. However, without further context, it is unclear what this means or how it relates to the overall program.
    *   16:10-20:00 - Tennis match between B' Group players
        *   Similar to the previous tennis segment, this one also features a match involving players from the B' Group. Again, specific details about the teams or players are not provided.

**EPT 2**

*   **09:00-11:00 - Triathlon with Andreas**
    *   This segment appears to feature a triathlon program hosted by Andreas.
*   **11:00-12:00 - Second episode of "The Island"**
    *   The schedule lists another episode of "The Island," but the title and genre remain unclear due to the lack of context.
*   **12:00-14:00 - Tennis match between B' Group players**
    *   This segment features a tennis match involving players from the B' Group, although specific details about the teams or players are not provided.
*   **14:30-16:30 - BMX Freestyle Prodigy - Zappin'**
    *   The schedule lists a BMX freestyle program called "Zappin'," but without further context, it is unclear what this means or how it relates to the overall program.
*   **17:00-18:30 - Volleyball match**
    *   This segment features a volleyball match, although specific details about the teams or players are not provided.

**EPT 3**

*   **10:30-13:05 - Figure skating (4th competition)**
    *   The schedule lists a figure skating program, but without further context, it is unclear what this means or how it relates to the overall program.
*   **13:05-14:30 - Second episode of "The Island"**
    *   Another episode of "The Island" is listed, but the title and genre remain unclear due to the lack of context.
*   **14:30-16:30 - Tennis match between B' Group players**
    *   This segment features a tennis match involving players from the B' Group, although specific details about the teams or players are not provided.
*   **16:30-17:35 - Table tennis (A) round 32/64**
    *   The schedule lists a table tennis competition with an "A" designation and rounds 32 and 64. However, without further context, it is unclear what this means or how it relates to the overall program.
*   **17:35-18:45 - Gymnastics (A) round 4**
    *   This segment features a gymnastics competition with an "A" designation and rounds 4. Again, without further context, it is unclear what this means or how it relates to the overall program.

**Additional Segments**

*   **20:30-22:00 - Tennis match between B' Group players**
    *   This segment features a tennis match involving players from the B' Group, although specific details about the teams or players are not provided.
*   **22:00-23:30 - Badminton (A) round 32/64**
    *   The schedule lists a badminton competition with an "A" designation and rounds 32 and 64. However, without further context, it is unclear what this means or how it relates to the overall program.

Overall, the image presents a diverse range of sports programs, including aerobics, tennis, windsurfing, triathlon, figure skating, table tennis, gymnastics, badminton, volleyball, and BMX freestyle. While some segments are clearly labeled with specific titles or genres, others lack context and appear to be missing key information.

T:\Projects\ollamaOCR
λ ollama ps
NAME                      ID              SIZE     PROCESSOR          UNTIL
llama3.2-vision:latest    38107a0cd119    12 GB    19%/81% CPU/GPU    2 minutes from now

T:\Projects\ollamaOCR
λ node ocr.js
**Transcription:**

TPITH 30 IOYAIQY 2024

EPT 1

*   10:30-12:00 ΣΚΟΠΟΨΗ
*   12:00-14:00 ΜΠΑΣΚΕΤ Α' ΟΜΙΛΟΣ (A) 2η ΑΓΩΝΙΣΤΙΚΗ ΙΣΠΑΝΙΑ-ΕΛΛΑΔΑ
*   14:00-15:10 ΤΕΝΙΣ Β' ΟΜΙΛΟΣ - ΓΥΡΟΣ ΓΥΝΑΙΚΩΝ
*   15:10-16:10 ΙΣΤΙΟΠΛΟΙΑ - WINDSURFING (A) R 9/10
*   16:10-20:00 ΤΕΝΙΣ - Β' ΟΜΙΛΟΣ ΑΝΔΡΩΝ - ΓΥΡΟΣ ΓΥΝΑΙΚΩΝ - ΓΥΡΟΣ ΜΙΚΤΑ

EPT 2

*   09:00-11:00 ΤΡΙΑΘΛΟ ΑΝΔΡΕΣ
*   11:00-12:00 ΠΙΝΓΚ ΠΟΝΓΚ - ΦΑΣΗ <<32>> ΑΝΔΡΩΝ
*   12:00-14:00 ΚΟΛΥΜΒΗΣΗ - 4η ΜΕΡΑ ΠΡΩΤΗ
*   14:00-17:00 ΤΟΣΩΒΟΛΙΑ - ΦΑΣΗ «32»/16» (Γ)
*   14:30-16:30 BMX FREESTYLE ΠΟΚΡΙΜΑΤΙΚΑ - ΖΑΠΙΝΓΚ
*   17:00-18:30 ΜΠΑΣΚΕΤ Α' ΟΜΙΛΟΣ - ΔΥο ΑΓΩΝΙΣΤΗΚΗ ΚΑΝΑΔΑΣ-ΑΥΣΤΡΑΛΙΑΣ
*   18:30-19:15 ΤΖΟΥΝΤΟ - 81 (A) / -63 (Γ) - TΕΛΙΚΟΙ
*   19:15-21:30 ΕΝΟΡΓΑΝΗ ΓΥΜΝΑΣΤΙΚΗ
*   21:30-23:15 ΚΟΛΥΜΒΗΣΗ - 4η ΜΕΡΑ ΒΡΑΔΥΡΟΣ
*   23:15-00:45 ΟΛΥΜΠΙΚ ΝΙΓΤΣ - ΕΚΠΟΜΠΗ (Ζ)

EPT 3

*   10:30-13:05 ΚΩΠΕΛΑΣΙΑ - 4η ΜΕΡΑ
*   13:05-14:30 ΠΟΛΟ Α' ΟΜΙΛΟΣ (A) ΚΡΟΪΑΤΙΑ - ΙΤΑΛΙΑ
*   14:30-16:30 ΜΠΑΣΚΕΤ Α' ΟΜΙΛΟΣ (A) 2η ΑΓΩΝΙΣΤΙΚΗ ΚΑΝΑΔΑΣ-ΑΥΣΤΡΑΛΙΑΣ
*   16:30-17:35 ΠΟΛΟ Β' ΟΜΙΛΟΣ (A) ΙΑΠΩΝΙΑ - ΓΑΛΛΙΑ
*   17:35-18:45 ΠΟΛΟ Α' ΟΜΙΛΟΣ (A) ΗΠΑ - ΡΟΥΜΑΝΙΑ
*   18:45-20:30 ΤΟΣΩΒΟΛΙΑ ΦΑΣΗ «32»/16» (Γ)
*   20:30-22:00 ΠΟΛΟ Α' ΟΜΙΛΟΣ (A) ΜΑΥΡΟΒΟΥΝΙΟ - ΕΛΛΑΔΑ
*   22:00-24:00 ΜΠΑΣΚΕΤ Α' ΟΜΙΛΟΣ (A) 2η ΑΓΩΝΙΣΤΙΚΗ ΒΡΑΖΙΛΙΑ-ΓΕΡΜΑΝΙΑ

EPT SPORTS 1

*   15:30-16:30 ΠΙΝΓΚ ΠΟΝΓΚ - ΦΑΣΗ ΔΙΠΛΟ ΤΕΛΙΚΟΣ
*   20:30-21:20 ΤΟΣΩΒΟΛΙΑ (Γ) ΦΑΣΗ «32»/16»
*   22:00-23:30 ΠΟΛΟ Β' ΟΜΙΛΟΣ (A) ΙΣΠΑΝΙΑ-ΟΥΓΓΑΡΙΑ

EPT SPORTS 2

*   13:00-24:00 ΤΕΝΙΣ
*   B' ΓΥΡΟΣ ΑΝΔΡΩΝ - ΓΥΡΟΣ ΓΥΝΑΙΚΩΝ - ΓΥΡΟΣ ΜΙΚΤΑ

T:\Projects\ollamaOCR
λ

The code I ran is the following:

import { ollamaOCR, LlamaOCRError, ErrorCode, DEFAULT_OCR_SYSTEM_PROMPT, DEFAULT_MARKDOWN_SYSTEM_PROMPT } from "ollama-ocr";

async function runOCR() {
  try {
    const text = await ollamaOCR({
      // model: "minicpm-v",
      filePath: "./test3.png",
    });
    console.log(text);
  } catch (error) {
    if (error instanceof LlamaOCRError) {
      switch (error.code) {
        case ErrorCode.FILE_NOT_FOUND:
          console.error("Image file not found");
          break;
        case ErrorCode.UNSUPPORTED_FILE_TYPE:
          console.error("Unsupported image format");
          break;
        case ErrorCode.OLLAMA_SERVER_ERROR:
          console.error("Ollama server connection failed");
          break;
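        case ErrorCode.OCR_PROCESSING_ERROR:
          console.error("OCR processing failed");
          break;
        default:
          // Any other library error code: show the original message.
          console.error(`OCR failed: ${error.message}`);
      }
    } else {
      // Not a LlamaOCRError (e.g. an unexpected runtime error): don't swallow it.
      console.error(error);
    }
  }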
}
runOCR();

sfranky · Nov 28 '24 17:11

My local device can't run the Llama-3.2-90B-Vision model, so only Llama-3.2-11B-Vision has been tested. Sorry for the trouble. Here are some suggestions based on your feedback:

  1. If your device can run Llama-3.2-90B-Vision, try it and compare the results, or test it through the online service provided by Together AI.
  2. Clean up the returned text by stripping the Markdown markup and converting it to plain text. If the recognized text is only going to be used as input for another LLM, you can probably skip this step and keep the Markdown.
  3. Use a different prompt for each model, starting from the built-in prompts and adjusting them as needed.
  4. If you need high OCR accuracy, use a dedicated OCR engine such as PaddleOCR or Tesseract. On macOS, you can try the macos-vision-ocr library.
  5. Here is the prompt I adjusted; you can test it. After many tests, I found that minicpm-v is slightly less stable than Llama-3.2-11B-Vision.
import { ollamaOCR } from "ollama-ocr";

async function runOCR() {
  const text = await ollamaOCR({
    filePath: "./trader-joes-receipt.jpg",
    // Override the default system prompt so the model only transcribes.
    systemPrompt: `Act as an OCR assistant, only OCR recognition is performed. Analyze the provided image and:
    1. Recognize all visible text in the image as accurately as possible.
    2. Maintain the original structure and formatting of the text.
    3. If any words or phrases are unclear, indicate this with [unclear] in your transcription.
    4. Provide only the transcription without any additional comments.

    Only return raw text, don't return Markdown format.`,
  });
  console.log(text);
}
runOCR();
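For suggestion 2, here is a minimal sketch of the cleanup step in plain JavaScript. stripMarkdown is a hypothetical helper written for this illustration, not part of ollama-ocr. It is a regex heuristic that only removes the markup seen in the outputs above (headings, **bold** markers, and `*`/`-`/`+` bullet prefixes); it is not a full Markdown parser, so tables, links, and inline code are left as they are.

```javascript
// Heuristic Markdown-to-plain-text cleanup for OCR output (not a full parser).
function stripMarkdown(markdown) {
  return markdown
    .split("\n")
    .map((line) =>
      line
        .replace(/^\s*#{1,6}\s+/, "") // "## EPT 2" -> "EPT 2"
        .replace(/^(\s*)[*+-]\s+/, "$1") // "*   item" -> "item", keeping indentation
        .replace(/\*\*(.+?)\*\*/g, "$1") // "**bold**" -> "bold"
    )
    .join("\n")
    .trim(); // trim surrounding whitespace and blank lines
}
```

You would call it on the string returned by ollamaOCR, e.g. console.log(stripMarkdown(text)). Note that a line that legitimately starts with "- " (for example a negative value) would also lose its dash, which is the price of keeping the heuristic this small.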

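For suggestion 3, one way to keep a separate prompt per model is a small lookup table. This is a sketch under assumptions: systemPromptFor and PROMPTS_BY_MODEL are hypothetical names, the model names are the ones that appear in this thread, and the extra minicpm-v instruction only illustrates a per-model tweak; it is not a tested prompt.

```javascript
// OCR-only system prompt, adapted from the prompt suggested in this thread.
const OCR_ONLY_PROMPT = [
  "Act as an OCR assistant, only OCR recognition is performed. Analyze the provided image and:",
  "1. Recognize all visible text in the image as accurately as possible.",
  "2. Maintain the original structure and formatting of the text.",
  "3. If any words or phrases are unclear, indicate this with [unclear] in your transcription.",
  "4. Provide only the transcription without any additional comments.",
  "",
  "Only return raw text, don't return Markdown format.",
].join("\n");

// Hypothetical per-model overrides, keyed by Ollama model name without a tag.
// The minicpm-v addition is an illustrative tweak, not a tested prompt.
const PROMPTS_BY_MODEL = {
  "llama3.2-vision": OCR_ONLY_PROMPT,
  "minicpm-v": `${OCR_ONLY_PROMPT}\nDo not describe, summarize, or interpret the image.`,
};

// Return the prompt for a model such as "llama3.2-vision:latest",
// falling back to the generic OCR-only prompt for unknown models.
function systemPromptFor(model) {
  const baseName = model.split(":")[0]; // drop tags like ":latest"
  return PROMPTS_BY_MODEL[baseName] ?? OCR_ONLY_PROMPT;
}
```

The result would be passed through the same model and systemPrompt options used in the snippets above, e.g. ollamaOCR({ model, filePath, systemPrompt: systemPromptFor(model) }).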
bytefer · Nov 29 '24 01:11

OK, thanks, that solved it. I thought there was a direct way to ask for OCR, but this works too.

sfranky · Nov 29 '24 11:11