
OCR is Inconsistent in Windows Context Menus

Open joel-duffie opened this issue 1 year ago • 1 comments

Short summary

I want to use OCR successfully and consistently in the context menus of my application.

Desired execution environment / tested on Microsoft Windows

  • [ ] Virtual machine
  • [ ] Docker container
  • [x] Dev/Host system

node version: 16.20.2

OS type and version: Windows 10 Home v.22H2

Full code sample related to question

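// assumes nut.js is loaded as `nut` and this runs inside an async function
const nut = require("@nut-tree/nut-js");

// get menubar region
let activeWin = await nut.getActiveWindow();
let pro_reg = await activeWin.region; // declared explicitly to avoid an implicit global
await nut.screen.highlight(pro_reg);
let menubar_reg = new nut.Region(pro_reg.left + 8, pro_reg.top + 8, 500, 30);
await nut.screen.highlight(menubar_reg);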

// get menu bar item
let menuBarItem = await nut.screen.find(nut.singleWord(itemName), {
    confidence: 0.40,
    searchRegion: menubar_reg
});
// click menubar item
await nut.mouse.move(nut.centerOf(menuBarItem));
await nut.mouse.click(nut.Button.LEFT);
await nut.sleep(100);

// make menu region
let menu_reg = new nut.Region(menuBarItem.left + 25, (menuBarItem.top + menuBarItem.height + 5), 125, 470);
await nut.screen.captureRegion('just-region-cap', menu_reg, nut.FileType.PNG, './test-imagery/');
await nut.screen.highlight(menu_reg);
await nut.sleep(2000);

// find menu item (ocr)
let menuitem = await nut.screen.find(nut.textLine('Presentation Editor'), {
    confidence: 0.50,
    searchRegion: menu_reg,
    providerData: {
        preprocessConfig: {
            binarize: false
        }
    }
});
await nut.mouse.move(nut.centerOf(menuitem));

Detailed question

This issue was discussed in another thread, where some solutions were proposed. This is essentially what I'm trying to do with the code above:

  1. Get the region of my application.
  2. Locate the app menubar and select the item 'view'
  3. Manually constrain a search region around the text of the menu
    • getActiveWindow() doesn't work for menus on Windows
  4. Search in said region for "Presentation Editor"
    • disable binarization in preprocessing to make the text easier to locate
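
Step 3 can be sketched as plain geometry, independent of nut.js. This is only an illustration: `menuRegionBelow` and `clampToScreen` are hypothetical helper names, the default offsets are the ones hard-coded in the sample above, and plain objects stand in for `nut.Region`. The clamping step is an assumption added in case the menu opens close to a screen edge.

```javascript
// Hypothetical helper: derive the drop-down search region from the bounds of
// the menu bar item that was just clicked. Defaults mirror the code sample.
function menuRegionBelow(item, opts = {}) {
  const { offsetX = 25, gap = 5, width = 125, height = 470 } = opts;
  return {
    left: item.left + offsetX,
    top: item.top + item.height + gap,
    width,
    height,
  };
}

// Hypothetical helper: shrink a region so it never extends past the screen.
function clampToScreen(region, screenWidth, screenHeight) {
  const left = Math.max(0, Math.min(region.left, screenWidth));
  const top = Math.max(0, Math.min(region.top, screenHeight));
  return {
    left,
    top,
    width: Math.max(0, Math.min(region.width, screenWidth - left)),
    height: Math.max(0, Math.min(region.height, screenHeight - top)),
  };
}

// Example with made-up bounds for the 'view' menu bar item.
const item = { left: 100, top: 40, width: 40, height: 20 };
const region = clampToScreen(menuRegionBelow(item), 1920, 1080);
console.log(region);
```

With nut.js, the resulting values could then be passed to `new nut.Region(region.left, region.top, region.width, region.height)` and used as `searchRegion`.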

This is the search region I narrowed it down to: just-region-cap

Many of the above menu items do work. ('Show', 'Action Palette', etc.)

However, this one, and possibly others, do not. So, my question is this: is there anything else I can try in order to make consistent OCR text matching a reality for my application?
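
One thing that could be tried, sketched here under assumptions rather than as a confirmed fix: retry the same search with a few combinations of `confidence` and `binarize` instead of a single fixed setting. `findWithFallbacks` and `DEFAULT_ATTEMPTS` are hypothetical names, and the finder is passed in as a function so the sketch does not depend on nut.js itself. The only options used are the ones that already appear in the code sample above.

```javascript
// Hypothetical list of settings to try, in order. The values are illustrative.
const DEFAULT_ATTEMPTS = [
  { confidence: 0.5, binarize: false },
  { confidence: 0.5, binarize: true },
  { confidence: 0.4, binarize: false },
  { confidence: 0.4, binarize: true },
];

// Hypothetical helper: call `findFn` with each setting until one resolves.
// `findFn` receives the same options object shape used in the sample above.
async function findWithFallbacks(findFn, searchRegion, attempts = DEFAULT_ATTEMPTS) {
  const errors = [];
  for (const { confidence, binarize } of attempts) {
    try {
      const match = await findFn({
        confidence,
        searchRegion,
        providerData: { preprocessConfig: { binarize } },
      });
      // Report which setting worked so it can be logged or reused.
      return { match, confidence, binarize };
    } catch (err) {
      errors.push(`confidence=${confidence}, binarize=${binarize}: ${err.message}`);
    }
  }
  throw new Error(`No match after ${attempts.length} attempts:\n${errors.join("\n")}`);
}
```

With nut.js, the finder could be something like `(opts) => nut.screen.find(nut.textLine('Presentation Editor'), opts)`, relying on `find` rejecting when nothing matches. Logging the setting that succeeded for each menu item may help show whether the failures follow a pattern.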

joel-duffie, Jan 30 '24 15:01

Hi @joel-duffie 👋

I’ll do another iteration on preprocessing to improve OCR results. Alternatively, have you tried the Azure OCR plugin?

s1hofmann, Feb 01 '24 09:02

@joel-duffie Since the Azure OCR plugin seems to be working well, may I close this issue?

s1hofmann, May 14 '24 15:05

@s1hofmann that's fine with me.

joel-duffie, May 14 '24 18:05