Proposal: Transitioning from Chrome-Exclusive to Universal Browser Compatibility

Open centopw opened this issue 1 year ago • 15 comments

Problem

Currently, the application is prompted to use Google Chrome by default, which limits accessibility and degrades the experience for people who use other browsers. This Chrome-only approach excludes a significant user base and makes the platform harder to adapt to different browser environments.

Proposal

This issue advocates for a transition from Chrome-centric development to a more inclusive approach that supports a broader range of web browsers. The goal is to enhance accessibility, improve user experience, and adhere to web standards that promote compatibility across different platforms.

Proposed Changes

While testing, I realized that on macOS you can open your default browser by typing the following into the search bar:

browser

So instead of searching for Google Chrome, you can search for "browser" and press Enter, which opens the default browser without requiring the user to have Google Chrome. Since most browsers have the search bar in the same location, the default setting for it can still be used.

centopw avatar Dec 02 '23 11:12 centopw

@centopw Thanks for this proposed change. It's interesting to see that you can just open the default browser by searching for "browser" in Mac OS. Do you have any ideas on how the default browser could be opened on Windows and Linux? I've tested just searching for "browser" on my Linux distro and it doesn't find the default.

michaelhhogue avatar Dec 02 '23 14:12 michaelhhogue

Issue Description

When searching for browsers on different Linux distros, the current behavior is as follows:

Ubuntu 22.04.3

  • Returns all available browsers but fails to display the correct default browser.

    Ubuntu Screenshot

Kali Linux 2023.3

  • Similar to Ubuntu, it shows all available browsers but does not identify the default browser correctly.

    Kali Linux Screenshot

Proposed Changes

Two potential solutions have been considered:

  1. Script Improvement (PR #19): Enhance the existing scripts to prompt the user for their default browser choice and update main.py with the selected browser.

  2. Update main.py: Modify main.py to prompt the user to select the default browser every time it runs.

Pros & Cons

Both options offer improved accuracy:

  • The user can specify the location of the search bar for each browser, expanding support for future browsers.

Trade-offs per option:

  1. Option 1:

    • Pros: Users can set their preferred default browser with the updated scripts.
    • Cons: Users must run the additional script (#19) for installation; otherwise, it defaults to Google Chrome.
  2. Option 2:

    • Pros: User flexibility in selecting the default browser each time.
    • Cons: Users are required to input their default browser choice with every run.

centopw avatar Dec 02 '23 15:12 centopw

For this proposal, I have drafted a simple update to main.py, shown below:

    # Ask the user for their default browser
    default_browser = prompt(
        "Please enter your default browser (e.g., Chrome, Firefox): "
    )

    # Adjust the behavior based on the user's default browser
    if default_browser.lower() == "chrome":
        browser_prompt = "Google Chrome"
        browser_address_bar = {"x": "50%", "y": "9%"}
    elif default_browser.lower() == "firefox":
        browser_prompt = "Mozilla Firefox"
        browser_address_bar = {"x": "50%", "y": "10%"}
    else:
        # Default to Chrome behavior if the input is unknown
        browser_prompt = "Google Chrome"
        browser_address_bar = {"x": "50%", "y": "9%"}

    message_dialog(
        title="Self-Operating Computer",
        text=f"Ask a computer to do anything. Default browser set to {browser_prompt}.",
        style=style,
    ).run()

    print("SYSTEM", platform.system())

    # Update the prompts based on the chosen/default browser
    VISION_PROMPT = f"""
    You are a Self-Operating Computer. You use {browser_prompt} as your default browser.

    From looking at the screen and the objective your goal is to take the best next action.

    To operate the computer you have the four options below.

    1. CLICK - Move mouse and click
    2. TYPE - Type on the keyboard
    3. SEARCH - Search for a program on {browser_prompt} and open it
    4. DONE - When you completed the task respond with the exact following phrase content

    Here are the response formats below.

    1. CLICK
    Response: CLICK {{ "x": "percent", "y": "percent", "description": "~description here~", "reason": "~reason here~" }}

    2. TYPE
    Response: TYPE "value you want to type"

    3. SEARCH
    Response: SEARCH "app you want to search for on {browser_prompt}"

    4. DONE
    Response: DONE

    Here are examples of how to respond.
    ...
    """

centopw avatar Dec 02 '23 15:12 centopw

Also, instead of asking the user to type the name out, we could add a menu that lets the user pick from a predefined list of browsers.
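A minimal sketch of what such a menu could look like, using only the standard library. The `BROWSERS` table and the `choose_browser` helper are hypothetical names, and the address-bar positions are simply the ones from the draft above, not measured values.

```python
# Hypothetical table of selectable browsers. The address-bar positions are
# copied from the draft above and are not verified measurements.
BROWSERS = {
    "Google Chrome": {"x": "50%", "y": "9%"},
    "Mozilla Firefox": {"x": "50%", "y": "10%"},
}


def choose_browser(read_line=input, write=print):
    """Show a numbered menu and return (name, address_bar) for the choice.

    Falls back to the first entry (Google Chrome) when the input is invalid.
    """
    names = list(BROWSERS)
    for index, name in enumerate(names, start=1):
        write(f"{index}. {name}")
    answer = read_line("Select your default browser by number: ").strip()
    if answer.isdecimal() and 1 <= int(answer) <= len(names):
        chosen = names[int(answer) - 1]
    else:
        chosen = names[0]
    return chosen, BROWSERS[chosen]
```

Calling `choose_browser()` with no arguments reads from the terminal; passing `read_line` and `write` makes it easy to test without a real prompt.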

centopw avatar Dec 02 '23 15:12 centopw

@centopw Interesting. I think the ideal solution would be to just automatically detect the default browser if possible. On Windows, I'm pretty sure this can just be read from the registry using OpenKey. For Linux, this would probably be found in xdg-settings. I'm not sure about Mac OS. It would probably require some special permissions to access that system setting. If no default browser was found, it could just default to searching for "browser" or something. What do you think about this approach?
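For the Windows side, a rough sketch of the registry lookup might look like the following. The `UserChoice` key path and the ProgId prefixes are assumptions that should be checked on a real machine, and `browser_from_progid` and `get_default_browser_windows` are hypothetical helper names. It falls back to the generic "browser" search term when nothing is found.

```python
import sys

# Illustrative ProgId prefixes (an assumption; actual values vary by install).
PROGID_PREFIXES = {
    "ChromeHTML": "Google Chrome",
    "FirefoxURL": "Mozilla Firefox",
    "MSEdgeHTM": "Microsoft Edge",
}

# Assumed location of the per-user default handler for http links.
USER_CHOICE_KEY = (
    r"Software\Microsoft\Windows\Shell\Associations"
    r"\UrlAssociations\http\UserChoice"
)


def browser_from_progid(prog_id, fallback="browser"):
    """Map a ProgId such as 'ChromeHTML' to a searchable browser name."""
    for prefix, name in PROGID_PREFIXES.items():
        if prog_id.startswith(prefix):
            return name
    return fallback


def get_default_browser_windows(fallback="browser"):
    """Read the default browser from the registry, or return the fallback."""
    if sys.platform != "win32":
        return fallback
    import winreg  # only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, USER_CHOICE_KEY) as key:
            prog_id, _ = winreg.QueryValueEx(key, "ProgId")
    except OSError:
        return fallback
    return browser_from_progid(prog_id, fallback)
```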

michaelhhogue avatar Dec 02 '23 15:12 michaelhhogue

If you want to go with a terminal approach, we could simply open any website from the terminal, for example:

  • Linux: xdg-open http://www.google.com
  • Windows: start http://www.google.com
  • MacOS: open http://www.google.com

Running any of these commands in the terminal automatically opens the URL in the default browser on each system. One more benefit: since it always opens the google.com website, we can define where the search location is and avoid misclicks even more.
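As a sketch, the three commands above could be selected from `platform.system()` like this. `open_url_command` is a hypothetical helper that only builds the argument list and does not run anything. On Windows, `start` is a cmd.exe built-in, so it is wrapped in `cmd /c` here.

```python
import platform


def open_url_command(url, system=None):
    """Return the argv list that opens `url` in the default browser."""
    system = system or platform.system()
    if system == "Darwin":
        return ["open", url]
    if system == "Windows":
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", url]
    # Assume a Linux desktop with xdg-utils installed.
    return ["xdg-open", url]
```

The result could be passed to `subprocess.run(...)`. Python's standard `webbrowser.open(url)` offers similar behavior without building the command by hand.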

Screenshot 2023-12-02 at 5 08 57 PM

centopw avatar Dec 02 '23 16:12 centopw

@centopw That's an interesting approach. However, the project is aiming more towards only giving the model control over the OS via mouse movements, mouse clicks, key-presses, and search operations (from key-presses). Running xdg-open, start, or open from the code itself would violate that vision (restricting the model to only have the same inputs to the OS as a human: mouse and keyboard).

So, having the model open a terminal and run xdg-open using only the cursor and key-presses would be a valid operation (although not very practical). Running xdg-open from the Python code itself wouldn't be valid. Hope that makes sense.

The program should probably follow this order of operations:

Get name of the user's default browser (either manually or automatically) -> Give default browser name to model in prompt -> Model references default browser name to be included in the search action.
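A minimal sketch of that flow, under some assumptions: `resolve_browser_name` and `build_vision_prompt` are hypothetical names, the prompt wording is shortened for illustration, and preferring a manually supplied name over a detected one is just one possible ordering.

```python
def resolve_browser_name(detected=None, user_supplied=None, fallback="browser"):
    """Step 1: pick a name, preferring manual input, then detection."""
    return user_supplied or detected or fallback


def build_vision_prompt(browser_name):
    """Step 2: give the default browser name to the model in the prompt."""
    return (
        "You are a Self-Operating Computer. "
        f"You use {browser_name} as your default browser.\n"
        # Step 3: the model can reference the name in its SEARCH action.
        f'To open it, respond with: SEARCH "{browser_name}"'
    )
```

For example, `build_vision_prompt(resolve_browser_name(detected="Firefox"))` yields a prompt that tells the model to search for Firefox, while calling it with no detected or supplied name falls back to searching for "browser".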

michaelhhogue avatar Dec 02 '23 16:12 michaelhhogue

@centopw I am going to try out your install script in #19 and see how it works.

michaelhhogue avatar Dec 02 '23 16:12 michaelhhogue

@michaelhhogue Then how about this? I don't really work with Windows that much, so this draft only works on Mac (using webbrowser) and Linux (using xdg-settings):

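import subprocess
import webbrowser

def get_default_browser_macos():
    # May return a generic name such as "default" rather than the real browser name.
    return webbrowser.get().name

def get_default_browser_linux():
    # xdg-settings prints the default browser's .desktop entry, e.g. "firefox.desktop".
    result = subprocess.run(["xdg-settings", "get", "default-web-browser"], stdout=subprocess.PIPE, text=True)
    return result.stdout.strip()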
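Since xdg-settings reports a .desktop entry rather than a display name, the output probably needs to be turned into something the model can search for. A possible sketch follows; the mapping table is illustrative only and `search_term_from_desktop_entry` is a hypothetical helper.

```python
# Illustrative mapping from .desktop entries to searchable names
# (an assumption; extend it for the browsers you care about).
DESKTOP_ENTRY_NAMES = {
    "google-chrome.desktop": "Google Chrome",
    "firefox.desktop": "Firefox",
    "chromium-browser.desktop": "Chromium",
}


def search_term_from_desktop_entry(entry, fallback="browser"):
    """Turn xdg-settings output into a name to use in the SEARCH action."""
    entry = entry.strip()
    if not entry:
        # xdg-settings printed nothing, so use the generic search term.
        return fallback
    if entry in DESKTOP_ENTRY_NAMES:
        return DESKTOP_ENTRY_NAMES[entry]
    # Unknown entry: derive a readable guess from the file name.
    suffix = ".desktop"
    if entry.endswith(suffix):
        entry = entry[: -len(suffix)]
    return entry.replace("-", " ").replace("_", " ")
```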


centopw avatar Dec 02 '23 16:12 centopw

@centopw I'll test this out as well and get back with you.

michaelhhogue avatar Dec 02 '23 16:12 michaelhhogue

What if the browser is already open?

Kreijstal avatar Dec 03 '23 22:12 Kreijstal

@Kreijstal For now, I don't think the browser already being open affects anything. But that is an interesting idea. I will play around with it and let you know.

centopw avatar Dec 04 '23 13:12 centopw

@centopw Just noting here that I haven't yet tested any default browser checking. Want to first see what happens with #19.

michaelhhogue avatar Dec 04 '23 13:12 michaelhhogue

I originally hacked in Google Chrome as the default, but I agree we've outgrown this. Chrome is 70% of the market if I understand correctly, though. Would it make sense to "check for chrome" and, if it isn't found, search for "browser" as shown above?

- Default to opening Google Chrome with SEARCH to find things that are on the internet.
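A rough sketch of the "check for chrome, else search for browser" idea. The install path and executable names below are assumptions that differ between systems, and `chrome_is_installed` and `browser_search_term` are hypothetical helpers.

```python
import os
import shutil

# Assumed locations and executable names; adjust per platform.
CHROME_APP_PATHS = ["/Applications/Google Chrome.app"]
CHROME_EXECUTABLES = ["google-chrome", "google-chrome-stable", "chrome"]


def chrome_is_installed(path_exists=os.path.exists, which=shutil.which):
    """Return True if Chrome is found at a known path or on PATH."""
    if any(path_exists(path) for path in CHROME_APP_PATHS):
        return True
    return any(which(name) is not None for name in CHROME_EXECUTABLES)


def browser_search_term(path_exists=os.path.exists, which=shutil.which):
    """Use 'Google Chrome' when it is installed, otherwise 'browser'."""
    if chrome_is_installed(path_exists, which):
        return "Google Chrome"
    return "browser"
```

The returned string could then be substituted into the prompt wherever "Google Chrome" is currently hard-coded.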

joshbickett avatar Dec 08 '23 19:12 joshbickett

I lean away from asking the user additional questions if possible, but I'm curious what the community thinks.

joshbickett avatar Dec 08 '23 19:12 joshbickett