open interpreter crash when using computer.display.view
Describe the bug
Every time the interpreter has to pull computer.display.view it crashes
Reproduce
Last login: Mon Mar 25 14:33:59 on ttys000 joniy@Air-de-Joao ~ % interpreter --os
▌ OS Control enabled
Open my music and tell me what do we have to play there
First, I'll need to open the Music application on your computer. Let's use a
spotlight search to open the Music app and then check what is available to
play.
Let's start by pressing the "command" and spacebar keys to open Spotlight,
and then I'll type "Music" to find and open the Music application. After
that, I'll wait about 2 seconds and usecomputer.display.view to see what's
on the screen.
import time
Open Spotlight and type Music to open the app
computer.keyboard.hotkey(" ", "command")
time.sleep(0.5)
computer.keyboard.write("Music")
time.sleep(0.5)
computer.keyboard.press("enter")
time.sleep(2) # wait for the Music app to open
Capture the screen to see what's happening
computer.display.view()
[IPKernelApp] WARNING | Parent appears to have exited, shutting down. [IPKernelApp] WARNING | Parent appears to have exited, shutting down.
Python Version: 3.11.0
Pip Version: 24.0
Open-interpreter Version: cmd: Open Interpreter 0.2.4 New Computer
Update , pkg: 0.2.4 OS Version and Architecture: macOS-14.4-arm64-arm-64bit CPU Info: arm RAM Info: 8.00 GB, used: 3.47, free: 0.41
# Interpreter Info
Vision: True
Model: gpt-4-vision-preview
Function calling: False
Context window: 110000
Max tokens: 4096
Auto run: True
API base: None
Offline: False
Curl output: Not local
# Messages
System Message: You are Open Interpreter, a world-class programmer that
can complete any goal by executing code.
When you write code, it will be executed on the user's machine. The user has given you full and complete permission to execute any code necessary to complete the task.
When a user refers to a filename, they're likely referring to an existing file in the directory you're currently executing code in.
In general, try to make plans with as few steps as possible. As for actually executing code to carry out that plan, don't try to do everything in one code block. You should try something, print information about it, then continue from there in tiny, informed steps. You will never get it on the first try, and attempting it in one go will often lead to errors you cant see.
Manually summarize text.
Do not try to write code that attempts the entire task at once, and verify at each step whether or not you're on track.
Computer
You may use the computer Python module to complete tasks:
computer.browser.search(query) # Silently searches Google for the query, returns
result. The user's browser is unaffected. (does not open a browser!)
computer.display.view() # Shows you what's on the screen, returns a `pil_image`
`in case you need it (rarely). **You almost always want to do this first!**
computer.keyboard.hotkey(" ", "command") # Opens spotlight (very useful)
computer.keyboard.write("hello")
# Use this to click text:
computer.mouse.click("text onscreen") # This clicks on the UI element with that
text. Use this **frequently** and get creative! To click a video, you could pass
the *timestamp* (which is usually written on the thumbnail) into this.
# Use this to click an icon, button, or other symbol:
computer.mouse.click(icon="gear icon") # Moves mouse to the icon with that
description. Use this very often.
computer.mouse.move("open recent >") # This moves the mouse over the UI element
with that text. Many dropdowns will disappear if you click them. You have to
hover over items to reveal more.
computer.mouse.click(x=500, y=500) # Use this very, very rarely. It's highly
inaccurate
computer.mouse.scroll(-10) # Scrolls down. If you don't find some text on screen
that you expected to be there, you probably want to do this
x, y = computer.display.center() # Get your bearings
computer.clipboard.view() # Returns contents of clipboard
computer.os.get_selected_text() # Use frequently. If editing text, the user
often wants this
{{
import platform
if platform.system() == 'Darwin':
print('''
computer.browser.search(query) # Google search results will be returned from
this function as a string
computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file
computer.calendar.create_event(title="Meeting",
start_date=datetime.datetime.now(), end=datetime.datetime.now() +
datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar
event
computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) #
Get events between dates. If end_date is None, only gets events for start_date
computer.calendar.delete_event(event_title="Meeting",
start_date=datetime.datetime) # Delete a specific event with a matching title
and start date, you may need to get use get_events() to find the specific event
object first
computer.contacts.get_phone_number("John Doe")
computer.contacts.get_email_address("John Doe")
computer.mail.send("[email protected]", "Meeting Reminder", "Reminder that our
meeting is at 3pm today.", ["path/to/attachment.pdf",
"path/to/attachment2.pdf"]) # Send an email with a optional attachments
computer.mail.get(4, unread=True) # Returns the {number} of unread emails, or
all emails if False is passed
computer.mail.unread_count() # Returns the number of unread emails
computer.sms.send("555-123-4567", "Hello from the computer!") # Send a text
message. MUST be a phone number, so use computer.contacts.get_phone_number
frequently here
''')
}}
For rare and complex mouse actions, consider using computer vision libraries on
the computer.display.view() pil_image to produce a list of coordinates for
the mouse to move/drag to.
If the user highlighted text in an editor, then asked you to modify it, they
probably want you to keyboard.write over their version of the text.
Tasks are 100% computer-based. DO NOT simply write long messages to the user to complete tasks. You MUST put your text back into the program they're using to deliver your text!
Clicking text is the most reliable way to use the mouse— for example, clicking a URL's text you see in the URL bar, or some textarea's placeholder text (like "Search" to get into a search bar).
Applescript might be best for some tasks.
If you use plt.show(), the resulting image will be sent to you. However, if
you use PIL.Image.show(), the resulting image will NOT be sent to you.
It is very important to make sure you are focused on the right application and window. Often, your first command should always be to explicitly switch to the correct application.
When searching the web, use query parameters. For example, https://www.amazon.com/s?k=monitor
Try multiple methods before saying the task is impossible. You can do it!
Critical Routine Procedure for Multi-Step Tasks
Include computer.display.view() after a 2 second delay at the end of every
code block to verify your progress, then answer these questions in extreme
detail:
- Generally, what is happening on-screen?
- What is the active app?
- What hotkeys does this app support that might get be closer to my goal?
- What text areas are active, if any?
- What text is selected?
- What options could you take next to get closer to your goal?
{{
Add window information
try:
import pywinctl
active_window = pywinctl.getActiveWindow()
if active_window:
app_info = ""
if "_appName" in active_window.__dict__:
app_info += (
"Active Application: " + active_window.__dict__["_appName"]
)
if hasattr(active_window, "title"):
app_info += "\n" + "Active Window Title: " + active_window.title
elif "_winTitle" in active_window.__dict__:
app_info += (
"\n"
+ "Active Window Title:"
+ active_window.__dict__["_winTitle"]
)
if app_info != "":
print(
"\n\n# Important Information:\n"
+ app_info
+ "\n(If you need to be in another active application to help
the user, you need to switch to it.)" )
except: # Non blocking pass
}}
{'role': 'user', 'type': 'message', 'content': 'Open my music and tell
me what do we have to play there'}
{'role': 'assistant', 'type': 'message', 'content': 'First, I'll need to open the Music application on your computer. Let's use a spotlight search to open the Music app and then check what is available to play.\n\nLet's start by pressing the "command" and spacebar keys to open Spotlight, and then I'll type "Music" to find and open the Music application. After that, I'll wait about 2 seconds and usecomputer.display.view to see what's on the screen.\n\n'}
{'role': 'assistant', 'type': 'code', 'format': 'python', 'content': '\nimport time\n\n# Open Spotlight and type Music to open the app\ncomputer.keyboard.hotkey(" ", "command")\ntime.sleep(0.5)\ncomputer.keyboard.write("Music")\ntime.sleep(0.5)\n computer.keyboard.press("enter")\ntime.sleep(2) # wait for the Music app to open\n\n# Capture the screen to see what's happening\ncomputer.display.view()\n'}
{'role': 'computer', 'type': 'console', 'format': 'output', 'content': ''}
{'role': 'computer', 'type': 'image', 'format': 'base64.png', 'content': 'iVBORw0KGgoAAAANSUhEUgAADSAAAAg0CAIAAACcJK5OAAAMQGlDQ1BJQ0MgUHJvZmlsZQAAeJyVVwd YU8kWnluSkEBoAQSkhN4EESkBpITQAkjvNkISIJQYA0HFjiwquBZURMCGrooodpodsbMo9r5YUFDWxYJ deZMCuu4r35vvmzv//efMf86cO/fOHQDUTnBEomxUHYAcYZ44OsiPnpiUTCf1AAyQgTrwBKM43FwRMzI yDMAy1P69vLsBEGl71V6q9c/+/1o0ePxcLgBIJMSpvFxuDsQHAcCruSJxHgBE...qDlhkqFTr9U1/OVt n9bwhXECYP12fWQuIcvCYJjMw4oSLWu03MjsdO9UBDP8nV+4YddWfFm+LdPqMAejWoxAOaFQdLFr3q5S YtNtqHytuKP3Ghubrg6++S1AeyOPP+bONX41tJ7uDet4lZpYUtNGsxzX3nh6/soOp3GMgeUJLJfsEbgV w/bK88z/kies3sZzF55FimCQqrBYnhKcsRndNNbcws+fPLIRr7vX1WcRmq8am6uZ41WCTb9XzkgNqaYo 2RjbW5WnPqq++v7/D6EvUR5q3fVGAAAAAElFTkSuQmCC'}
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/bin/interpreter",
line 8, in
Expected behavior
At least it has to see what is on my screen.
Screenshots
No response
Open Interpreter version
0.2.4
Python version
3.11.8
Operating System name and version
mac os 14.4
Additional context
No response
Hey @Jpkovas Sorry to hear that you're having issues. We're working on a fix
@Jpkovas Are you sure you're on 0.2.4?
Can you please run %info after you launch interpreter to verify?
This should have been resolved in https://github.com/OpenInterpreter/open-interpreter/pull/1117/files
Thanks :)
I'm also seeing this problem on 0.2.4:
Also seeing on 0.2.4.
Trace:
File
"/Users/darin/interpreter/.venv/lib/python3.11/site-packages/interpreter/core/llm/u
tils/convert_to_openai_messages.py", line 173, in convert_to_openai_messages
new_message["content"] = new_message["content"].strip()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'strip'
~/interpreter 34s
.venv ❯ [IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
~/interpreter 34s
.venv ❯ interpreter --os
▌ OS Control enabled
> %info
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
Python Version: 3.11.7
Pip Version: 23.2.1
Open-interpreter Version: cmd: Open Interpreter 0.2.4 New Computer Update
, pkg: 0.2.4
OS Version and Architecture: macOS-14.2.1-arm64-arm-64bit
CPU Info: arm
RAM Info: 16.00 GB, used: 6.88, free: 0.36
# Interpreter Info
Vision: True
Model: gpt-4-vision-preview
Function calling: False
Context window: 110000
Max tokens: 4096
Auto run: True
API base: None
Offline: False
Curl output: Not local
```
I'm also seeing the same issue on 0.2.4.
Issue occurs when computer.display.view() is called to capture a screenshot, resulting in interpreter crashing with: AttributeError: 'list' object has no attribute 'strip'`
Environment Details:
- Python Version: 3.11.1
- Pip Version: 24.0
- OS Version and Architecture: macOS-14.5-arm64-arm-64bit
- CPU Info: arm
- RAM Info: 16.00 GB, used: 7.40, free: 0.40
Error Details
- Function Causing Error:
convert_to_openai_messages - Error Description: Attempting to call
strip()on a list item withinnew_message["content"]
Interpreter Settings
- Version: 0.2.4
- Vision: True
- Model: gpt-4-vision-preview
- Function calling: False
- Context window: 110000 tokens
- Max tokens: 4096
- Auto run: True
- API base: None
- Offline: False
- Curl output: Not local
Trace:
File "/Users/nathandryer/.pyenv/versions/3.11.1/envs/open-interpreter/lib/python3.11/site-packages/interpreter/core/llm/utils/convert_to_openai_messages.py", line 173, in convert_to_openai_messages
new_message["content"] = new_message["content"].strip()
AttributeError: 'list' object has no attribute 'strip'
Let me know if you need anymore info.
I was also facing the same issue in a Debian with i3wm. Same version: 0.2.4.
It seems like the changes @MikeBirdTech commented are not in that version yet. However, you can simply apply them yourself and see it fixed!
Thanks, @MikeBirdTech! :bow:
How does one apply these changes?