
Character encoding broken

Open VladimirFokow opened this issue 1 year ago • 6 comments

When copying non-English characters (specifically, Ukrainian) from/to the clipboard, the encoding is broken.

To reproduce:

Install pyperclip:

pip install pyperclip

Write a simple python script, e.g.:

import pyperclip

text = pyperclip.paste()  # text is whatever is in the clipboard
pyperclip.copy(text)  # save text to clipboard

Copy some Ukrainian characters to your clipboard, e.g.: тест

When the script is run from the terminal, e.g. python script.py, it works fine: it saves to the clipboard the same thing that was there.

When it's run as a Raycast "Script Command", it saves this to the clipboard: ????

If I change the script so that it doesn't get text from the clipboard but instead saves "тест" to clipboard directly: pyperclip.copy('тест'), then I get this in my clipboard: —Ç–µ—Å—Ç
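Both symptoms are consistent with the text passing through a single-byte legacy encoding somewhere between the script and the clipboard. As a minimal sketch, assuming that encoding is Mac OS Roman (which nothing in this thread confirms), plain Python reproduces both results:

```python
# Sketch: reproduce the two broken outputs reported above.
# Assumption: the legacy encoding involved is Mac OS Roman ("mac_roman").

original = "тест"

# UTF-8 bytes misread as Mac OS Roman give the mojibake from the report.
mojibake = original.encode("utf-8").decode("mac_roman")

# Characters with no Mac OS Roman equivalent are replaced with "?".
question_marks = original.encode("mac_roman", errors="replace").decode("mac_roman")

# The mojibake keeps every byte, so it can be reversed; the "?" form cannot.
recovered = mojibake.encode("mac_roman").decode("utf-8")

print(mojibake)
print(question_marks)
print(recovered)
```

If this reading is right, the mojibake case is a decoding mismatch on the way into the clipboard, while the question-mark case is a lossy conversion on the way out of it.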

VladimirFokow avatar Aug 02 '24 10:08 VladimirFokow

Hey there @VladimirFokow,

Sorry to hear you are having issues. I am sadly not very knowledgeable with Python. We have had similar problems before, but to be honest I don't know what the issue could be. Here is what I said to a previous user.

dehesa avatar Aug 05 '24 15:08 dehesa

Hi, thanks for the reply. I just tested it with bash, and it has the same problem.

I also printed the environment variables LANG and PATH:

#!/bin/bash

# Required parameters:
# @raycast.schemaVersion 1
# @raycast.title test_cp
# @raycast.mode fullOutput

# Optional parameters:
# @raycast.description Test the clipboard
# @raycast.packageName test_cp
# @raycast.icon 🧪


# Save the content of the clipboard into a variable `text`
text=$(pbpaste)
# Save the content of the variable `text` back into the clipboard
echo "$text" | pbcopy

# Example non-English characters to copy: тест
# result in the clipboard: ????

echo "$LANG"  # en_DE.UTF-8
echo "$PATH"  # /usr/local/bin:/opt/homebrew/bin:/usr/bin:/bin:/usr/sbin:/sbin

# # Alternative test:
# text='тест'
# echo "$text" | pbcopy
# # result in the clipboard: —Ç–µ—Å—Ç

Quote: "we run the scripts as a subprocess"

Could you please point me to the code location where this subprocess is created, so I can try to isolate the issue?

(Unfortunately, I haven't used Swift or Ruby before.)

VladimirFokow avatar Aug 05 '24 16:08 VladimirFokow

@VladimirFokow to me, this sounds very much like a UTF-8/Unicode problem. Using the example you provided:

import pyperclip

text = pyperclip.paste()  # text is whatever is in the clipboard
pyperclip.copy(text)  # save text to clipboard

Give this piece of code a try and let us know if it works for you:

import pyperclip
import os
import chardet

# Ensure the environment uses UTF-8 encoding
os.environ['PYTHONIOENCODING'] = 'utf-8'

# Function to detect and convert encoding
def convert_to_utf8(text):
    result = chardet.detect(text.encode())
    encoding = result['encoding']
    return text.encode(encoding).decode('utf-8')

# Get text from clipboard
text = pyperclip.paste()

# Convert text to UTF-8
text_utf8 = convert_to_utf8(text)

# Copy text back to clipboard
pyperclip.copy(text_utf8)

print("Text successfully copied to clipboard.")

unnamedd avatar Aug 06 '24 08:08 unnamedd

Hi @unnamedd, thanks for the idea, but it didn't help.

For experimenting, here is an example "Script Command" which can invoke a python script:

#!/bin/bash

# Required parameters:
# @raycast.schemaVersion 1
# @raycast.title test_cp_py
# @raycast.mode fullOutput

# Optional parameters:
# @raycast.description Test the clipboard (Python)
# @raycast.packageName test_cp_py
# @raycast.icon 🐍

# /path/to/python can be seen by calling: `which python3`
/path/to/python /path/to/script.py

The Python script, with some prints added:

import pyperclip
import os
import chardet

# Ensure the environment uses UTF-8 encoding
print("PYTHONIOENCODING: ", os.environ.get('PYTHONIOENCODING'))
os.environ['PYTHONIOENCODING'] = 'utf-8'
print("PYTHONIOENCODING: ", os.environ.get('PYTHONIOENCODING'))


# Function to detect and convert encoding
def convert_to_utf8(text):
    result = chardet.detect(text.encode())
    print('result, detected with chardet:', result)
    encoding = result['encoding']
    print('encoding:', encoding)
    return text.encode(encoding).decode('utf-8')



text = pyperclip.paste()
print(text)
text_utf8 = convert_to_utf8(text)
print(text_utf8)
pyperclip.copy(text_utf8)





print('\nencoding of "тест":', chardet.detect('тест'.encode()))
print('  just question marks:')
print('encoding of "????":', chardet.detect('тест'.encode()))
print('  symbols that the script produced to the clipboard:')
print('encoding of "????":', chardet.detect('????'.encode()))

Output:

PYTHONIOENCODING:  None
PYTHONIOENCODING:  utf-8
????
result, detected with chardet: {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
encoding: ascii
????

encoding of "тест": {'encoding': 'utf-8', 'confidence': 0.938125, 'language': ''}
  just question marks:
encoding of "????": {'encoding': 'utf-8', 'confidence': 0.938125, 'language': ''}
  symbols that the script produced to the clipboard:
encoding of "????": {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}

Done in 0.17s
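The `ascii` result above suggests that the text had already been reduced to literal question marks before Python received it. If so, no re-encoding inside the script can restore it. A small standard-library sketch of that reasoning (the helper name `utf8_round_trip` is illustrative and not from the thread):

```python
def utf8_round_trip(text: str) -> str:
    """Encode a str to UTF-8 and decode it again; this always returns the input."""
    return text.encode("utf-8").decode("utf-8")


# A Python str is already decoded, so the round trip changes nothing.
assert utf8_round_trip("тест") == "тест"

# Once the characters have been replaced by "?", only ASCII bytes remain,
# and nothing in them records which characters were lost.
damaged = "????"
assert damaged.encode("utf-8") == b"????"
assert utf8_round_trip(damaged) == damaged
```

Separately, Python reads `PYTHONIOENCODING` at interpreter startup and applies it to stdin, stdout, and stderr, so assigning it through `os.environ` inside a running script is unlikely to change how clipboard text is read.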

It looks like the encoding of the clipboard contents doesn't survive the transition to the Raycast process (the one that is spawned to execute the script). How is this process created?

It would help to isolate the issue by building a minimal reproducible example of how this process is created, to see exactly where the encoding problem occurs.
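One way to approach this without access to the Raycast source is to spawn a child process with a controlled environment and compare what it receives. The sketch below assumes that the `LANG` variable is the relevant difference between the terminal and Raycast, which this thread has not established; `run_with_lang` is a hypothetical helper.

```python
import os
import subprocess
import sys


def run_with_lang(command, lang):
    """Run `command` with LANG set to `lang` (None removes it); return stdout bytes."""
    env = dict(os.environ)
    # Drop locale overrides so that LANG alone decides the child's locale.
    for name in ("LC_ALL", "LC_CTYPE"):
        env.pop(name, None)
    if lang is None:
        env.pop("LANG", None)
    else:
        env["LANG"] = lang
    result = subprocess.run(command, env=env, capture_output=True, check=True)
    return result.stdout


# Portable self-check: the child reports the LANG value it actually received.
child = [sys.executable, "-c", "import os; print(os.environ.get('LANG'))"]
print(run_with_lang(child, "en_DE.UTF-8"))

# On macOS, comparing clipboard reads under different LANG values might show
# where the text degrades (left commented out because pbpaste is macOS-only):
# print(run_with_lang(["pbpaste"], "en_DE.UTF-8"))
# print(run_with_lang(["pbpaste"], "en_US.UTF-8"))
```

Returning raw bytes rather than decoded text is deliberate: it lets you inspect exactly what the child wrote before any decoding step in the parent can hide the difference.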

VladimirFokow avatar Aug 06 '24 10:08 VladimirFokow

Hi, could someone please point me to the code where the subprocess is created? Thanks!

quote:

we run the scripts as a subprocess

VladimirFokow avatar Aug 17 '24 02:08 VladimirFokow

Hey @VladimirFokow, very sorry it took so long to get back to you. The code spawning the process is part of the closed-source Swift app. We don't do anything special there. Here is an extract of how we set it up (in Swift):

let process = Process()
process.qualityOfService = qos.asQualityOfService
process.executableURL = URL(fileURLWithPath: command.scriptPath)

var environment = ProcessInfo.processInfo.environment
let defaultPath = "/usr/local/bin:/opt/homebrew/bin"
if let path = environment["PATH"] {
  environment["PATH"] = defaultPath + ":\(path)"
} else {
  environment["PATH"] = defaultPath
}
environment["LANG"] = "\(Locale.preferredIdentifier ?? Locale.autoupdatingCurrent.identifier).UTF-8"
let proxySettings = UserDefaults.standard.useSystemInternetProxySettings ? InternetProxySettings.fromSystem() : InternetProxySettings.fromEnvironment()
environment.merge(proxySettings.toEnvVars()) { $1 }
process.environment = environment
if !arguments.isEmpty {
  process.arguments = arguments
}

if let currentDirectoryPath = command.currentDirectoryPath {
  if currentDirectoryPath.hasPrefix("/") || currentDirectoryPath.hasPrefix("~") {
    process.currentDirectoryPath = currentDirectoryPath
  } else {
    let scriptDirURL = URL(fileURLWithPath: command.scriptPath).deletingLastPathComponent()
    process.currentDirectoryPath = scriptDirURL.appendingPathComponent(currentDirectoryPath).standardizedFileURL.path
  }
} else {
  process.currentDirectoryPath = URL(fileURLWithPath: command.scriptPath).deletingLastPathComponent().path
}
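The `LANG` line in the extract above joins a locale identifier with `.UTF-8`. A combination such as `en_DE.UTF-8` (the value printed earlier in this thread) is not guaranteed to exist as an installed locale, and tools commonly fall back to the `C` locale when the requested one is missing. Whether that is what happens here is an assumption, not something the thread confirms. A minimal Python sketch for checking a candidate value (`locale_is_installed` is a hypothetical helper):

```python
import locale


def locale_is_installed(name: str) -> bool:
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Restore the previous setting so the check has no side effects.
        locale.setlocale(locale.LC_CTYPE, saved)


for candidate in ("en_DE.UTF-8", "en_US.UTF-8", "C"):
    print(candidate, locale_is_installed(candidate))
```

If the value Raycast sets turns out not to be installed, exporting a known-good one (for example `LC_ALL=en_US.UTF-8`, assuming that locale exists on the machine) at the top of the script command may be worth trying as a workaround.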

dehesa avatar Sep 13 '24 13:09 dehesa

I will be closing this for now, but feel free to reopen it.

dehesa avatar Oct 16 '24 08:10 dehesa