
UnicodeDecodeError: 'gbk' codec can't decode byte 0x9e in position 8497: illegal multibyte sequence

Open qq2100803 opened this issue 6 months ago • 3 comments

🐛 Bug Description

To Reproduce

Steps to reproduce the behavior:

    (rdagent) C:\Users\a>rdagent fin_factor
    2025-06-04 10:49:07.992 | INFO | rdagent.oai.backend.litellm:<module>:27 - backend='rdagent.oai.backend.LiteLLMAPIBackend' chat_model='gpt-4o' embedding_model='text-embedding-3-small' log_llm_chat_content=True use_azure=False chat_use_azure=False embedding_use_azure=False chat_use_azure_token_provider=False embedding_use_azure_token_provider=False managed_identity_client_id=None max_retry=10 retry_wait_seconds=1 dump_chat_cache=False use_chat_cache=False dump_embedding_cache=False use_embedding_cache=False prompt_cache_path='C:\Users\a\prompt_cache.db' max_past_message_include=10 use_auto_chat_cache_seed_gen=False init_chat_cache_seed=42 openai_api_key='<redacted>' chat_openai_api_key=None chat_openai_base_url=None chat_azure_api_base='' chat_azure_api_version='' chat_max_tokens=None chat_temperature=0.5 chat_stream=True chat_seed=None chat_frequency_penalty=0.0 chat_presence_penalty=0.0 chat_token_limit=100000 default_system_prompt="You are an AI assistant who helps to answer user's questions." system_prompt_role='system' embedding_openai_api_key='' embedding_openai_base_url='' embedding_azure_api_base='' embedding_azure_api_version='' embedding_max_str_num=50 use_llama2=False llama2_ckpt_dir='Llama-2-7b-chat' llama2_tokenizer_path='Llama-2-7b-chat/tokenizer.model' llams2_max_batch_size=8 use_gcr_endpoint=False gcr_endpoint_type='llama2_70b' llama2_70b_endpoint='' llama2_70b_endpoint_key='' llama2_70b_endpoint_deployment='' llama3_70b_endpoint='' llama3_70b_endpoint_key='' llama3_70b_endpoint_deployment='' phi2_endpoint='' phi2_endpoint_key='' phi2_endpoint_deployment='' phi3_4k_endpoint='' phi3_4k_endpoint_key='' phi3_4k_endpoint_deployment='' phi3_128k_endpoint='' phi3_128k_endpoint_key='' phi3_128k_endpoint_deployment='' gcr_endpoint_temperature=0.7 gcr_endpoint_top_p=0.9 gcr_endpoint_do_sample=False gcr_endpoint_max_token=100 chat_use_azure_deepseek=False chat_azure_deepseek_endpoint='' chat_azure_deepseek_key='' chat_model_map='{}'
    2025-06-04 10:49:11.028 | INFO | rdagent.utils.env:prepare:504 - Building the image from dockerfile: C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\scenarios\qlib\docker
    ⠋ Successfully tagged local_qlib:latest

    2025-06-04 10:49:11.117 | INFO | rdagent.utils.env:prepare:522 - Finished building the image from dockerfile: C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\scenarios\qlib\docker
    2025-06-04 10:49:11.129 | INFO | rdagent.utils.env:prepare:700 - We are downloading!
    Traceback (most recent call last):
      File "C:\Users\a\miniconda3\envs\rdagent\lib\runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "C:\Users\a\miniconda3\envs\rdagent\lib\runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "C:\Users\a\miniconda3\envs\rdagent\Scripts\rdagent.exe\__main__.py", line 7, in <module>
        sys.exit(app())
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\app\cli.py", line 48, in app
        fire.Fire(
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\fire\core.py", line 135, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\fire\core.py", line 468, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\fire\core.py", line 684, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\app\qlib_rd_loop\factor.py", line 40, in main
        model_loop = FactorRDLoop(FACTOR_PROP_SETTING)
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\components\workflow\rd_loop.py", line 28, in __init__
        scen: Scenario = import_class(PROP_SETTING.scen)()
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\scenarios\qlib\experiment\factor_experiment.py", line 28, in __init__
        self._source_data = deepcopy(get_data_folder_intro())
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\scenarios\qlib\experiment\utils.py", line 150, in get_data_folder_intro
        generate_data_folder_from_qlib()
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\scenarios\qlib\experiment\utils.py", line 18, in generate_data_folder_from_qlib
        qtde.prepare()
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\utils\env.py", line 702, in prepare
        self.run(entry=cmd)
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\utils\env.py", line 116, in run
        stdout, _ = self.run_ret_code(entry=entry, local_path=local_path, env=env, **kwargs)
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\utils\env.py", line 189, in run_ret_code
        stdout, return_code = self.cached_run(entry_add_timeout, local_path, env, running_extra_volume)
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\utils\env.py", line 226, in cached_run
        [
      File "C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\utils\env.py", line 227, in <listcomp>
        [str(path.relative_to(Path(local_path))), path.read_text()]
      File "C:\Users\a\miniconda3\envs\rdagent\lib\pathlib.py", line 1135, in read_text
        return f.read()
    UnicodeDecodeError: 'gbk' codec can't decode byte 0x9e in position 8497: illegal multibyte sequence
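The last frames of the traceback show `path.read_text()` being called without an `encoding` argument, so the file is decoded with the system default (GBK here). The sketch below reproduces that kind of failure with the standard library only and shows that the same file reads cleanly once UTF-8 is requested explicitly. `read_texts` and `note.py` are hypothetical names chosen for illustration, and the list comprehension only imitates the shape of the one in the traceback; it is not RD-Agent's actual code.

```python
import tempfile
from pathlib import Path


def read_texts(root: Path, encoding: str = "utf-8") -> list[list[str]]:
    """Hypothetical helper: collect [relative_path, text] for each file under root."""
    return [
        [str(path.relative_to(root)), path.read_text(encoding=encoding)]
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A UTF-8 encoded file whose bytes do not form a valid GBK sequence.
    (root / "note.py").write_bytes("# 中\n".encode("utf-8"))

    try:
        # Decoding as GBK stands in for what a GBK system default does implicitly.
        read_texts(root, encoding="gbk")
        gbk_failed = False
    except UnicodeDecodeError:
        gbk_failed = True

    # Requesting UTF-8 explicitly reads the same file without error.
    utf8_result = read_texts(root)

print(gbk_failed, utf8_result)
```

Passing `encoding="utf-8"` at the call site removes the dependence on the machine's locale, which is why the same install can work on one system and fail on another.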


qq2100803 · Jun 05 '25 13:06

    2025-06-05 17:05:37.148 | INFO | rdagent.oai.backend.litellm:<module>:27 - backend='rdagent.oai.backend.LiteLLMAPIBackend' chat_model='gpt-4o' embedding_model='text-embedding-3-small' log_llm_chat_content=True use_azure=False chat_use_azure=False embedding_use_azure=False chat_use_azure_token_provider=False embedding_use_azure_token_provider=False managed_identity_client_id=None max_retry=10 retry_wait_seconds=1 dump_chat_cache=False use_chat_cache=False dump_embedding_cache=False use_embedding_cache=False prompt_cache_path='D:\qlib\prompt_cache.db' max_past_message_include=10 use_auto_chat_cache_seed_gen=False init_chat_cache_seed=42 openai_api_key='<redacted>' chat_openai_api_key=None chat_openai_base_url=None chat_azure_api_base='' chat_azure_api_version='' chat_max_tokens=None chat_temperature=0.5 chat_stream=True chat_seed=None chat_frequency_penalty=0.0 chat_presence_penalty=0.0 chat_token_limit=100000 default_system_prompt="You are an AI assistant who helps to answer user's questions." system_prompt_role='system' embedding_openai_api_key='' embedding_openai_base_url='' embedding_azure_api_base='' embedding_azure_api_version='' embedding_max_str_num=50 use_llama2=False llama2_ckpt_dir='Llama-2-7b-chat' llama2_tokenizer_path='Llama-2-7b-chat/tokenizer.model' llams2_max_batch_size=8 use_gcr_endpoint=False gcr_endpoint_type='llama2_70b' llama2_70b_endpoint='' llama2_70b_endpoint_key='' llama2_70b_endpoint_deployment='' llama3_70b_endpoint='' llama3_70b_endpoint_key='' llama3_70b_endpoint_deployment='' phi2_endpoint='' phi2_endpoint_key='' phi2_endpoint_deployment='' phi3_4k_endpoint='' phi3_4k_endpoint_key='' phi3_4k_endpoint_deployment='' phi3_128k_endpoint='' phi3_128k_endpoint_key='' phi3_128k_endpoint_deployment='' gcr_endpoint_temperature=0.7 gcr_endpoint_top_p=0.9 gcr_endpoint_do_sample=False gcr_endpoint_max_token=100 chat_use_azure_deepseek=False chat_azure_deepseek_endpoint='' chat_azure_deepseek_key='' chat_model_map='{}'
    2025-06-05 17:05:40.274 | INFO | rdagent.utils.env:prepare:504 - Building the image from dockerfile: C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\scenarios\qlib\docker
    ⠋ Successfully tagged local_qlib:latest

    2025-06-05 17:05:40.352 | INFO | rdagent.utils.env:prepare:522 - Finished building the image from dockerfile: C:\Users\a\miniconda3\envs\rdagent\lib\site-packages\rdagent\scenarios\qlib\docker
    2025-06-05 17:05:40.363 | INFO | rdagent.utils.env:prepare:704 - Data already exists. Download skipped.
    Error: NpipeHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
    Error: NpipeHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
    Error: NpipeHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
    Error: NpipeHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)

Is this error happening because OpenAI's servers cannot be reached from within China?

qq2100803 · Jun 05 '25 16:06

same problem

JaggerH · Jun 14 '25 09:06

🛠️ Step-by-Step Fix Summary

File: rdagent/utils/agent/tpl.py
Function: load_content, around line 55

After:

    for file_path in file_path_l:
        try:
            if ftype == "yaml":
                with file_path.open("r", encoding="utf-8") as file:
                    # Load the YAML file
                    yaml_content = yaml.safe_load(file)
                # Traverse the YAML content to get the desired template
                for key in yaml_trace:
                    yaml_content = yaml_content[key]
                return yaml_content

            return file_path.read_text()
        except FileNotFoundError:
            continue  # the file does not exist, so goto the next loop.
        except KeyError:
            continue  # the file exists, but the yaml key is missing.
    else:
        raise FileNotFoundError(f"Cannot find {uri} in {file_path_l}")

The important line is:

    with file_path.open("r", encoding="utf-8") as file:

This ensures YAML files are always read as UTF-8 instead of with the system default encoding (GBK on this Windows setup), which avoids the GBK decoding error.
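To check whether a given machine is exposed to this problem, it may help to look at the encoding Python falls back to when `open()` or `Path.read_text()` is called without `encoding=`. The sketch below uses only the standard library; `text_io_defaults` is a hypothetical helper name and not part of RD-Agent.

```python
import locale
import sys


def text_io_defaults() -> dict:
    """Hypothetical helper: report the encoding used when none is passed explicitly."""
    return {
        # True when Python runs in UTF-8 mode (PYTHONUTF8=1 or -X utf8).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding that open() and Path.read_text() fall back to by default.
        "default_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    print(text_io_defaults())
```

On a Chinese-locale Windows install this typically reports `cp936`, which Python treats as an alias of GBK, unless UTF-8 mode is on. Setting the environment variable `PYTHONUTF8=1` before launching Python switches the default to UTF-8, so it may serve as a workaround when patching the installed package is not practical; whether it resolves every code path in RD-Agent is not confirmed in this thread.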

tiengming · Sep 03 '25 11:09