EOFError with guidance 0.2.0 when trying to work through intro notebook
The bug
I'm trying to work through the introductory notebook here and can't get past the very first example.
To Reproduce
from guidance import models
mistral = models.LlamaCpp(
# downloaded from: https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf
"models/mistral-7b-instruct-v0.2.Q8_0.gguf",
n_gpu_layers=-1,
n_ctx=4096,
)
lm = mistral + "Who won the last Kentucky derby and by how much?"
When I run this I get a segmentation fault:
$ python example.py
llama_init_from_model: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
Segmentation fault: 11
System info (please complete the following information):
- OS (e.g. Ubuntu, Windows 11, Mac OS, etc.): macOS 15.3.1
- Python version 3.11.11
- Guidance Version (guidance.__version__): 0.2.0
- llama_cpp_python: 0.3.7
- torch: 2.6.0
- transformers: 4.49.0
Please see my update below on other library versions I tried, as I believe there are issues both with guidance and llama-cpp-python.
I am encountering the same issue. If I use llama_cpp directly, as below, it works fine.
from llama_cpp import Llama
llm = Llama(
model_path="/home/dockeruser/.ollama/models/blobs/sha256-74701a8c35f6c8d9a4b91f3f3497643001d63e0c7a84e085bed452548fa88d45",
# n_gpu_layers=-1, # Uncomment to use GPU acceleration
# seed=1337, # Uncomment to set a specific seed
# n_ctx=2048, # Uncomment to increase the context window
)
output = llm(
"Q: Name the planets in the solar system? A: ", # Prompt
max_tokens=32, # Generate up to 32 tokens, set to None to generate up to the end of the context window
stop=["Q:", "\n"], # Stop generating just before the model would generate a new question
echo=True # Echo the prompt back in the output
) # Generate a completion, can also call create_completion
print(output)
@nchammas @vintertown Can't remember the reason for this error, but it has something to do with the llama-cpp-python version. Try installing a previous version; for me 0.2.90 worked.
Downgrading llama-cpp-python from 0.3.7 to 0.3.6 avoids the segmentation fault but results in another error:
$ python example.py
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/spawn.py", line 131, in _main
prepare(preparation_data)
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/spawn.py", line 246, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/spawn.py", line 297, in _fixup_main_from_path
main_content = runpy.run_path(main_path,
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen runpy>", line 291, in run_path
File "<frozen runpy>", line 98, in _run_module_code
File "<frozen runpy>", line 88, in _run_code
File "...home/.../example.py", line 29, in <module>
model = models.LlamaCpp(
^^^^^^^^^^^^^^^^
File "...home/.../.venv/lib/python3.11/site-packages/guidance/models/llama_cpp/_llama_cpp.py", line 358, in __init__
engine = LlamaCppEngine(
^^^^^^^^^^^^^^^
File "...home/.../.venv/lib/python3.11/site-packages/guidance/models/llama_cpp/_llama_cpp.py", line 158, in __init__
super().__init__(
File "...home/.../.venv/lib/python3.11/site-packages/guidance/models/_model.py", line 334, in __init__
self.monitor = Monitor(self.metrics)
^^^^^^^^^^^^^^^^^^^^^
File "...home/.../.venv/lib/python3.11/site-packages/guidance/models/_model.py", line 2099, in __init__
self.mp_manager = Manager()
^^^^^^^^^
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/context.py", line 57, in Manager
m.start()
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/managers.py", line 563, in start
self._process.start()
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/spawn.py", line 164, in get_preparation_data
_check_not_importing_main()
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/spawn.py", line 140, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
To fix this issue, refer to the "Safe importing of main module"
section in https://docs.python.org/3/library/multiprocessing.html
Traceback (most recent call last):
File "...home/.../example.py", line 29, in <module>
model = models.LlamaCpp(
^^^^^^^^^^^^^^^^
File "...home/.../.venv/lib/python3.11/site-packages/guidance/models/llama_cpp/_llama_cpp.py", line 358, in __init__
engine = LlamaCppEngine(
^^^^^^^^^^^^^^^
File "...home/.../.venv/lib/python3.11/site-packages/guidance/models/llama_cpp/_llama_cpp.py", line 158, in __init__
super().__init__(
File "...home/.../.venv/lib/python3.11/site-packages/guidance/models/_model.py", line 334, in __init__
self.monitor = Monitor(self.metrics)
^^^^^^^^^^^^^^^^^^^^^
File "...home/.../.venv/lib/python3.11/site-packages/guidance/models/_model.py", line 2099, in __init__
self.mp_manager = Manager()
^^^^^^^^^
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/context.py", line 57, in Manager
m.start()
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/managers.py", line 567, in start
self._address = reader.recv()
^^^^^^^^^^^^^
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/connection.py", line 430, in _recv_bytes
buf = self._recv(4)
^^^^^^^^^^^^^
File "...home/.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/connection.py", line 399, in _recv
raise EOFError
EOFError
Trying llama-cpp-python at 0.2.90 and again at 0.2.25 results in the same EOFError.
for me 0.2.90 worked
@adityaprakash-work Can you clarify what you mean by "worked"? Did you run exactly the repro script I posted with guidance 0.2.0 and llama-cpp-python 0.2.90?
OK, I got this to work by downgrading both guidance and llama-cpp-python. If I use the latest version of either library, there is a problem.
Works: guidance 0.1.16 + llama_cpp_python 0.3.6
EOFError: guidance 0.2.0 + llama_cpp_python 0.3.6
Segmentation fault: guidance 0.1.16 or 0.2.0 + llama_cpp_python 0.3.7
So I believe there are separate issues with both guidance and llama-cpp-python.
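In the meantime, pinning both packages to the known-good combination above should give a working setup (assuming a plain pip environment that can build llama-cpp-python):
$ pip install 'guidance==0.1.16' 'llama-cpp-python==0.3.6'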
I can replicate this as well; 0.2.0 does not seem to be usable. I can confirm that downgrading to guidance 0.1.16 and llama-cpp-python 0.3.6 allows me to generate tokens.
The issue does not seem to be limited to LlamaCpp; even the Transformers backend fails:
from guidance import models, gen
# path = 'mistral-7b-instruct-v0.2.Q8_0.gguf'
# mistral = models.LlamaCpp(path)
path = 'HuggingFaceTB/SmolLM2-135M-Instruct'
model = models.Transformers(path)
# append text or generations to the model
print(model + f'Do you want a joke or a poem? ' + gen(max_tokens=100))
Yields the following error:
(guidance) λ ~/dottxt/debug/guidance/ python tst.py
/home/cameron/dottxt/debug/guidance/.venv/lib/python3.12/site-packages/guidance/chat.py:80: UserWarning: Chat template {% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system
You are a helpful AI assistant named SmolLM, trained by Hugging Face<|im_end|>
' }}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %} was unable to be loaded directly into guidance.
Defaulting to the ChatML format which may not be optimal for the selected model.
For best results, create and pass in a `guidance.ChatTemplate` subclass for your model.
warnings.warn(
thread '<unnamed>' panicked at toktrie/src/toktree.rs:563:37:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "/home/cameron/dottxt/debug/guidance/tst.py", line 10, in <module>
print(mistral + f'Do you want a joke or a poem? ' + gen(max_tokens=100))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
File "/home/cameron/dottxt/debug/guidance/.venv/lib/python3.12/site-packages/guidance/models/_model.py", line 1207, in __add__
out = lm._run_stateless(value)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cameron/dottxt/debug/guidance/.venv/lib/python3.12/site-packages/guidance/models/_model.py", line 1413, in _run_stateless
for chunk in gen_obj:
^^^^^^^
File "/home/cameron/dottxt/debug/guidance/.venv/lib/python3.12/site-packages/guidance/models/_model.py", line 453, in __call__
mask, ll_response = mask_fut.result()
^^^^^^^^^^^^^^^^^
File "/home/cameron/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/home/cameron/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/home/cameron/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cameron/dottxt/debug/guidance/.venv/lib/python3.12/site-packages/guidance/_parser.py", line 97, in compute_mask
mask, ll_response_string = self.ll_interpreter.compute_mask()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: called `Option::unwrap()` on a `None` value
Here's my version info for the transformers error.
The fact that the 0.2.0 release is this broken suggests there is a notable gap in the project's continuous integration tests, since I assume the team did not knowingly publish such a broken release.
Hi All,
Want to apologize here -- we refactored significant chunks of code for v0.2, including the spin-off of a "low level" library. I think some of these errors are caused by our internal dependencies now being out of sync with the parent library, and by poor pinning on our part. We're planning a release this week that should hopefully address all of this. Appreciate your patience while we get this all sorted!
-Harsha
It seems that the EOFError is due to multiprocessing.Manager being used to manage a subprocess that collects certain metrics.
The error message you reported indicates that this is due to (1) not running this code inside an if __name__ == '__main__': guard and (2) the default behavior of Python on macOS, which uses spawn instead of fork to start child processes.
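For what it's worth, the same failure mode can be reproduced without guidance at all. Here is a minimal sketch (the file name spawn_repro.py is just for illustration):

# spawn_repro.py -- minimal sketch, independent of guidance, of the same failure mode.
# Under the "spawn" start method (the macOS default), the child process re-imports
# this module, hits Manager() again while bootstrapping, raises the RuntimeError about
# the missing __main__ guard, and the parent then fails with EOFError on reader.recv().
import multiprocessing

manager = multiprocessing.Manager()  # started at import time, outside any __main__ guard
print(manager.dict())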
@nchammas you can resolve your problem by either using said guard or by changing your multiprocessing start method to fork.
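For concreteness, a minimal sketch of both options, reusing the arguments from your original repro (the model path is whatever you have locally):

# example.py -- sketch of both workarounds, not a definitive fix.
import multiprocessing

from guidance import models

def main():
    # Option 1: keep the default "spawn" start method, but construct the model
    # only under the __main__ guard so the child can re-import this module safely.
    mistral = models.LlamaCpp(
        "models/mistral-7b-instruct-v0.2.Q8_0.gguf",
        n_gpu_layers=-1,
        n_ctx=4096,
    )
    print(mistral + "Who won the last Kentucky derby and by how much?")

if __name__ == "__main__":
    # Option 2: force the "fork" start method before any guidance objects are
    # created. Note that fork has its own caveats on macOS.
    # multiprocessing.set_start_method("fork")
    main()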
@nopdive I'm getting a broken pipe on the monitoring process at system exit either way, so there's definitely still something to fix there. I'd also probably ensure that the monitoring is opt-in in non-interactive sessions rather than opt-out (which may be a more sensible default for notebooks).
The error message you reported indicates that this is due to (1) not running this code inside an if __name__ == '__main__': guard and (2) the default behavior of Python on macOS, which uses spawn instead of fork to start child processes.
I hope that the __name__ guard is just a quick hack for now and not somehow required by design. It would be weird if Guidance did not support running Python scripts without this guard, though I understand such guards are good practice, especially if you are writing a library.
With regards to using fork, I note that the Python docs warn that it is unsafe on macOS:
Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess as macOS system libraries may start threads. See bpo-33725.
In any case, the guard itself doesn't completely resolve the problem. I do get output, but I also get an error alongside it:
# example.py
from guidance import models, lark

grammar_def = r"start: /\w+/"  # raw string so \w is not treated as a string escape

if __name__ == "__main__":
    grammar = lark(grammar_def)
    model = models.Transformers("microsoft/Phi-4-mini-instruct")
    print(model + "This is a word: " + grammar)
$ python example.py
Loading checkpoint shards: 100%|███████████████████████████████| 2/2 [00:10<00:00, 5.23s/it]
.../.venv/lib/python3.11/site-packages/guidance/chat.py:80: UserWarning: Chat template {% for message in messages %}{% if message['role'] == 'system' and 'tools' in message and message['tools'] is not none %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|tool|>' + message['tools'] + '<|/tool|>' + '<|end|>' }}{% else %}{{ '<|' + message['role'] + '|>' + message['content'] + '<|end|>' }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>' }}{% else %}{{ eos_token }}{% endif %} was unable to be loaded directly into guidance.
Defaulting to the ChatML format which may not be optimal for the selected model.
For best results, create and pass in a `guidance.ChatTemplate` subclass for your model.
warnings.warn(
gpustat is not installed, run `pip install gpustat` to collect GPU stats.
This is a word: 1
Error in monitoring:
---------------------------------------------------------------------------
Traceback (most recent call last):
File ".../.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/managers.py", line 260, in serve_client
self.id_to_local_proxy_obj[ident]
~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^
KeyError: '10d4b9310'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".../.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/managers.py", line 262, in serve_client
raise ke
File ".../.pyenv/versions/3.11.11/lib/python3.11/multiprocessing/managers.py", line 256, in serve_client
obj, exposed, gettypeid = id_to_obj[ident]
~~~~~~~~~^^^^^^^
KeyError: '10d4b9310'
---------------------------------------------------------------------------
Interestingly, if I run the test several times, on occasion I do not get the KeyError, though I still see the "Error in monitoring:" text.
I am running guidance at 22df35a, which is the latest on main as of this moment.
@nchammas glad you're at the very least able to make it run now. We'll look into alternatives to multiprocessing.Manager, as I agree that requiring the guard makes this a very leaky abstraction. @nopdive let's also definitely figure out the monitoring error and, as I said before, probably disable it entirely in non-interactive sessions (at least by default).