openinference icon indicating copy to clipboard operation
openinference copied to clipboard

[bug] Instructor when instrumented tries to instantiate the type that is given as a `response_model`

Open spott opened this issue 1 year ago • 5 comments

Describe the bug When using instructor, one of the things you pass to the client.chat.completions.create is a Pydantic model type for the kwarg response_model. When openinference instruments instructor, it takes all kwargs to the client.chat.completions.create and tries to save them as attributes. Unfortunately, it doesn't expect to get anything that isn't a standard type (or sequence of standard types), and the mask operation for a span when saving attributes returns with:

return value() if callable(value) else value

which (in my case) tries to instantiate a pydantic type that can't be instantiated without arguments.

I would be happy to put together a pull request for this, however I'm not sure where the problem should be fixed. Should it be fixed in the mask function? or in the instructor _wrapped.py patched_new_func where it creates the dictionary of attributes (by stringifying all inputs), or in _WrappedSpan.set_attributes where it can check if a value isn't a AttributeValue and stringily it or similar before it passes it to _WrappedSpan.set_attribute.

Let me know where you think this should be fixed.

spott avatar Dec 07 '24 04:12 spott

Thanks for the note @spott - we'll take a look. If there's a minimum working example we can take a look at that would be amazing. Let us know if we can help in any way.

mikeldking avatar Dec 09 '24 21:12 mikeldking

Let me try and show what I've found. I don't think a minimum working example is necessary, though it shouldn't be that hard to make one.

When you patch instructor, the patched async function here is called (there is a similar function for sync functions as well).

here that function calls start_as_current_span. However, notice that it takes the kwargs of the function, and flattens them and passes it as a dict as attributes for the span.

After a few more layers of indirection, we get to here, the _WrappedSpan class. This class has two functions, set_attributes which accepts dicts of AttributeValues, and set_attribute which accepts callables that return AttributeValue or an AttributeValue.

AttributeValue looks like this:

AttributeValue = Union[
    str,
    bool,
    int,
    float,
    Sequence[str],
    Sequence[bool],
    Sequence[int],
    Sequence[float],
]

set_attribute calls a mask function that, at the end, checks if the value is callable, and if it is, calls it with no arguments.

And there is where the problem lies. The function that this whole thing is wrapping (client.chat.completions.create in this case), takes as an argument a response_model... which is a Pydantic model. This isn't a AttributeValue, and when mask tries to call it with no arguments, it tries to construct the Pydantic object, and if that object has required fields, it throws.

Ultimately the problem is that the types that set_attributes accepts aren't enforced, and the flattening of the kwargs here passes non-AttributeValue values along the chain.

So, I'm happy to submit a pull request... but I'm not sure where the contract for AttributeValue should be enforced. Should it be enforced at the flattening of the kwargs? or should it be enforced at the set_attributes? I can see good arguments for both.

I'm leaning towards enforcing it at the flattening of the kwargs. In this case, I would stringify anything that wasn't an AttributeValue in the kwargs dict before passing it as an attribute.

It might also be worth adding some error catching so that an error in the tracing code doesn't result in the function not being called... but I'm not sure where that would go.

spott avatar Dec 11 '24 17:12 spott

Let me try and show what I've found. I don't think a minimum working example is necessary, though it shouldn't be that hard to make one.

When you patch instructor, the patched async function here is called (there is a similar function for sync functions as well).

here that function calls start_as_current_span. However, notice that it takes the kwargs of the function, and flattens them and passes it as a dict as attributes for the span.

After a few more layers of indirection, we get to here, the _WrappedSpan class. This class has two functions, set_attributes which accepts dicts of AttributeValues, and set_attribute which accepts callables that return AttributeValue or an AttributeValue.

AttributeValue looks like this:

AttributeValue = Union[
    str,
    bool,
    int,
    float,
    Sequence[str],
    Sequence[bool],
    Sequence[int],
    Sequence[float],
]

set_attribute calls a mask function that, at the end, checks if the value is callable, and if it is, calls it with no arguments.

And there is where the problem lies. The function that this whole thing is wrapping (client.chat.completions.create in this case), takes as an argument a response_model... which is a Pydantic model. This isn't a AttributeValue, and when mask tries to call it with no arguments, it tries to construct the Pydantic object, and if that object has required fields, it throws.

Ultimately the problem is that the types that set_attributes accepts aren't enforced, and the flattening of the kwargs here passes non-AttributeValue values along the chain.

So, I'm happy to submit a pull request... but I'm not sure where the contract for AttributeValue should be enforced. Should it be enforced at the flattening of the kwargs? or should it be enforced at the set_attributes? I can see good arguments for both.

I'm leaning towards enforcing it at the flattening of the kwargs. In this case, I would stringify anything that wasn't an AttributeValue in the kwargs dict before passing it as an attribute.

It might also be worth adding some error catching so that an error in the tracing code doesn't result in the function not being called... but I'm not sure where that would go.

Thanks for the detailed feedback. We'll get this prioritized. Appreciate it.

mikeldking avatar Dec 12 '24 16:12 mikeldking

I also am using instructor currently, and when using Instructor + OpenAI or Instructor + Groq, it crashes my app. Removing Instructor Instrumentation makes it work, so this needs to be updated!

exiao avatar Jan 30 '25 18:01 exiao

Thanks for the follow up @exiao -- I'll take a look later this week.

nate-mar avatar Feb 04 '25 21:02 nate-mar

I believe this is fixed, though I'm not sure. Please re-open if not

mikeldking avatar May 23 '25 01:05 mikeldking