
[Question] Is there a way to turn weave objects (TraceTables, boxed types, etc.) back into the original Python objects?

Open darinkishore opened this issue 9 months ago • 5 comments

Hi! So I have a weave object, initialized as follows:

class SemanticMemoryExample(BaseModel):
    name: str
    text: str
    memory: str
    inputs: list[str] = ["name", "text"]


class SemanticMemoryExampleDataset(weave.Object, BaseModel):
    name: str = "semantic_memory_perltmem_dspy_first_examples"
    description: str = "First foray into semantic memory using DSPY"
    examples: list[SemanticMemoryExample]

The wrapper "SemanticMemoryExampleDataset" is being used because name is an already defined attribute for weave.Object.

I'd like to save and load this object, so I run

def publish_dataset(
    examples: list[SemanticMemoryExample],
) -> SemanticMemoryExampleDataset:
    dataset = SemanticMemoryExampleDataset(examples=examples)
    weave.publish(dataset)
    return dataset


def retrieve_examples(
    dataset_ref: SemanticMemoryExampleDataset,
) -> list[SemanticMemoryExample]:
    retrieved_examples: list[SemanticMemoryExample] = []
    for example in dataset_ref.examples:
        retrieved_examples.append(example)  # here, this doesn't go recursively

    return retrieved_examples

However, retrieve_examples returns

TraceObject(ObjectRecord({'name': BoxedStr('...'), 'text': BoxedStr("...."), 'memory': BoxedStr('...'), 'inputs': TraceList(['name', 'text']), '_class_name': 'SemanticMemoryExample', '_bases': TraceList(['BaseModel']), 'map_values': <bound method ObjectRecord.map_values of ObjectRecord({...})>}))

I see that Boxed objects have an unbox() method, so I can unbox name, text, and memory by running from weave.box import unbox and then calling unbox(thing) for each thing in the list.

But I don't see a way to convert inputs, a TraceList, back into a native Python datatype. Also, manually converting everything is a hassle. Is there a weave function planned (or already existing) that turns TraceList, TraceTable, etc. objects back into native Python datatypes?

Loving the library—there were a LOT of good decisions made as far as what to focus on and DX. This is an almost ideal solution for me.

darinkishore avatar May 04 '24 00:05 darinkishore

I'm being silly—You can just cast it back into a list, inputs=list(example.inputs).
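The list(...) cast above can be applied recursively. The following is only a sketch, not a weave API: it assumes the boxed and trace types subclass (or behave like) the builtin str, list, and dict types, which is what makes list(example.inputs) work. FakeBoxedStr, FakeTraceList, and to_native are hypothetical names used for illustration, so the snippet runs without weave installed.

```python
from collections.abc import Mapping, Sequence
from typing import Any


class FakeBoxedStr(str):
    """Hypothetical stand-in for a boxed string (assumed to subclass str)."""


class FakeTraceList(list):
    """Hypothetical stand-in for a trace list (assumed to subclass list)."""


def to_native(value: Any) -> Any:
    """Recursively rebuild plain Python containers and scalars."""
    # bool is checked before int because bool is a subclass of int.
    for scalar_type in (bool, int, float, str):
        if isinstance(value, scalar_type):
            # Calling the builtin constructor drops any subclass wrapper.
            return scalar_type(value)
    if isinstance(value, Mapping):
        return {to_native(k): to_native(v) for k, v in value.items()}
    if isinstance(value, Sequence):
        # Note: tuples are also returned as lists in this sketch.
        return [to_native(item) for item in value]
    # Anything else (None, custom objects) is returned unchanged.
    return value


record = {
    "name": FakeBoxedStr("example"),
    "inputs": FakeTraceList([FakeBoxedStr("name"), FakeBoxedStr("text")]),
}
native = to_native(record)
```

After this runs, type(native["inputs"]) is list and type(native["name"]) is str. Whether the real weave types satisfy these isinstance checks would need to be verified against the installed version.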

darinkishore avatar May 04 '24 00:05 darinkishore

Hi @darinkishore just wanted to confirm that all is good here and that we can close this?

jwlee64 avatar May 07 '24 00:05 jwlee64

Hi! Thank you for checking—My main question is still unresolved!

Can you turn weave objects back into their native python objects?

Usually to preserve state, some classes can't set everything up at creation time!

Also, the changed types of all nested attributes are inconvenient to keep track of and work around in code, especially if I use lots of different weave objects.

darinkishore avatar May 08 '24 01:05 darinkishore

Hi @darinkishore - thank you very much for your feedback and comments. Your request is very reasonable. Reading your use case, I am extracting 3 distinct asks:

  1. The ability to construct the original runtime class when loading published data
  2. Ideally Trace*, *Record, and Boxed* type classes are transparent to the user as they deviate from the expected types in code (at the very least it should be easy to recursively strip away this representation)
  3. (Implied from first comment): The special name field in our Object class can conflict with user-defined fields.

Spitballing some API ideas: I wonder if there could be a higher level class method that could make this easier (some pseudo code):

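class Object:
    # ...

    @classmethod
    def load(cls, data: "Object | dict | TraceObject") -> "Object":
        """Return an instance of cls built from an instance, a dict, or a TraceObject."""
        if isinstance(data, cls):
            return data
        elif isinstance(data, dict):
            # Validate the dict that was passed in (not the builtin dict type).
            return cls.model_validate(data)
        elif isinstance(data, TraceObject):
            # weave.unwrap is a hypothetical helper in this pseudocode.
            return cls.load(weave.unwrap(data))
        else:
            raise TypeError(f"Cannot load {cls.__name__} from {type(data).__name__}")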

This would allow you to run SemanticMemoryExampleDataset.load(...) to ensure you have the right class.
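For readers who want to try the shape of that load(...) idea without weave or pydantic, here is a rough stdlib-only sketch. It is an illustration under assumptions, not the library's design: Example is a hypothetical dataclass stand-in for SemanticMemoryExample, and the explicit field-by-field construction takes the place of what model_validate would do for a pydantic model.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class Example:
    """Hypothetical stand-in for SemanticMemoryExample."""

    name: str
    text: str
    memory: str
    inputs: list = field(default_factory=lambda: ["name", "text"])

    @classmethod
    def load(cls, data: Union["Example", Mapping]) -> "Example":
        """Return an Example from an existing instance or a mapping."""
        if isinstance(data, cls):
            # Already the right runtime class, so return it unchanged.
            return data
        if isinstance(data, Mapping):
            # str(...) and the list comprehension drop str/list subclasses.
            return cls(
                name=str(data["name"]),
                text=str(data["text"]),
                memory=str(data["memory"]),
                inputs=[str(item) for item in data.get("inputs", ["name", "text"])],
            )
        raise TypeError(
            f"Cannot load {cls.__name__} from {type(data).__name__}"
        )


loaded = Example.load({"name": "n", "text": "t", "memory": "m"})
same = Example.load(loaded)
```

Here loaded is a regular Example instance and same is the identical object, since instances pass through untouched. Handling a TraceObject would still require some way to turn it into a mapping first, which is the open question in this thread.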


In any case, these are great requests and we need to think about a good design to improve this. We probably need to come back with more ideas/options before taking action.

tssweeney avatar May 20 '24 18:05 tssweeney

Internal backlog link: https://wandb.atlassian.net/browse/WB-18889

tssweeney avatar May 20 '24 19:05 tssweeney