
Option to skip terminal output of cached tasks

Open TommasoAmici opened this issue 10 months ago • 4 comments

Description

It'd be nice to have an option to skip the terminal output of cached tasks.

Motivation

Some tasks generate thousands of lines of logs, and especially in CI this can take quite some time to finish. It'd be useful to skip these logs entirely for cached tasks.

Suggested Implementation

This could also be an option in the nx.json configuration file.

I've searched the docs and GitHub issues and discussions, but I don't think this exists. If it does, please forgive me for not finding it :D

TommasoAmici avatar Oct 20 '23 12:10 TommasoAmici

Hi @chandlj ,

Thanks for the question!

You can indeed specify a list type formatting for the dspy.OutputField of the Signature. To generate a specified number of responses, you can mention this in the instruction and/or introduce a dspy.InputField that takes the number of responses the user expects. For example:

class BasicQA(dspy.Signature):
    """Return a list of specified number of possible answers to the question."""

    question = dspy.InputField()
    number = dspy.InputField(desc="number of possible answers to return")
    answer = dspy.OutputField(format=list, desc="unique possible answers")

class TestModule(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought(BasicQA)
    
    def forward(self, question, number):
        prediction = self.generate_answer(question=question, number=number)
        return dspy.Prediction(answer=prediction.answer)

From here, you can specify your inputs alongside a number of expected outputs and produce a list of expected responses.

Pro tip: since we are imposing this constraint on the LLM itself, it is subject to errors, generating more or fewer than the expected number of answers (or even duplicates). This is where you can make use of Suggest and/or Assert to check that the list's length matches your InputField number.
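As a plain-Python sketch of the kind of check a Suggest/Assert condition would enforce (outside DSPy; `parse_list` and the one-answer-per-line format are assumptions, not DSPy internals):

```python
def parse_list(completion: str) -> list[str]:
    """Split a model completion into answers, assuming one answer per line."""
    return [line.strip("- ").strip() for line in completion.splitlines() if line.strip()]

def check_answers(completion: str, number: int) -> list[str]:
    """Validate the parsed list the way a dspy.Suggest condition might."""
    answers = parse_list(completion)
    assert len(answers) == number, f"expected {number} answers, got {len(answers)}"
    assert len(set(answers)) == len(answers), "answers must be unique"
    return answers

answers = check_answers("- Washington\n- Lincoln\n- Obama", 3)
```

In DSPy itself you would put these conditions inside `dspy.Suggest(...)` calls in `forward`, so failed checks feed back into the prompt instead of raising.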

Let me know if this helps!

arnavsinghvi11 avatar Jan 31 '24 18:01 arnavsinghvi11

@arnavsinghvi11 Thanks for the help! I've noticed that this somewhat starts breaking down once you add more fields that are each dependent on each other. For example:

class BasicQA(dspy.Signature):
    """Return a list of specified number of possible answers to the question."""

    question = dspy.InputField()
    number = dspy.InputField(desc="number of possible answers to return")
    answer = dspy.OutputField(format=list, desc="unique possible answers")
    options = dspy.OutputField(format=list, desc="For each answer, a list of four possible options labeled A, B, C, and D.")

Using it in TestModule()(question="Who was a president of the United States?", number="5") results in a proper list of answers but a malformed options list. As the "schema" per se gets more complex I've noticed that DSPy has a harder time enforcing format. I'm sure that with few-shot examples this would be a bit better, but this is something LangChain handles pretty cleanly out-of-the-box. Basically, I'm looking for the optimizations and prompt-building that DSPy provides with some of the stronger output parsing and type-enforcing that something like Outlines or LangChain provides.
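One workaround for the dependent-fields problem is to ask for a single structured completion instead of parallel flat fields, so each answer carries its own options. A minimal stdlib sketch (the JSON shape and the raw completion below are assumptions for illustration, not DSPy output):

```python
import json

# Hypothetical raw completion: a JSON list pairing each answer with its options,
# so the answer/options relationship can't drift apart across two flat fields.
completion = """[
  {"answer": "George Washington",
   "options": {"A": "George Washington", "B": "John Adams",
               "C": "Benjamin Franklin", "D": "Alexander Hamilton"}}
]"""

records = json.loads(completion)
for rec in records:
    # Each record must carry exactly the four labeled options.
    assert set(rec["options"]) == {"A", "B", "C", "D"}
```

This is essentially what a "composite signature" would serialize to under the hood.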

It would potentially be helpful to specify "composite signatures", like below, in order to build more complex signatures and types. The implicit schema here is that the LLM would respond with a list of BasicQA signatures:

class ListOfBasicQA(dspy.Signature):
     question = dspy.InputField(...)
     outputs = dspy.OutputField(format=list, base_signature=BasicQA)

chandlj avatar Jan 31 '24 18:01 chandlj

Hey @chandlj ,

This is a great point, I'm also interested in "the optimizations that DSPy provides with stronger output parsing and type-enforcing".

I think LangChain et al. provide this now through function-calling APIs? Those are quite restrictive in general; we'd rather use prompt -> completion interfaces. I think the best projects in this sphere are SGLang and Outlines. We've been wanting to allow DSPy signatures to have multiple backends, like SGLang or Outlines (or even function calling, for that matter).

That's something we need help to do, let us know if you want to explore it.

okhat avatar Feb 01 '24 18:02 okhat

@okhat Yes, I believe LangChain has the option of using OpenAI's function calling API, but I don't know if it uses it by default. In my experience it doesn't, because passing in a JSON formatter's parse instructions can still return malformed JSON unless you force the model into JSON mode, which is obviously not universally available (although models are still pretty good at returning JSON even when you don't enable JSON mode explicitly).
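Without JSON mode, the usual fallback is a parse-and-retry loop around the completion call. A minimal sketch (`call_llm` is a hypothetical stand-in for a real model call, rigged here to fail once):

```python
import json

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; returns malformed JSON on the first try.
    call_llm.attempts += 1
    return '{"answer": "Lincoln"' if call_llm.attempts == 1 else '{"answer": "Lincoln"}'
call_llm.attempts = 0

def generate_json(prompt: str, max_retries: int = 3) -> dict:
    """Retry the call until the completion parses as valid JSON."""
    for _ in range(max_retries):
        try:
            return json.loads(call_llm(prompt))
        except json.JSONDecodeError:
            continue  # in practice you'd also feed the parse error back into the prompt
    raise ValueError("no valid JSON after retries")
```

This is roughly the strategy libraries like LangChain's output-fixing parsers automate, and what DSPy's later typed predictors do with their retry budget.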

I could help explore this. How are Signatures currently handling input/output parsing? My understanding is that Signatures get baked into the prompt, but outside of the prompt there isn't a ton of enforcing? Let me know what your progress in this space has been.

chandlj avatar Feb 02 '24 15:02 chandlj

Dear @okhat

I found out that DSPy has recently added Typed Predictors, which resolve this issue. However, when I tried using this new feature with a more complex structure like lists, I encountered an error. I was wondering if there is a solution for this.

from pydantic import BaseModel, Field

class Output(BaseModel):
    question: str = Field(...)
    answer: str = Field(...)

class ListModel(BaseModel):
    outputs: list[Output] = Field(...)

class ListGenerator(dspy.Signature):
    input: str = dspy.InputField()
    output: ListModel = dspy.OutputField()

predictor = dspy.TypedChainOfThought(ListGenerator)
prediction = predictor(input = "Some String ...")

The error I get is as follows:

ValueError: ('Too many retries trying to get the correct output format. Try simplifying the requirements.', {'output': 'ValueError("Don\'t write anything after the final json")'})

kimianoorbakhsh avatar Apr 16 '24 12:04 kimianoorbakhsh

@kimianoorbakhsh I have encountered a similar issue when using Typed Predictors. What helped for me was switching to a more "capable" language model and simplifying the outputs. This could mean, for example, adding the "desc" parameter in Field(...) to give context, or removing excess output fields.

However, this has been trial-and-error and I would also be very interested to know if there is a different resolution to this!

mlederbauer avatar Apr 17 '24 09:04 mlederbauer

Does DSPy use JSON mode or function calling? Can DSPy automatically add multi-shot examples for the schema? Also, for open-weight models, an SGLang backend would solve this as well.

AriMKatz avatar Apr 17 '24 21:04 AriMKatz