
Define json schema (file format) for message tree exports

Open andreaskoepf opened this issue 2 years ago • 1 comments

We want to export the human-feedback ranking results for training the reward model (RM). Our database contains message trees (see our high-level data structure overview document), and we have ranking information among the child nodes at every level.
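
To make the "rankings among child nodes at every level" idea concrete, here is a minimal sketch of walking a message tree and emitting one ranking sample per node that has at least two replies. The node layout (`text`, `replies` ordered best first) and the output keys (`thread`, `ranked_replies`) are assumptions for illustration only, not the actual database schema.

```python
def iter_ranking_samples(node, thread=()):
    """Yield one sample per node with >= 2 replies (hypothetical node layout)."""
    # The conversation so far, including this node, is the context for its replies.
    thread = thread + (node["text"],)
    replies = node.get("replies", [])
    if len(replies) >= 2:
        yield {
            "thread": list(thread),
            # Assumption: replies are already stored in ranked order, best first.
            "ranked_replies": [r["text"] for r in replies],
        }
    # Descend into every reply so rankings at deeper levels are exported too.
    for reply in replies:
        yield from iter_ranking_samples(reply, thread)


tree = {
    "text": "What is JSON?",
    "replies": [
        {"text": "A text data format.", "replies": [
            {"text": "Can you show an example?", "replies": [
                {"text": '{"a": 1}', "replies": []},
                {"text": "No.", "replies": []},
            ]},
        ]},
        {"text": "I don't know.", "replies": []},
    ],
}

samples = list(iter_ranking_samples(tree))
```

Each sample carries its own thread as context, so samples from different levels of the same tree can be exported as independent records.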

We need to define a file format for exports from the database (e.g. JSON) that can be used as input to the RM training stage.

As a starting point we could use a format similar to OpenAI's learning-to-summarize dataset. The main difference from the summarization data is that we generally have more than two completions for each prompt. A full ranking of N elements can be used to generate all N(N-1)/2 ranked pairs for the cross-entropy loss (see the InstructGPT paper, appendix "C.2 Details of RM training").
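
As a rough illustration of the pair generation, the sketch below expands a full ranking into every (preferred, rejected) pair. The function name `ranked_pairs` and the best-first input order are assumptions; the actual import code may look different.

```python
from itertools import combinations


def ranked_pairs(completions_best_first):
    """Return every (preferred, rejected) pair from a full ranking.

    combinations() keeps the input order, so the first element of each
    pair is always ranked higher than the second.
    """
    return list(combinations(completions_best_first, 2))


# Four completions, listed best first, give 4 * 3 / 2 = 6 pairs.
pairs = ranked_pairs(["A", "B", "C", "D"])
```

For N completions this yields N(N-1)/2 pairs, so all pairs from one prompt can be kept together when building training batches.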

(Overfitting is a serious problem for reward models. We need to check whether we can simply include multiple threads of the same message tree at random.) It would be great if someone working on the RMs could help define a suitable file structure, and also write and test the corresponding import code.

andreaskoepf avatar Dec 30 '22 00:12 andreaskoepf

@andreaskoepf

{
    "context": {
        "text": "the context of the interaction (chat), e.g. the text of the message history"
    },
    "prompt": {
        "text": "Prompt input for the model"
    },
    "supplemental": {
        "texts": [
            "supplement paragraph 1",
            "supplement paragraph 2"
        ]
    },
    "labels": {
        "rank": {
            "0": ["best response"],
            "1": ["this response 2", "this response is tie with response 2"],
            "2": ["response 4"]
        }
    }
}
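
A minimal sketch of how import code might read a record in this shape and turn it into training pairs. It assumes that a lower rank key means a better response and that responses sharing a rank are ties that produce no pair; the function name `pairs_from_record` and the sample strings are made up for illustration.

```python
import json
from itertools import combinations

record_json = """
{
    "prompt": {"text": "Prompt input for the model"},
    "labels": {
        "rank": {
            "0": ["response a"],
            "1": ["response b", "response c"],
            "2": ["response d"]
        }
    }
}
"""


def pairs_from_record(record):
    """Expand tie-aware rank groups into (preferred, rejected) pairs."""
    rank = record["labels"]["rank"]
    # JSON object keys are strings, so sort them numerically (0 = best).
    groups = [rank[key] for key in sorted(rank, key=int)]
    pairs = []
    # Only compare responses from different groups; ties yield no pair.
    for better_group, worse_group in combinations(groups, 2):
        for better in better_group:
            for worse in worse_group:
                pairs.append((better, worse))
    return pairs


record = json.loads(record_json)
pairs = pairs_from_record(record)
```

Sorting keys with `key=int` matters once there are ten or more rank levels, because plain string sorting would place "10" before "2".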

I assume the ranking dataset will use hard ranks (no score assigned to each response). But if we support response labeling (assigning a score, or labels such as "correct" or "violates our rules"), then each response could be a dictionary instead:

{ ...
  "labels": {
        "rank": {
            "0": [{ "text": "best response", "score": 0.5, "label 1": 0, "label 2": 1 }],
            ...
        }
    }
}
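
If both forms end up being allowed, import code could normalize them to one shape. The sketch below is only an assumption about how that might be done: the helper name `normalize_response` is hypothetical, and the only key it relies on is `text`, with any score or label fields passed through unchanged.

```python
def normalize_response(entry):
    """Return a response as a dict with at least a "text" key."""
    if isinstance(entry, str):
        # Hard-rank form: the response is just its text.
        return {"text": entry}
    if isinstance(entry, dict) and "text" in entry:
        # Labeled form: copy so the caller's data is not mutated.
        return dict(entry)
    raise ValueError(f"unsupported response entry: {entry!r}")


plain = normalize_response("best response")
labeled = normalize_response({"text": "best response", "score": 0.5})
```

This keeps the pair-generation step independent of whether a given export includes scores or labels.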

theblackcat102 avatar Dec 30 '22 02:12 theblackcat102