
Any inference code or something to check the model?

Open · Occupying-Mars opened this issue 1 year ago • 6 comments

Occupying-Mars · Nov 16 '23 08:11

After some investigation, I replicated the inference code using the same goal with one or more supplied screenshots and the action history. It does not work very well in zero-shot situations.
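
For reference, here is a minimal sketch of how I assemble the text side of the input. The template (previous actions followed by the goal) and the `build_prompt` helper are my own guesses based on the input format described in the Auto-UI paper, so adjust them to match the released preprocessing code:

```python
# Minimal sketch: assembling the text input from a goal and an action
# history. The template is a guess based on the Auto-UI paper's description
# of its input format, not the repo's exact code.
def build_prompt(goal: str, history: list[str]) -> str:
    previous = " ".join(
        f"{i}. {action}" for i, action in enumerate(history, start=1)
    ) or "None"
    return f"Previous Actions: {previous} Goal: {goal}"

print(build_prompt("How to login?", ["tap [username field]", "type 'alice'"]))
# Previous Actions: 1. tap [username field] 2. type 'alice' Goal: How to login?
```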

truebit · Nov 23 '23 13:11

@truebit Can you publish your inference code? I would appreciate it!

YiDa858 · Dec 25 '23 01:12

@truebit Please share the inference code if possible.

kirtishrinkhala · Jan 09 '24 00:01

I have been working on the inference code; here is what I have so far. I wrote a function that produces the processed input for an image and a goal, but I am not sure how to use that as input to a pretrained model.

This is the code that I wrote to process the image file and the goal:

```python
import argparse
import json
import pickle

import torch
from PIL import Image
from transformers import AutoProcessor, Blip2Model

# BLIP-2 serves as a frozen image encoder; its pooled output is used as the
# screenshot representation.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = Blip2Model.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16)
model.to(device)
processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")


def parse_image(image_file_path):
    goal = "How to login?"
    step_id = "123"
    output_ep = {
        "goal": goal,
        "step_id": step_id,
    }

    img = Image.open(image_file_path)

    # Encode the screenshot and keep the pooled feature vector on the CPU.
    with torch.no_grad():
        inputs = processor(images=img, return_tensors="pt").to(device, torch.float16)
        image_features = model.get_image_features(**inputs).pooler_output[0]
        image_features = image_features.detach().cpu()
    output_ep["image"] = image_features

    return [{"episode_id": 123, "data": [output_ep]}]


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--dataset', type=str, default='general')
    parser.add_argument('--split_file', type=str, default='dataset/general_texts_splits.json')
    parser.add_argument('--output_dir', type=str, default='dataset')
    parser.add_argument('--get_images', default=True, action='store_true')
    parser.add_argument('--get_annotations', default=True, action='store_true')
    parser.add_argument('--get_actions', default=True, action='store_true')
    parser.add_argument('--file_path', type=str, default='sample.png')
    return parser.parse_args()


if __name__ == '__main__':
    args = parse_args()
    print('====Input Arguments====')
    print(json.dumps(vars(args), indent=2, sort_keys=False))

    all_parsed_episode = parse_image(args.file_path)

    with open(f"{args.output_dir}_test_val.obj", "wb") as wp:
        pickle.dump(all_parsed_episode, wp)
```
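
As a next step, this is the direction I am exploring for feeding the pickled features into the pretrained checkpoint. The `T5ForMultimodalGeneration` class, its `image_ids` keyword, and the `cooelf/Auto-UI-Base` checkpoint name are guesses based on the mm-cot-style code in the Auto-UI repo, so the commented lines are untested:

```python
# Untested sketch: loading the pickle produced above and running generation.
# T5ForMultimodalGeneration, its image_ids argument, and the checkpoint name
# are assumptions; check model.py in the Auto-UI repo for the real signature.
import pickle

from transformers import AutoTokenizer
# from model import T5ForMultimodalGeneration  # hypothetical import from the Auto-UI repo

with open("dataset_test_val.obj", "rb") as fp:
    episodes = pickle.load(fp)

step = episodes[0]["data"][0]
image_features = step["image"].unsqueeze(0)  # (1, feature_dim) BLIP-2 pooled features

tokenizer = AutoTokenizer.from_pretrained("cooelf/Auto-UI-Base")  # checkpoint name is a guess
inputs = tokenizer(f"Goal: {step['goal']}", return_tensors="pt")

# model = T5ForMultimodalGeneration.from_pretrained("cooelf/Auto-UI-Base")
# out = model.generate(**inputs, image_ids=image_features, max_new_tokens=128)
# print(tokenizer.decode(out[0], skip_special_tokens=True))
```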

kirtishrinkhala · Jan 09 '24 23:01

Hi friends,

We’ve got AutoUI running and tested its end-to-end performance in our recent paper. You can find the inference code here:

https://github.com/Berkeley-NLP/Agent-Eval-Refine/tree/main/exps/android_exp/models/Auto-UI

Jiayi-Pan · Apr 09 '24 03:04

> Hi friends,
>
> We’ve got AutoUI running and tested its end-to-end performance in our recent paper. You can find the inference code here:
>
> https://github.com/Berkeley-NLP/Agent-Eval-Refine/tree/main/exps/android_exp/models/Auto-UI

Great job, thanks, I will try it 👍 Any insights into how well it works for zero-shot approaches?

Yingrjimsch · Apr 30 '24 20:04