JARVIS
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Hello,
When running in inference mode, I get the error below. Here is the log of the interaction; any suggestions appreciated.
2023-04-11 10:47:21,867 - awesome_chat - INFO - input: For the image at location /images/example_page.jpg please draw a bounding box around each block of text in the image.
2023-04-11 10:47:21,871 - awesome_chat - DEBUG - [{'role': 'system', 'content': '#1 Task Planning Stage: The AI assistant can parse user input to several tasks: [{"task": task, "id": task_id, "dep": dependency_task_id, "args": {"text": text or <GENERATED>-dep_id, "image": image_url or <GENERATED>-dep_id, "audio": audio_url or <GENERATED>-dep_id}}]. The special tag "<GENERATED>-dep_id" refer to the one generated text/image/audio in the dependency task (Please consider whether the dependency task generates resources of this type.) and "dep_id" must be in "dep" list. The "dep" field denotes the ids of the previous prerequisite tasks which generate a new resource that the current task relies on. The "args" field must in ["text", "image", "audio"], nothing else. The task MUST be selected from the following options: "token-classification", "text2text-generation", "summarization", "translation", "question-answering", "conversational", "text-generation", "sentence-similarity", "tabular-classification", "object-detection", "image-classification", "image-to-image", "image-to-text", "text-to-image", "text-to-video", "visual-question-answering", "document-question-answering", "image-segmentation", "depth-estimation", "text-to-speech", "automatic-speech-recognition", "audio-to-audio", "audio-classification", "canny-control", "hed-control", "mlsd-control", "normal-control", "openpose-control", "canny-text-to-image", "depth-text-to-image", "hed-text-to-image", "mlsd-text-to-image", "normal-text-to-image", "openpose-text-to-image", "seg-text-to-image". There may be multiple tasks of the same type. Think step by step about all the tasks needed to resolve the user\'s request. Parse out as few tasks as possible while ensuring that the user request can be resolved. Pay attention to the dependencies and order among tasks. If the user input can\'t be parsed, you need to reply empty JSON [].'}, {'role': 'user', 'content': 'Give you some pictures e1.jpg, e2.png, e3.jpg, help me count the number of sheep?'}, {'role': 'assistant', 'content': '[{"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "e1.jpg" }}, {"task": "object-detection", "id": 1, "dep": [-1], "args": {"image": "e1.jpg" }}, {"task": "visual-question-answering", "id": 2, "dep": [1], "args": {"image": "<GENERATED>-1", "text": "How many sheep in the picture"}} }}, {"task": "image-to-text", "id": 3, "dep": [-1], "args": {"image": "e2.png" }}, {"task": "object-detection", "id": 4, "dep": [-1], "args": {"image": "e2.png" }}, {"task": "visual-question-answering", "id": 5, "dep": [4], "args": {"image": "<GENERATED>-4", "text": "How many sheep in the picture"}} }}, {"task": "image-to-text", "id": 6, "dep": [-1], "args": {"image": "e3.jpg" }}, {"task": "object-detection", "id": 7, "dep": [-1], "args": {"image": "e3.jpg" }}, {"task": "visual-question-answering", "id": 8, "dep": [7], "args": {"image": "<GENERATED>-7", "text": "How many sheep in the picture"}}]'}, {'role': 'user', 'content': 'Look at /e.jpg, can you tell me how many objects in the picture? Give me a picture and video similar to this one.'}, {'role': 'assistant', 'content': '[{"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "/e.jpg" }}, {"task": "object-detection", "id": 1, "dep": [-1], "args": {"image": "/e.jpg" }}, {"task": "visual-question-answering", "id": 2, "dep": [1], "args": {"image": "<GENERATED>-1", "text": "how many objects in the picture?" 
}}, {"task": "text-to-image", "id": 3, "dep": [0], "args": {"text": "<GENERATED-0>" }}, {"task": "image-to-image", "id": 4, "dep": [-1], "args": {"image": "/e.jpg" }}, {"task": "text-to-video", "id": 5, "dep": [0], "args": {"text": "<GENERATED-0>" }}]'}, {'role': 'user', 'content': 'given a document /images/e.jpeg, answer me what is the student amount? And describe the image with your voice'}, {'role': 'assistant', 'content': '{"task": "document-question-answering", "id": 0, "dep": [-1], "args": {"image": "/images/e.jpeg", "text": "what is the student amount?" }}, {"task": "visual-question-answering", "id": 1, "dep": [-1], "args": {"image": "/images/e.jpeg", "text": "what is the student amount?" }}, {"task": "image-to-text", "id": 2, "dep": [-1], "args": {"image": "/images/e.jpg" }}, {"task": "text-to-speech", "id": 3, "dep": [2], "args": {"text": "<GENERATED>-2" }}]'}, {'role': 'user', 'content': 'Given an image /example.jpg, first generate a hed image, then based on the hed image generate a new image where a girl is reading a book'}, {'role': 'assistant', 'content': '[{"task": "openpose-control", "id": 0, "dep": [-1], "args": {"image": "/example.jpg" }}, {"task": "openpose-text-to-image", "id": 1, "dep": [0], "args": {"text": "a girl is reading a book", "image": "<GENERATED>-0" }}]'}, {'role': 'user', 'content': "please show me a video and an image of (based on the text) 'a boy is running' and dub it"}, {'role': 'assistant', 'content': '[{"task": "text-to-video", "id": 0, "dep": [-1], "args": {"text": "a boy is running" }}, {"task": "text-to-speech", "id": 1, "dep": [-1], "args": {"text": "a boy is running" }}, {"task": "text-to-image", "id": 2, "dep": [-1], "args": {"text": "a boy is running" }}]'}, {'role': 'user', 'content': 'please show me a joke and an image of cat'}, {'role': 'assistant', 'content': '[{"task": "conversational", "id": 0, "dep": [-1], "args": {"text": "please show me a joke of cat" }}, {"task": "text-to-image", "id": 1, "dep": [-1], "args": {"text": "a photo of cat" }}]'}, {'role': 'user', 'content': "The chat log [ [{'role': 'user', 'content': 'Please set your OpenAI API key first.'}, {'role': 'assistant', 'content': 'To answer your request, you need to set your OpenAI API key first. To do this, you must create an OpenAI account and then find your API key on the 'settings' page of your account and insert it into the required fields. The model used for this task is ChatGPT. There are no generated files of images, audios or videos in the inference results. Is there anything else I can help you with?'}] ] may contain the resources I mentioned. Now I input { For the image at location /images/example_page.jpg please draw a bounding box around each block of text in the image. }. Pay attention to the input and output types of tasks and the dependencies between tasks."}]
2023-04-11 10:47:26,537 - awesome_chat - DEBUG - {"id":"cmpl-74AVFSQM0q1urf67qfyueY86qflFp","object":"text_completion","created":1681228041,"model":"text-davinci-003","choices":[{"text":"\n[{\"task\": \"image-to-text\", \"id\": 0, \"dep\": [-1], \"args\": {\"image\": \"/images/example_page.jpg\" }}, {\"task\": \"text-to-image\", \"id\": 1, \"dep\": [0], \"args\": {\"text\": \"<GENERATED>-0\" }}, {\"task\": \"object-detection\", \"id\": 2, \"dep\": [-1], \"args\": {\"image\": \"/images/example_page.jpg\" }}]","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":2052,"completion_tokens":114,"total_tokens":2166}}
2023-04-11 10:47:26,537 - awesome_chat - INFO - [{"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "/images/example_page.jpg" }}, {"task": "text-to-image", "id": 1, "dep": [0], "args": {"text": "<GENERATED>-0" }}, {"task": "object-detection", "id": 2, "dep": [-1], "args": {"image": "/images/example_page.jpg" }}]
2023-04-11 10:47:26,537 - awesome_chat - DEBUG - [{'task': 'image-to-text', 'id': 0, 'dep': [-1], 'args': {'image': '/images/example_page.jpg'}}, {'task': 'text-to-image', 'id': 1, 'dep': [0], 'args': {'text': '<GENERATED>-0'}}, {'task': 'object-detection', 'id': 2, 'dep': [-1], 'args': {'image': '/images/example_page.jpg'}}]
2023-04-11 10:47:26,537 - awesome_chat - DEBUG - Run task: 0 - image-to-text
2023-04-11 10:47:26,538 - awesome_chat - DEBUG - Deps: []
2023-04-11 10:47:26,538 - awesome_chat - DEBUG - Run task: 2 - object-detection
2023-04-11 10:47:26,538 - awesome_chat - DEBUG - parsed task: {'task': 'image-to-text', 'id': 0, 'dep': [-1], 'args': {'image': 'public//images/example_page.jpg'}}
2023-04-11 10:47:26,539 - awesome_chat - DEBUG - Deps: []
2023-04-11 10:47:26,539 - awesome_chat - DEBUG - parsed task: {'task': 'object-detection', 'id': 2, 'dep': [-1], 'args': {'image': 'public//images/example_page.jpg'}}
2023-04-11 10:47:26,771 - awesome_chat - DEBUG - avaliable models on image-to-text: {'local': ['nlpconnect/vit-gpt2-image-captioning'], 'huggingface': ['nlpconnect/vit-gpt2-image-captioning', 'Salesforce/blip2-opt-2.7b']}
2023-04-11 10:47:26,773 - awesome_chat - DEBUG - [{'role': 'system', 'content': '#2 Model Selection Stage: Given the user request and the parsed tasks, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The assistant should focus more on the description of the model and find the model that has the most potential to solve requests and tasks. Also, prefer models with local inference endpoints for speed and stability.'}, {'role': 'user', 'content': 'For the image at location /images/example_page.jpg please draw a bounding box around each block of text in the image. '}, {'role': 'assistant', 'content': "{'task': 'image-to-text', 'id': 0, 'dep': [-1], 'args': {'image': 'public//images/example_page.jpg'}}"}, {'role': 'user', 'content': 'Please choose the most suitable model from [{\'id\': \'nlpconnect/vit-gpt2-image-captioning\', \'inference endpoint\': [\'nlpconnect/vit-gpt2-image-captioning\'], \'likes\': 219, \'description\': \'\\n\\n# nlpconnect/vit-gpt2-image-captioning\\n\\nThis is an image captioning model trained by @ydshieh in [\', \'tags\': [\'image-to-text\', \'image-captioning\']}, {\'id\': \'Salesforce/blip2-opt-2.7b\', \'inference endpoint\': [\'nlpconnect/vit-gpt2-image-captioning\', \'Salesforce/blip2-opt-2.7b\'], \'likes\': 25, \'description\': \'\\n\\n# BLIP-2, OPT-2.7b, pre-trained only\\n\\nBLIP-2 model, leveraging [OPT-2.7b](https://huggingface.co/f\', \'tags\': [\'vision\', \'image-to-text\', \'image-captioning\', \'visual-question-answering\']}] for the task {\'task\': \'image-to-text\', \'id\': 0, \'dep\': [-1], \'args\': {\'image\': \'public//images/example_page.jpg\'}}. The output must be in a strict JSON format: {"id": "id", "reason": "your detail reasons for the choice"}.'}]
2023-04-11 10:47:26,818 - awesome_chat - DEBUG - avaliable models on object-detection: {'local': ['facebook/detr-resnet-101', 'google/owlvit-base-patch32'], 'huggingface': ['facebook/detr-resnet-50', 'facebook/detr-resnet-101', 'hustvl/yolos-small']}
2023-04-11 10:47:26,819 - awesome_chat - DEBUG - [{'role': 'system', 'content': '#2 Model Selection Stage: Given the user request and the parsed tasks, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The assistant should focus more on the description of the model and find the model that has the most potential to solve requests and tasks. Also, prefer models with local inference endpoints for speed and stability.'}, {'role': 'user', 'content': 'For the image at location /images/example_page.jpg please draw a bounding box around each block of text in the image. '}, {'role': 'assistant', 'content': "{'task': 'object-detection', 'id': 2, 'dep': [-1], 'args': {'image': 'public//images/example_page.jpg'}}"}, {'role': 'user', 'content': 'Please choose the most suitable model from [{\'id\': \'facebook/detr-resnet-50\', \'inference endpoint\': [\'facebook/detr-resnet-50\', \'facebook/detr-resnet-101\', \'hustvl/yolos-small\'], \'likes\': 129, \'description\': \'\\n\\n# DETR (End-to-End Object Detection) model with ResNet-50 backbone\\n\\nDEtection TRansformer (DETR) m\', \'tags\': [\'object-detection\', \'vision\']}, {\'id\': \'facebook/detr-resnet-101\', \'inference endpoint\': [\'facebook/detr-resnet-101\', \'google/owlvit-base-patch32\'], \'likes\': 30, \'description\': \'\\n\\n# DETR (End-to-End Object Detection) model with ResNet-101 backbone\\n\\nDEtection TRansformer (DETR) \', \'tags\': [\'object-detection\', \'vision\']}, {\'id\': \'google/owlvit-base-patch32\', \'inference endpoint\': [\'facebook/detr-resnet-101\', \'google/owlvit-base-patch32\'], \'likes\': 30, \'description\': \'\\n\\n# Model Card: OWL-ViT\\n\\n## Model Details\\n\\nThe OWL-ViT (short for Vision Transformer for Open-World \', \'tags\': [\'vision\', \'object-detection\']}, {\'id\': \'hustvl/yolos-small\', \'inference endpoint\': [\'facebook/detr-resnet-50\', \'facebook/detr-resnet-101\', \'hustvl/yolos-small\'], \'likes\': 14, \'description\': \'\\n\\n# YOLOS (small-sized) model\\n\\nYOLOS model fine-tuned on COCO 2017 object detection (118k annotated \', \'tags\': [\'object-detection\', \'vision\']}] for the task {\'task\': \'object-detection\', \'id\': 2, \'dep\': [-1], \'args\': {\'image\': \'public//images/example_page.jpg\'}}. The output must be in a strict JSON format: {"id": "id", "reason": "your detail reasons for the choice"}.'}]
2023-04-11 10:47:28,742 - awesome_chat - DEBUG - {"id":"cmpl-74AVKd7YA7NJXTKJeM5XYrnJxWPjG","object":"text_completion","created":1681228046,"model":"text-davinci-003","choices":[{"text":"\n{\"id\": \"facebook/detr-resnet-50\", \"reason\": \"This model is best suited for the task of object detection as it has a ResNet-50 backbone and is specifically designed for this task. It also has the highest number of likes and is the most popular model for this task\"}","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":719,"completion_tokens":65,"total_tokens":784}}
2023-04-11 10:47:28,742 - awesome_chat - DEBUG - chosen model: {"id": "facebook/detr-resnet-50", "reason": "This model is best suited for the task of object detection as it has a ResNet-50 backbone and is specifically designed for this task. It also has the highest number of likes and is the most popular model for this task"}
2023-04-11 10:47:29,122 - awesome_chat - DEBUG - {"id":"cmpl-74AVKphM9UzND6d6pxb7KeKAGOsVv","object":"text_completion","created":1681228046,"model":"text-davinci-003","choices":[{"text":"\n{\"id\": \"nlpconnect/vit-gpt2-image-captioning\", \"reason\": \"This model is specifically designed for image-to-text tasks and has a local inference endpoint for speed and stability\"}","index":0,"logprobs":null,"finish_reason":null}],"usage":{"prompt_tokens":539,"completion_tokens":49,"total_tokens":588}}
2023-04-11 10:47:29,123 - awesome_chat - DEBUG - chosen model: {"id": "nlpconnect/vit-gpt2-image-captioning", "reason": "This model is specifically designed for image-to-text tasks and has a local inference endpoint for speed and stability"}
2023-04-11 10:47:29,137 - awesome_chat - WARNING - Inference error: {'message': 'Expecting value: line 1 column 1 (char 0)'}
2023-04-11 10:47:29,137 - awesome_chat - DEBUG - inference result: {'error': {'message': 'Expecting value: line 1 column 1 (char 0)'}}
2023-04-11 10:47:29,542 - awesome_chat - DEBUG - Run task: 1 - text-to-image
2023-04-11 10:47:29,543 - awesome_chat - DEBUG - Deps: [{"task": {"task": "image-to-text", "id": 0, "dep": [-1], "args": {"image": "public//images/example_page.jpg"}}, "inference result": {"error": {"message": "Expecting value: line 1 column 1 (char 0)"}}, "choose model result": {"id": "nlpconnect/vit-gpt2-image-captioning", "reason": "This model is specifically designed for image-to-text tasks and has a local inference endpoint for speed and stability"}}]
2023-04-11 10:47:29,543 - awesome_chat - DEBUG - Detect the image of dependency task (from args): public//images/example_page.jpg
2023-04-11 10:47:29,543 - awesome_chat - DEBUG - parsed task: {'task': 'text-to-image', 'id': 1, 'dep': [0], 'args': {'text': '<GENERATED>-0'}}
2023-04-11 10:47:29,809 - awesome_chat - DEBUG - avaliable models on text-to-image: {'local': ['runwayml/stable-diffusion-v1-5'], 'huggingface': ['runwayml/stable-diffusion-v1-5', 'hakurei/waifu-diffusion', 'prompthero/openjourney', 'stabilityai/stable-diffusion-2-1']}
2023-04-11 10:47:29,809 - awesome_chat - DEBUG - [{'role': 'system', 'content': '#2 Model Selection Stage: Given the user request and the parsed tasks, the AI assistant helps the user to select a suitable model from a list of models to process the user request. The assistant should focus more on the description of the model and find the model that has the most potential to solve requests and tasks. Also, prefer models with local inference endpoints for speed and stability.'}, {'role': 'user', 'content': 'For the image at location /images/example_page.jpg please draw a bounding box around each block of text in the image. '}, {'role': 'assistant', 'content': "{'task': 'text-to-image', 'id': 1, 'dep': [0], 'args': {'text': '<GENERATED>-0'}}"}, {'role': 'user', 'content': 'Please choose the most suitable model from [{\'id\': \'runwayml/stable-diffusion-v1-5\', \'inference endpoint\': [\'runwayml/stable-diffusion-v1-5\'], \'likes\': 6367, \'description\': \'\\n\\n# Stable Diffusion v1-5 Model Card\\n\\nStable Diffusion is a latent text-to-image diffusion model cap\', \'tags\': [\'stable-diffusion\', \'stable-diffusion-diffusers\', \'text-to-image\']}, {\'id\': \'prompthero/openjourney\', \'inference endpoint\': [\'runwayml/stable-diffusion-v1-5\', \'hakurei/waifu-diffusion\', \'prompthero/openjourney\', \'stabilityai/stable-diffusion-2-1\'], \'likes\': 2060, \'description\': \'\\n# Openjourney is an open source Stable Diffusion fine tuned model on Midjourney images, by [PromptH\', \'tags\': [\'stable-diffusion\', \'text-to-image\']}, {\'id\': \'hakurei/waifu-diffusion\', \'inference endpoint\': [\'runwayml/stable-diffusion-v1-5\', \'hakurei/waifu-diffusion\', \'prompthero/openjourney\', \'stabilityai/stable-diffusion-2-1\'], \'likes\': 1900, \'description\': \'\\n\\n# waifu-diffusion v1.4 - Diffusion for Weebs\\n\\nwaifu-diffusion is a latent text-to-image diffusion \', \'tags\': [\'stable-diffusion\', \'text-to-image\']}, {\'id\': \'stabilityai/stable-diffusion-2-1\', \'inference endpoint\': [\'runwayml/stable-diffusion-v1-5\', \'hakurei/waifu-diffusion\', \'prompthero/openjourney\', \'stabilityai/stable-diffusion-2-1\'], \'likes\': 1829, \'description\': \'\\n\\n# Stable Diffusion v2-1 Model Card\\nThis model card focuses on the model associated with the Stable\', \'tags\': [\'stable-diffusion\', \'text-to-image\']}] for the task {\'task\': \'text-to-image\', \'id\': 1, \'dep\': [0], \'args\': {\'text\': \'<GENERATED>-0\'}}. The output must be in a strict JSON format: {"id": "id", "reason": "your detail reasons for the choice"}.'}]
2023-04-11 10:47:30,493 - awesome_chat - DEBUG - inference result: {'generated image with predicted box': '/images/25de.jpg', 'predicted': []}
2023-04-11 10:47:32,081 - awesome_chat - DEBUG - {"id":"cmpl-74AVN0TqSS4BOvD1A5WzqnZ0LIVgN","object":"text_completion","created":1681228049,"model":"text-davinci-003","choices":[{"text":"\n{\"id\": \"runwayml/stable-diffusion-v1-5\", \"reason\": \"This model has the most potential to solve the text-to-image task as it has the highest number of likes and is the most popular model for this task. It also has a local inference endpoint which will provide speed and stability\"}","index":0,"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":779,"completion_tokens":70,"total_tokens":849}}
2023-04-11 10:47:32,081 - awesome_chat - DEBUG - chosen model: {"id": "runwayml/stable-diffusion-v1-5", "reason": "This model has the most potential to solve the text-to-image task as it has the highest number of likes and is the most popular model for this task. It also has a local inference endpoint which will provide speed and stability"}
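Two details in the log stand out to me: the image-to-text task already fails at 10:47:29,137 with the same "Expecting value: line 1 column 1 (char 0)" message, and the parsed tasks show the image path rewritten to public//images/example_page.jpg with a doubled slash. So it looks like the local inference endpoint is returning a body that isn't JSON at all. To see what it actually sends back, I was thinking of a direct probe like this (the URL, port, and payload here are my guesses, not values from the repo; they would need to match the models server settings in the config):

import requests

# Probe the local models server directly and print the raw response body.
# NOTE: url and payload below are assumptions -- adjust to your own config.
url = "http://localhost:8005/models/nlpconnect/vit-gpt2-image-captioning"
resp = requests.post(url, json={"img_url": "public//images/example_page.jpg"})

print(resp.status_code)
print(resp.text[:500])  # if this is empty or an HTML error page, response.json() will raise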
Here is the CLI output:
python run_gradio_demo.py --config config.gradio.yaml
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/home/matt/anaconda2/envs/jarvis/lib/python3.8/site-packages/requests/models.py", line 971, in json
    return complexjson.loads(self.text, **kwargs)
  File "/home/matt/anaconda2/envs/jarvis/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/matt/anaconda2/envs/jarvis/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/matt/anaconda2/envs/jarvis/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/matt/programming/JARVIS/server/awesome_chat.py", line 605, in model_inference
    inference_result = local_model_inference(model_id, data, task)
  File "/home/matt/programming/JARVIS/server/awesome_chat.py", line 575, in local_model_inference
    results = response.json()
  File "/home/matt/anaconda2/envs/jarvis/lib/python3.8/site-packages/requests/models.py", line 975, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
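The traceback points at results = response.json() in local_model_inference (awesome_chat.py, line 575). As a stopgap I'm considering wrapping that call so the raw body gets logged instead of crashing the pipeline. A rough sketch of what I mean (only the response.json() call site comes from the traceback; the function name and error-dict shape are my own, modeled on the {'error': {'message': ...}} dicts in the log above):

from requests.exceptions import JSONDecodeError

def safe_parse(response):
    # Guard around the json() call: keep the raw body when the server
    # replies with something that isn't JSON (empty body, HTML error page, ...).
    if not response.ok:
        return {"error": {"message": f"HTTP {response.status_code}: {response.text[:200]}"}}
    try:
        return response.json()
    except JSONDecodeError:
        return {"error": {"message": f"non-JSON response: {response.text[:200]!r}"}}

That would at least show whether the local models server is down, mis-addressed, or choking on the doubled-slash path. Any pointers on the actual root cause would be appreciated.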