Workflow batch & api job requests fail when including a URL to a file

Open sigvardt opened this issue 1 year ago • 3 comments

Self Checks

[X] This is only for bug report, if you would like to ask a question, please head to Discussions.
[X] I have searched for existing issues, including closed ones.
[X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[X] [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
[X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.11.1

Cloud or Self Hosted

Cloud

Steps to reproduce

Repro steps

1. Set up a workflow using the configuration below (or any workflow that accepts images).
2. Confirm that the workflow works when run as a single job.
3. Run a batch workflow, or send requests to the workflow through the API. Both fail with the errors listed under Actual Behavior.
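
For reference, the API request in the last step can be sketched as below. This is a minimal illustration, not the exact request that was sent: the endpoint URL, the `response_mode` and `user` fields, and the `type` / `transfer_method` / `url` shape of the file input are assumptions based on the publicly documented workflow API, and the image URL and user ID are placeholders. Only the variable names `file` and `filetype` and the option value `Product Image` come from the DSL in this report.

```python
import json

# Assumed endpoint for Dify Cloud; a self-hosted deployment would use its own base URL.
WORKFLOW_RUN_URL = "https://api.dify.ai/v1/workflows/run"


def build_workflow_payload(image_url: str, filetype: str, user: str) -> dict:
    """Build the JSON body for a workflow run that passes an image by URL."""
    return {
        "inputs": {
            # Select variable defined on the Start node.
            "filetype": filetype,
            # File variable defined on the Start node, passed as a remote URL
            # (assumed input shape, see the note above).
            "file": {
                "type": "image",
                "transfer_method": "remote_url",
                "url": image_url,
            },
        },
        "response_mode": "blocking",
        "user": user,
    }


# Placeholder values for illustration only.
payload = build_workflow_payload(
    "https://example.com/product.png", "Product Image", "repro-user"
)
print(json.dumps(payload, indent=2))
```

The printed body would be sent as a POST to `WORKFLOW_RUN_URL` with an `Authorization: Bearer <API key>` header. The network call is left out so the sketch runs without credentials.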

Workflow configuration file (DSL)

app:
  description: Defines product image type and average image color as hex code value.
    Also write ALT text for the image.
  icon: 📸
  icon_background: '#FFEAD5'
  mode: workflow
  name: Image ALT text, type and hex code definer (Version 2 - Joakim)
  use_icon_as_answer_icon: false
kind: app
version: 0.1.3
workflow:
  conversation_variables: []
  environment_variables: []
  features:
    file_upload:
      allowed_file_extensions:
      - .JPG
      - .JPEG
      - .PNG
      - .GIF
      - .WEBP
      - .SVG
      allowed_file_types:
      - image
      allowed_file_upload_methods:
      - local_file
      - remote_url
      enabled: false
      fileUploadConfig:
        audio_file_size_limit: 50
        batch_count_limit: 5
        file_size_limit: 15
        image_file_size_limit: 10
        video_file_size_limit: 100
        workflow_file_upload_limit: 10
      image:
        enabled: false
        number_limits: 3
        transfer_methods:
        - local_file
        - remote_url
      number_limits: 1
    opening_statement: ''
    retriever_resource:
      enabled: false
    sensitive_word_avoidance:
      enabled: false
    speech_to_text:
      enabled: false
    suggested_questions: []
    suggested_questions_after_answer:
      enabled: false
    text_to_speech:
      enabled: false
      language: ''
      voice: ''
  graph:
    edges:
    - data:
        isInIteration: false
        sourceType: start
        targetType: if-else
      id: 1714920866750-source-1719391536361-target
      selected: false
      source: '1714920866750'
      sourceHandle: source
      target: '1719391536361'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: if-else
        targetType: llm
      id: 1719391536361-true-1728398212858-target
      source: '1719391536361'
      sourceHandle: 'true'
      target: '1728398212858'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: if-else
        targetType: llm
      id: 1719391536361-8a4f1fc0-397b-424e-a744-5c4090f82295-1728398041688-target
      source: '1719391536361'
      sourceHandle: 8a4f1fc0-397b-424e-a744-5c4090f82295
      target: '1728398041688'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: if-else
        targetType: end
      id: 1719391536361-false-1730886553319-target
      source: '1719391536361'
      sourceHandle: 'false'
      target: '1730886553319'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: llm
        targetType: end
      id: 1728398041688-source-1714920954869-target
      source: '1728398041688'
      sourceHandle: source
      target: '1714920954869'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: llm
        targetType: end
      id: 1728398212858-source-1719391856189-target
      source: '1728398212858'
      sourceHandle: source
      target: '1719391856189'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: llm
        targetType: end
      id: 1731321244320-source-1731321721145-target
      source: '1731321244320'
      sourceHandle: source
      target: '1731321721145'
      targetHandle: target
      type: custom
      zIndex: 0
    - data:
        isInIteration: false
        sourceType: if-else
        targetType: llm
      id: 1719391536361-2bd36163-37cb-407f-ad33-7c612bf2e791-1731321244320-target
      source: '1719391536361'
      sourceHandle: 2bd36163-37cb-407f-ad33-7c612bf2e791
      target: '1731321244320'
      targetHandle: target
      type: custom
      zIndex: 0
    nodes:
    - data:
        desc: ''
        selected: false
        title: Start
        type: start
        variables:
        - allowed_file_extensions: []
          allowed_file_types:
          - image
          allowed_file_upload_methods:
          - local_file
          - remote_url
          label: filetype
          max_length: 256
          options:
          - Product Image
          - Tech Logo
          - Manual
          - Energy Label
          - Data Sheet
          - Fiche
          - EU Tariff File
          - Battery Certificate
          - Safety Data Sheet
          required: true
          type: select
          variable: filetype
        - allowed_file_extensions: []
          allowed_file_types:
          - image
          allowed_file_upload_methods:
          - local_file
          - remote_url
          label: file
          max_length: 5
          options: []
          required: true
          type: file
          variable: file
        - label: en_nameWeb
          max_length: null
          options: []
          required: false
          type: text-input
          variable: en_nameWeb
        - label: no_nameWeb
          max_length: null
          options: []
          required: false
          type: text-input
          variable: no_nameWeb
        - label: se_nameWeb
          max_length: null
          options: []
          required: false
          type: text-input
          variable: se_nameWeb
        - label: fi_nameWeb
          max_length: null
          options: []
          required: false
          type: text-input
          variable: fi_nameWeb
        - label: dk_nameWeb
          max_length: null
          options: []
          required: false
          type: text-input
          variable: dk_nameWeb
      height: 246
      id: '1714920866750'
      position:
        x: -372.4019128777556
        y: 302.8844452803611
      positionAbsolute:
        x: -372.4019128777556
        y: 302.8844452803611
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: ''
        outputs:
        - value_selector:
          - '1728398041688'
          - text
          variable: output
        selected: false
        title: Logo analysis output
        type: end
      height: 90
      id: '1714920954869'
      position:
        x: 875.2644922920333
        y: 449.42070507111214
      positionAbsolute:
        x: 875.2644922920333
        y: 449.42070507111214
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        cases:
        - case_id: 'true'
          conditions:
          - comparison_operator: is
            id: '1719391629749'
            value: Product Image
            variable_selector:
            - '1714920866750'
            - filetype
          - comparison_operator: is
            id: 850c6023-dfb0-4532-9da5-a19ec754d480
            value: Feeling Image
            varType: string
            variable_selector:
            - '1714920866750'
            - filetype
          id: 'true'
          logical_operator: or
        - case_id: 8a4f1fc0-397b-424e-a744-5c4090f82295
          conditions:
          - comparison_operator: is
            id: df3688ee-ccf0-4726-9713-f39f9b537766
            value: Tech Logo
            varType: string
            variable_selector:
            - '1714920866750'
            - filetype
          id: 8a4f1fc0-397b-424e-a744-5c4090f82295
          logical_operator: and
        - case_id: 2bd36163-37cb-407f-ad33-7c612bf2e791
          conditions:
          - comparison_operator: is
            id: 43fd393f-e959-4d2f-91a9-fc112623e595
            value: Manual
            varType: string
            variable_selector:
            - '1714920866750'
            - filetype
          - comparison_operator: is
            id: 7bc631ee-305a-4a2a-942b-ecab970db436
            value: Energy Label
            varType: string
            variable_selector:
            - '1714920866750'
            - filetype
          - comparison_operator: is
            id: bfc78532-9f3d-4792-8580-3468d0b00f36
            value: Data Sheet
            varType: string
            variable_selector:
            - '1714920866750'
            - filetype
          - comparison_operator: is
            id: 3cc1475b-5540-4c62-891d-27d2567ff878
            value: Fiche
            varType: string
            variable_selector:
            - '1714920866750'
            - filetype
          - comparison_operator: is
            id: 9087f9cb-dc3b-4cda-974d-3f119626c8e8
            value: EU Tariff File
            varType: string
            variable_selector:
            - '1714920866750'
            - filetype
          - comparison_operator: is
            id: 55a98494-6556-4d73-a319-2793a97df162
            value: Battery Certificate
            varType: string
            variable_selector:
            - '1714920866750'
            - filetype
          - comparison_operator: is
            id: 5c379fe1-dd23-4dd2-afcf-3692b56d94e2
            value: Safety Data Sheet
            varType: string
            variable_selector:
            - '1714920866750'
            - filetype
          id: 2bd36163-37cb-407f-ad33-7c612bf2e791
          logical_operator: or
        conditions:
        - comparison_operator: is
          id: '1719391629749'
          value: productImage
          variable_selector:
          - '1714920866750'
          - filetype
        desc: ''
        logical_operator: and
        selected: false
        title: FileType condition
        type: if-else
      height: 404
      id: '1719391536361'
      position:
        x: -23.72524316829856
        y: 302.8844452803611
      positionAbsolute:
        x: -23.72524316829856
        y: 302.8844452803611
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: ''
        outputs:
        - value_selector:
          - '1728398212858'
          - text
          variable: output
        selected: false
        title: Image Analysis Output
        type: end
      height: 90
      id: '1719391856189'
      position:
        x: 875.2644922920333
        y: 302.8844452803611
      positionAbsolute:
        x: 875.2644922920333
        y: 302.8844452803611
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: false
          variable_selector: []
        desc: ''
        model:
          completion_params:
            frequency_penalty: 0.5
            json_schema: "{\n  \"name\": \"LogoAnalysisOutput\",\n  \"schema\": {\n\
              \    \"title\": \"LogoAnalysisOutput\",\n    \"type\": \"object\",\n\
              \    \"properties\": {\n      \"logo_type\": {\n        \"type\": \"\
              string\",\n        \"enum\": [\n          \"Specification\",\n     \
              \     \"Processor\",\n          \"Test winner\",\n          \"Test product\"\
              ,\n          \"Award\",\n          \"Warranty\",\n          \"Ecommerce\"\
              \n        ]\n      },\n      \"alt_texts\": {\n        \"type\": \"\
              object\",\n        \"properties\": {\n          \"English\": { \"type\"\
              : \"string\" },\n          \"Norwegian\": { \"type\": \"string\" },\n\
              \          \"Danish\": { \"type\": \"string\" },\n          \"Finnish\"\
              : { \"type\": \"string\" },\n          \"Swedish\": { \"type\": \"string\"\
              \ }\n        },\n        \"required\": [\"English\", \"Norwegian\",\
              \ \"Danish\", \"Finnish\", \"Swedish\"],\n        \"additionalProperties\"\
              : false\n      },\n      \"average_color_hex\": {\n        \"type\"\
              : \"string\",\n        \"pattern\": \"^#([A-Fa-f0-9]{6})$\"\n      }\n\
              \    },\n    \"required\": [\"logo_type\", \"alt_texts\", \"average_color_hex\"\
              ],\n    \"additionalProperties\": false\n  }\n}"
            presence_penalty: 0.5
            response_format: json_schema
            temperature: 0
            top_p: 0.75
          mode: chat
          name: gpt-4o-mini
          provider: openai
        prompt_template:
        - id: f412d8ea-6ae8-4ba9-aebb-8932efb4c59d
          role: system
          text: You are an AI system specializing in analyzing e-commerce logo images
            and generating SEO-optimized alt text descriptions in multiple languages.
            Your task is to examine a logo, determine its type, create multilingual
            alt text descriptions, and estimate the image's average color.
        - edition_type: basic
          id: 30d33c11-4583-417a-a28c-796af54626b1
          role: user
          text: 'Attached is the logo you need to analyze:


            Please follow these steps:


            1. Analyze the logo thoroughly, paying attention to visual elements, colors,
            and any text present.


            4. Determine the logo type based on the following categories:

            - Specification: Features technical aspects of the product (e.g., 5G,
            ink-saving feature)

            - Processor: Intel or AMD processor logo

            - Test winner: Indicates the product is best or best in test

            - Test product: Shows product ranking in testing or recommendation by
            a testing site

            - Award: Highlights a design or sustainability award

            - Warranty: Emphasizes the product''s warranty period

            - Ecommerce: Relates to money-back guarantee, cashback, or return policy


            5. Generate a JSON object with the determined logo type. For example:

            {"logo_type": "Specification"}


            6. Create SEO-optimized alt text descriptions for the logo in 5 languages:
            English, Norwegian, Danish, Finnish, and Swedish. Follow these guidelines:

            a) Accurately describe the product or feature represented by the logo.

            b) Include relevant keywords naturally, without forcing them.

            c) Keep the text concise, aiming for about 125 characters or less.

            d) Avoid keyword stuffing.

            e) Provide context when relevant, especially if the logo shows the product
            in use.

            f) Skip phrases like "Image of" or "Picture of".

            g) Use proper spelling and grammar.

            h) Focus on accessibility and providing an equivalent experience for all
            users.

            i) Be specific and detailed when necessary to differentiate between similar
            items.

            j) Include any text that is part of the image, translating it if necessary.

            k) Front-load critical information to present key details first.


            7. Estimate the average color of the provided image and output the color
            as a HEX code.


            Before providing your final output, wrap your analysis inside <detailed_logo_analysis>
            tags. Include the following:

            1. Detailed description of the logo''s visual elements, colors, and text.

            2. Step-by-step reasoning for determining the logo type:

            - List each potential logo type.

            - Provide arguments for and against each type.

            - Conclude with the most appropriate type and explain why.

            3. Explanation of your SEO optimization strategy for each language:

            - Brainstorm relevant keywords for each language.

            - Explain how you''ll incorporate these keywords naturally.

            4. Description of the method used for color estimation:

            - Break down the process into steps.

            - Explain how you''ll handle multiple colors if present.


            Your final output should be structured as follows:


            1. Logo type JSON object

            2. Alt text descriptions in the specified languages

            3. HEX code for the estimated average color


            Example output structure (note: this is a generic example to illustrate
            the format):


            {"logo_type": "[Type]"}


            English: [SEO-optimized alt text description]

            Norwegian: [SEO-optimized alt text description in Norwegian]

            Danish: [SEO-optimized alt text description in Danish]

            Finnish: [SEO-optimized alt text description in Finnish]

            Swedish: [SEO-optimized alt text description in Swedish]


            #[HEX color code]


            Please proceed with your analysis and provide the requested output.'
        selected: false
        title: Logo analysis
        type: llm
        variables: []
        vision:
          configs:
            detail: high
            variable_selector:
            - '1714920866750'
            - file
          enabled: true
      height: 98
      id: '1728398041688'
      position:
        x: 453.63749851366333
        y: 449.42070507111214
      positionAbsolute:
        x: 453.63749851366333
        y: 449.42070507111214
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: false
          variable_selector: []
        desc: ''
        model:
          completion_params:
            frequency_penalty: 0.5
            json_schema: "{\n  \"name\": \"ImageAnalysisOutput\",\n  \"schema\": {\n\
              \    \"title\": \"ImageAnalysisOutput\",\n    \"type\": \"object\",\n\
              \    \"properties\": {\n      \"image_type\": {\n        \"type\": \"\
              string\",\n        \"enum\": [\n          \"Packshot Image\",\n    \
              \      \"Lifestyle Image\",\n          \"Detail Image\",\n         \
              \ \"Infographic Image\",\n          \"Comparison Image\",\n        \
              \  \"Set Image\"\n        ]\n      },\n      \"alt_text\": {\n     \
              \   \"type\": \"object\",\n        \"properties\": {\n          \"English\"\
              : { \"type\": \"string\" },\n          \"norwegian\": { \"type\": \"\
              string\" },\n          \"danish\": { \"type\": \"string\" },\n     \
              \     \"finnish\": { \"type\": \"string\" },\n          \"swedish\"\
              : { \"type\": \"string\" }\n        },\n        \"required\": [\"English\"\
              , \"norwegian\", \"danish\", \"finnish\", \"swedish\"],\n        \"\
              additionalProperties\": false\n      },\n      \"average_color\": {\n\
              \        \"type\": \"string\",\n        \"pattern\": \"^#([A-Fa-f0-9]{6})$\"\
              \n      }\n    },\n    \"required\": [\"image_type\", \"alt_text\",\
              \ \"average_color\"],\n    \"additionalProperties\": false\n  }\n}"
            presence_penalty: 0.5
            response_format: json_schema
            temperature: 0
            top_p: 0.75
          mode: chat
          name: gpt-4o-mini
          provider: openai
        prompt_template:
        - id: 6266ce57-ebea-4b84-b832-ab4a56e547a3
          role: system
          text: You are an AI system specializing in analyzing e-commerce product
            images and generating SEO-optimized alt text descriptions in multiple
            languages. Your task is to examine a product image, determine its type,
            create multilingual alt text descriptions, and estimate the image's average
            color.
        - id: 56c5640e-560f-4fbd-a6a1-96ad5a99c5e3
          role: user
          text: 'Attached is the image you''ll be analyzing:


            Now, here are the product names in different languages that you''ll be
            working with:


            <product_name_english>

            {{#1714920866750.en_nameWeb#}}

            </product_name_english>


            <product_name_norwegian>

            {{#1714920866750.no_nameWeb#}}

            </product_name_norwegian>


            <product_name_danish>

            {{#1714920866750.dk_nameWeb#}}

            </product_name_danish>


            <product_name_finnish>

            {{#1714920866750.fi_nameWeb#}}

            </product_name_finnish>


            <product_name_swedish>

            {{#1714920866750.se_nameWeb#}}

            </product_name_swedish>


            Please follow these steps to complete your analysis:


            1. Image Analysis:

            Examine the provided image URL, focusing only on visible elements, colors,
            and any text present in the image.


            2. Image Type Determination:

            Categorize the image into one of the following types:

            - Packshot Image: Simple, clean images of the product against a plain
            background (usually white).

            - Lifestyle Image: Shows the product being used in a real-life context
            or environment.

            - Detail Image: Highlights specific features, textures, or intricate details
            of the product.

            - Infographic Image: Combines images with text or graphics to highlight
            key features, benefits, or instructions.

            - Comparison Image: Shows the product alongside similar items or alternatives.

            - Set Image: Displays the product as part of a set or collection.


            3. SEO-optimized Alt Text Creation:

            Create alt text descriptions for the image in 5 languages: English, Norwegian,
            Danish, Finnish, and Swedish. Use the provided product names in each respective
            language. Follow these guidelines:

            - Include the product name naturally within the alt text.

            - Describe only specific visible features (color, texture, material, etc.)
            actually present in the image.

            - Use relevant keywords that potential customers might use to find this
            product.

            - Focus solely on clearly visible elements in the image, not general product
            features.

            - Include and translate any text that is part of the image, reproducing
            it verbatim.

            - Identify and describe the most impactful visible element first.

            - Convey the overall tone and mood of the image without using superlatives.

            - Use vivid, descriptive language, but avoid exaggeration or speculation.

            - Present information in a logical order.

            - Use appropriate language and punctuation to convey factual content.

            - Omit irrelevant details and any information not directly visible in
            the image.

            - Aim for a balance between comprehensive description and conciseness.

            - Describe the content directly without using phrases like "the picture
            shows" or "the image contains."


            4. Color Estimation:

            Estimate the average color of the provided image and output the color
            as a HEX code.


            Before providing your final output, wrap your analysis inside <image_analysis>
            tags. Include:

            - A numbered list of all visible elements in the image, focusing on specific
            details.

            - Your reasoning for choosing the final image type classification.

            - A list of potential SEO keywords for each language, considering only
            the visible features of the product.

            - Your strategy for adapting keywords and alt text for cultural differences
            in each language.

            - The method used for color estimation, including how you determined the
            dominant colors.

            - A detailed description of exactly what is visible in the image, focusing
            on specific elements rather than general product features.

            - A final review to ensure no speculative or non-visible details have
            been included in the alt text.


            Your final output should be a JSON object structured as follows:


            ```json

            {

            "image_type": "string",

            "alt_text": {

            "english": "string",

            "norwegian": "string",

            "danish": "string",

            "finnish": "string",

            "swedish": "string"

            },

            "average_color": "string"

            }

            ```


            Ensure that all alt text descriptions are properly escaped within the
            JSON structure and focus solely on visible elements in the image.


            Please proceed with your analysis and provide the requested output.'
        selected: false
        title: Image analysis output
        type: llm
        variables: []
        vision:
          configs:
            detail: high
            variable_selector:
            - '1714920866750'
            - file
          enabled: true
      height: 98
      id: '1728398212858'
      position:
        x: 453.63749851366333
        y: 302.8844452803611
      positionAbsolute:
        x: 453.63749851366333
        y: 302.8844452803611
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: ''
        outputs:
        - value_selector: []
          variable: ''
        selected: false
        title: error
        type: end
      height: 54
      id: '1730886553319'
      position:
        x: 453.63749851366333
        y: 784.4741880970141
      positionAbsolute:
        x: 453.63749851366333
        y: 784.4741880970141
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        context:
          enabled: false
          variable_selector: []
        desc: ''
        model:
          completion_params:
            frequency_penalty: 0.5
            json_schema: "{\n  \"name\": \"DocumentAnalysisOutput\",\n  \"schema\"\
              : {\n    \"title\": \"DocumentAnalysisOutput\",\n    \"type\": \"object\"\
              ,\n    \"properties\": {\n      \"DocumentContentsMarkdown\": {\n  \
              \      \"type\": \"string\"\n      },\n      \"DocumentLanguages\":\
              \ {\n        \"type\": \"array\",\n        \"items\": {\n          \"\
              type\": \"string\"\n        }\n      }\n    },\n    \"required\": [\"\
              DocumentContentsMarkdown\", \"DocumentLanguages\"],\n    \"additionalProperties\"\
              : false\n  }\n}"
            presence_penalty: 0.5
            response_format: json_schema
            temperature: 0
            top_p: 0.75
          mode: chat
          name: gpt-4o-mini
          provider: openai
        prompt_template:
        - id: 9512f95c-6542-4868-8afe-7bd09e1c2171
          role: system
          text: You are an advanced AI assistant specialized in document analysis
            and transcription. Your task is to accurately replicate the contents of
            a document page from an image file, while also providing context for visual
            elements.
        - id: edea1749-a0db-4ad0-927a-63b932dd6997
          role: user
          text: "Attached is the image file you need to analyze.\n\nYour goal is to\
            \ perform detailed Optical Character Recognition (OCR), correct any errors,\
            \ describe images inline, and present the information in a well-structured\
            \ markdown format.\n\nBefore providing your final output, please analyze\
            \ the document thoroughly. Wrap your analysis inside <document_examination>\
            \ tags and address the following points:\n\t1.\tList all elements identified\
            \ in the image (text blocks, images, tables, etc.), numbering each element\
            \ as you go.\n\t2.\tProvide a step-by-step breakdown of the document’s\
            \ visual layout, including the positioning of key elements.\n\t3.\tDescribe\
            \ your OCR process and any challenges encountered.\n\t4.\tOutline your\
            \ plan for recreating the document in markdown.\n\t5.\tExplain your strategy\
            \ for creating comprehensive yet concise image descriptions.\n\t6.\tIdentify\
            \ potential OCR challenges specific to this document and how you plan\
            \ to address them. For each challenge, write down a potential solution.\n\
            \t7.\tDiscuss how you will ensure that the language and terminology used\
            \ in your output closely matches that of the original document. Provide\
            \ examples if possible.\n\t8.\tPlan out the overall markdown structure\
            \ for the document, including headers, subheaders, and other formatting\
            \ elements.\n\t9.\tList any specific areas where you anticipate difficulties\
            \ in transcription or formatting, and how you plan to overcome them.\n\
            \t10.\tIdentify the primary language(s) of the document. If multiple languages\
            \ are present, estimate the percentage of content in each language.\n\t\
            11.\tCreate a concise checklist of the steps you’ll follow to ensure accuracy\
            \ and completeness in your transcription.\n\nAfter completing your analysis,\
            \ follow these steps to create your output:\n\t1.\tPerform OCR on all\
            \ text elements, including those within images or graphics.\n\t2.\tCorrect\
            \ any OCR errors you detect.\n\t3.\tFor each image, graphic, or figure\
            \ in the document:\n\t•\tProvide a detailed description of its contents\
            \ inline with the text.\n\t•\tUse markdown image syntax with elaborated\
            \ alt text.\n\t•\tEnsure the description is concise, but comprehensive\
            \ enough for accessibility.\n\t•\tTranscribe any text within graphics\
            \ accurately.\n\t4.\tRecreate tables or other graphics using simple markdown\
            \ syntax, ensuring that table elements remain readable and well-structured.\n\
            \t5.\tUse markdown formatting to maintain the document’s structure and\
            \ hierarchy, with a focus on simplicity to avoid complex characters or\
            \ unsupported markdown syntax.\n\t6.\tAvoid overly large text blocks;\
            \ summarize repetitive sections if necessary.\n\t7.\tOrganize all information\
            \ into a well-structured markdown document that closely resembles the\
            \ original layout.\n\t8.\tDo not include footers or page numbers in your\
            \ output.\n\t9.\tDo not use emojis in your output.\n\t10.\tEnsure that\
            \ language and terminology closely match the original document for consistency.\n\
            \nYour final output must be in the following JSON format:\n\n{\n  \"DocumentAnalysisOutput\"\
            : {\n    \"DocumentContentsMarkdown\": \"The full markdown representation\
            \ of the document page, including all text, inline image descriptions,\
            \ and recreated graphics or tables.\",\n    \"DocumentLanguages\": [\n\
            \      \"Detected languages in the document, listed in order of prominence.\"\
            \n    ]\n  }\n}\n\nBegin your document examination now, followed by the\
            \ document transcription as instructed."
        selected: true
        title: Document analysis
        type: llm
        variables: []
        vision:
          configs:
            detail: high
            variable_selector:
            - '1714920866750'
            - file
          enabled: true
      height: 98
      id: '1731321244320'
      position:
        x: 453.63749851366333
        y: 591.5672725566749
      positionAbsolute:
        x: 453.63749851366333
        y: 591.5672725566749
      selected: false
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    - data:
        desc: ''
        outputs:
        - value_selector:
          - '1731321244320'
          - text
          variable: output
        selected: false
        title: Document Analysis output
        type: end
      height: 90
      id: '1731321721145'
      position:
        x: 875.2644922920333
        y: 591.5672725566749
      positionAbsolute:
        x: 875.2644922920333
        y: 591.5672725566749
      selected: true
      sourcePosition: right
      targetPosition: left
      type: custom
      width: 244
    viewport:
      x: 749.4435638438554
      y: 177.77511584173362
      zoom: 0.7764764961366506

✔️ Expected Behavior

The workflow behaves the same way as a single run through the web interface and processes the image file.

❌ Actual Behavior

API throws error: File type not allowed

Batch workflow web interface throws error: "file in input form must be a file"

Nov 11 '24 14:11 sigvardt

I found some relevant information that might help with your issue:

1. File Type Not Allowed Error: This error can occur if the file type is not among the allowed types specified in your workflow configuration. Ensure that the file extensions and types you are trying to upload are included in the allowed_file_extensions and allowed_file_types settings in your workflow configuration. The Dify API validates file types against a list of supported formats, including images like JPG, JPEG, PNG, GIF, WEBP, and SVG ^[1].
2. File in Input Form Must Be a File Error: This error is triggered when the input for a file variable is not a dictionary or an instance of the File class. Ensure that the input provided for the file variable in your batch workflow or API request is correctly formatted as a dictionary or File object ^[2].
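
As a rough illustration of the "dictionary" shape mentioned above, here is a minimal Python sketch that builds a workflow-run request body with a remote image URL as the file input. The field names `type`, `transfer_method`, and `url`, as well as `response_mode` and `user`, are assumptions about Dify's file-object format and are not confirmed anywhere in this thread; the helper `build_workflow_run_payload` and the user ID are hypothetical. The input variable name `file` is taken from the `variable_selector` in the DSL above.

```python
import json


def build_workflow_run_payload(image_url, variable_name="file", user="example-user"):
    """Build a JSON body for a workflow run whose file input is a remote URL.

    Hypothetical helper: the key names below are assumptions, not a confirmed API.
    """
    file_input = {
        "type": "image",                  # assumed to correspond to allowed_file_types
        "transfer_method": "remote_url",  # assumed to correspond to allowed_file_upload_methods
        "url": image_url,
    }
    return {
        "inputs": {variable_name: file_input},  # a dict, not a bare URL string
        "response_mode": "blocking",            # assumed option name
        "user": user,                           # hypothetical end-user identifier
    }


payload = build_workflow_run_payload(
    "https://media.power-cdn.net/images/h-aiisawesome/products/3306919/3306919_1_1200x1200_w_g.jpg"
)
print(json.dumps(payload, indent=2))
```

The point of the sketch is only that the file variable is sent as a structured object rather than a plain string; whether this exact shape is accepted in 0.11.1 is what the report is questioning.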

Additionally, there have been recent changes in the Dify codebase related to file upload handling. The image upload feature has been integrated into a more comprehensive file upload functionality, which might affect how file uploads are processed in version 0.11.1 ^[3].

If these suggestions do not resolve the issue, you might want to check for any recent updates or changes in the Dify codebase that could affect file upload handling.


Nov 11 '24 14:11 dosubot[bot]

I've checked both 1 & 2.

For reference, I've tested with the test files below:

filetype,file,en_nameWeb,no_nameWeb,se_nameWeb,fi_nameWeb,dk_nameWeb
Product Image,https://media.power-cdn.net/images/h-aiisawesome/products/3306919/3306919_1_1200x1200_w_g.jpg,"IPAD AIR 11"" CL 2024 128GB BLUE","IPAD AIR 11"" CL 2024 128GB BLUE","IPAD AIR 11"" CL 2024 128GB BLUE","IPAD AIR 11"" CL 2024 128GB BLUE","IPAD AIR 11"" CL 2024 128GB BLUE"

And the change suggested by Dosu in 2:

filetype,file,en_nameWeb,no_nameWeb,se_nameWeb,fi_nameWeb,dk_nameWeb
Product Image,"{""file_path"":""https://media.power-cdn.net/images/h-aiisawesome/products/3306919/3306919_1_1200x1200_w_g.jpg"",""file_type"":""image/jpeg"",""file_name"":""3306919_1_1200x1200_w_g.jpg"",""file_extension"":""jpg"",""upload_method"":""url""}","IPAD AIR 11"" CL 2024 128GB BLUE","IPAD AIR 11"" CL 2024 128GB BLUE","IPAD AIR 11"" CL 2024 128GB BLUE","IPAD AIR 11"" CL 2024 128GB BLUE","IPAD AIR 11"" CL 2024 128GB BLUE"
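
One detail that may be relevant here: a standard CSV parser delivers the JSON cell as a plain string, so it only becomes a dictionary if something decodes it afterwards. The sketch below shows this with Python's `csv` and `json` modules on a shortened version of the row above. It does not claim to reflect how Dify's batch runner reads the file; that part is unknown.

```python
import csv
import io
import json

# Shortened version of the second test file (header plus one data row).
csv_text = (
    "filetype,file,en_nameWeb\n"
    'Product Image,"{""file_path"":""https://media.power-cdn.net/images/h-aiisawesome/'
    'products/3306919/3306919_1_1200x1200_w_g.jpg"",""file_type"":""image/jpeg"",'
    '""file_name"":""3306919_1_1200x1200_w_g.jpg"",""file_extension"":""jpg"",'
    '""upload_method"":""url""}","IPAD AIR 11"" CL 2024 128GB BLUE"\n'
)

row = next(csv.DictReader(io.StringIO(csv_text)))

cell = row["file"]          # what a CSV reader hands back: a str
decoded = json.loads(cell)  # an explicit JSON decode is needed to get a dict

print(type(cell).__name__, type(decoded).__name__)  # prints: str dict
```

If the batch interface passes the raw cell through without decoding it, the value would still be a string at validation time, which would be consistent with the "file in input form must be a file" message. This is a guess, not something the thread confirms.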

Nov 11 '24 14:11 sigvardt

The issue is still present in the managed (cloud) version, 0.12.1.

Nov 26 '24 08:11 sigvardt

Hi, @sigvardt. I'm Dosu, and I'm helping the Dify team manage their backlog and am marking this issue as stale.

Issue Summary:

Reported bug in Dify version 0.11.1 with workflow batch and API job requests failing when a URL to a file is included.
Single jobs work correctly, indicating a specific issue with batch/API requests.
I suggested checking file type allowances and formatting, but you confirmed these were not the issue.
The problem persists in version 0.12.1, suggesting a deeper issue with file upload handling.

Next Steps:

Please confirm if this issue is still relevant to the latest version of the Dify repository by commenting here.
If there is no further activity, this issue will be automatically closed in 15 days.

Thank you for your understanding and contribution!

Dec 27 '24 16:12 dosubot[bot]