No workflow output if you use more than one webscraper
Self Checks
- [X] This is only for bug reports; if you would like to ask a question, please head to Discussions.
- [X] I have searched for existing issues, including closed ones.
- [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [X] Please do not modify this template :) and fill in all the required fields.
Dify version
0.6.2
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
Underneath the workflow results on the "Preview" page there is an HTML section <div class="markdown-body"> </div>.
If you make a workflow with more than one webscraper tool, the workflow progress spinner never turns into a green check and nothing is output into <div class="markdown-body"> </div>, even though the workflow completed.
✔️ Expected Behavior
To be able to use the webscraper tool more than once and still get output.
❌ Actual Behavior
Using the web scraper more than once causes the workflow to get stuck before the final output.
It appears the issue you're encountering with Dify v0.6.2, specifically when using multiple instances of the webscraper tool in a single workflow, might not be directly related to a known bug or limitation in the handling of multiple tool instances. The core functionality for executing tools within workflows, managed by the ToolNode class, is designed to handle multiple tool executions, including those of the webscraper tool, without inherently limiting the number or causing the workflow to hang.
To better diagnose and address the problem, it would be helpful to have more detailed information about your workflow configuration, the specific steps you're taking when setting up your workflow with multiple webscraper tools, and any error messages or logs that are generated during the execution. Additionally, providing a minimal reproducible example of the workflow that's causing the issue could greatly assist in identifying the root cause.
This approach will help in pinpointing whether the issue lies in the specific configuration or usage of the webscraper tool within your workflows, or if it's an unidentified issue within the Dify v0.6.2 codebase that needs further investigation.
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
It is directly related to the web scraper, I believe, because I switched out three web scraper tools for three custom web scraping tools and it worked perfectly fine. The nature of the original workflow was: you give it a research topic or question, etc. Then it asks the LLM to make three search queries to find relevant information. A code block is used to add each query to its own variable, then it runs a Google search on each one. Then it asks the LLM to go through all the links and pick the three most relevant. Then it uses a code block again to add each link to a variable. Then it would web scrape each link. Then it would ask the LLM to basically answer the original question. Then it sends that to a custom tool that creates a PDF and sends back a link to it. And it ends with the end block outputting the URL for the PDF. But nothing outputs unless I remove two web scraping blocks or switch out all three for a custom web scraper.
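To make the flow easier to follow, here's a rough Python sketch of the pipeline (every function is a placeholder standing in for a Dify node, not a real Dify API; the bodies just return dummy data so the data flow is visible):

```python
# Rough sketch of the workflow's data flow. Every function below is a
# placeholder for a Dify node (LLM, tool, or code block), not a real
# Dify API.

def generate_queries(topic: str) -> list[str]:
    # LLM node: turn the research topic into three search queries.
    return [f"{topic} query {i}" for i in range(1, 4)]

def google_search(query: str) -> list[str]:
    # Google search tool node: return result links for one query.
    return [f"https://example.com/{query.replace(' ', '-')}"]

def pick_top_links(links: list[str]) -> list[str]:
    # LLM node: choose the three most relevant links.
    return links[:3]

def scrape(url: str) -> str:
    # Webscraper tool node -- running three of these is what hung.
    return f"<page text from {url}>"

def answer(topic: str, pages: list[str]) -> str:
    # LLM node: write up the answer from the scraped pages.
    return f"Answer to '{topic}' based on {len(pages)} pages."

def create_pdf(text: str) -> str:
    # Custom tool node: returns a download URL for the generated PDF.
    return "https://example.com/downloads/report.pdf"

def run(topic: str) -> str:
    queries = generate_queries(topic)                       # 3 queries
    links = [l for q in queries for l in google_search(q)]  # search each
    top3 = pick_top_links(links)                            # pick 3 links
    pages = [scrape(u) for u in top3]                       # 3 scraper nodes
    return create_pdf(answer(topic, pages))                 # End block URL

print(run("history of the web"))
```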
Hello, can you share your DSL that reproduces this issue? It works normally in my case.
I'm using custom tools, maybe it's the combination
Isn't it the web scraper? Do you mean using the web scraper and then a custom tool?
The workflow was running a Google search, then using the code block to extract the URLs into variables, then web scraping three different ones. Then it takes that information and writes something up as requested. And then it runs my own custom tool that submits to an API that creates a PDF and returns a link to download the file. So what I ended up doing to get around the issue of only being able to use the web scraping tool once was making another custom tool using WebPilot's API for web scraping, and using that to web scrape three times.
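For context, the URL-extraction code block was roughly like this (Dify code nodes define a main function whose returned dict keys become the node's output variables; the input name and the regex here are my assumptions):

```python
import re

# Dify code node: the returned dict keys become the node's output
# variables. The input name "search_results" is an assumption about
# how the Google-search node's output was wired in.
def main(search_results: str) -> dict:
    urls = re.findall(r"https?://[^\s\"'<>)]+", search_results)
    return {
        "url_1": urls[0] if len(urls) > 0 else "",
        "url_2": urls[1] if len(urls) > 1 else "",
        "url_3": urls[2] if len(urls) > 2 else "",
    }
```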
So I circled the three custom WebPilot crawlers that I used. But they were originally the built-in web crawler. Since I couldn't use more than one of them, I had to switch to something else.
Is the "Create_PDF" a custom tool you have created? I was looking for some more formatting/output tools.
It is a Docker container that runs a FastAPI app with built-in PDF creation functions. It has two functions. First, it can create a PDF from input from the AI: the AI submits HTML and CSS, it's formatted into a PDF, and a download link is returned. Second, it can be given a web address and it will convert that page to a PDF. It stores the PDFs in a download folder, and a job regularly deletes them. It could easily be converted into a built-in tool: https://github.com/vontainment/v-gpt-pdf-generator
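As a rough idea of the service's shape, here's a stripped-down sketch (the route names, request fields, and the WeasyPrint renderer are assumptions for illustration, not necessarily what the linked repo uses):

```python
# Minimal FastAPI sketch of a two-function PDF service. Route names,
# request fields, and the WeasyPrint renderer are assumptions; see
# the linked repo for the real implementation.
import uuid
from pathlib import Path

from fastapi import FastAPI
from pydantic import BaseModel
from weasyprint import HTML  # pip install weasyprint

app = FastAPI()
DOWNLOAD_DIR = Path("downloads")
DOWNLOAD_DIR.mkdir(exist_ok=True)

class HtmlJob(BaseModel):
    html: str
    css: str = ""

class UrlJob(BaseModel):
    url: str

def _new_pdf_path() -> Path:
    return DOWNLOAD_DIR / f"{uuid.uuid4().hex}.pdf"

@app.post("/create-pdf")
def create_pdf(job: HtmlJob) -> dict:
    # Render AI-submitted HTML (+ optional CSS) to a PDF file.
    path = _new_pdf_path()
    HTML(string=f"<style>{job.css}</style>{job.html}").write_pdf(str(path))
    return {"download_url": f"/downloads/{path.name}"}

@app.post("/url-to-pdf")
def url_to_pdf(job: UrlJob) -> dict:
    # Fetch a web page and convert it to a PDF.
    path = _new_pdf_path()
    HTML(url=job.url).write_pdf(str(path))
    return {"download_url": f"/downloads/{path.name}"}
```

A periodic cleanup job for the downloads folder is omitted here for brevity.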
I also have a few other fun ones. One uses a vector database for storage and has functions to create collections, add memories, and retrieve memories. So basically the AI can periodically insert important things that come up so it can remember them. I have another one that allows the AI to completely control a server, and another that allows complete control of an email account: it can move emails, delete emails, reply, read, and so forth. I do need someone with more Python experience to maybe tweak them a bit, because I'm not a Python expert; I just learned it for the AI stuff. Don't get me wrong, they're fully functioning and so far bug-free, but I'm sure they could be optimized. Also, I have found one great use for the PDF one: I made an OpenAPI spec for the knowledge base API, so I can now have PDFs made and put into the knowledge base, or have a website converted to a PDF and put into the knowledge base.
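The memory one is shaped roughly like this (using ChromaDB as a stand-in for whatever vector database the real tool uses; all names here are illustrative):

```python
# Illustrative sketch of a create/add/retrieve memory tool, using
# ChromaDB as a stand-in vector store; the real tool may differ.
import uuid
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client in practice

def create_collection(name: str):
    return client.get_or_create_collection(name)

def add_memory(collection_name: str, text: str) -> str:
    # Store one memory; ChromaDB embeds the text automatically.
    col = create_collection(collection_name)
    memory_id = uuid.uuid4().hex
    col.add(documents=[text], ids=[memory_id])
    return memory_id

def retrieve_memories(collection_name: str, query: str, n: int = 3) -> list[str]:
    # Return the n memories most similar to the query.
    col = create_collection(collection_name)
    result = col.query(query_texts=[query], n_results=n)
    return result["documents"][0]
```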
Hit me up on Discord @ andremik and let's talk, maybe we could share some cool things
Hi, @vontainment,
I'm helping the Dify team manage their backlog and am marking this issue as stale. From what I understand, the issue you reported is related to using multiple instances of the webscraper tool in a single workflow, which resulted in the workflow progress spinner never turning into a green check and no output being generated. The issue seems to be related to the web scraper, and there was a discussion about using custom web scraping tools and Docker containers for PDF creation. The issue was worked around by switching to custom web scraping tools, and there's an offer to connect on Discord to share ideas.
Could you please confirm if this issue is still relevant to the latest version of the Dify repository? If it is, please let the Dify team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to Dify! If you have any further questions or concerns, feel free to reach out.