Enable deep searching of specific URLs

Open miurla opened this issue 9 months ago • 12 comments

When a specific URL is included in the query, retrieve the data from that URL instead of searching.
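As a rough illustration of the behaviour described above, here is a minimal sketch of deciding between retrieval and search. The names `extractUrls`, `planQuery`, and `QueryPlan` are hypothetical and not taken from the morphic codebase; the regular expression is a simple assumption and will not catch every URL form.

```typescript
// Hypothetical types and helpers for illustration only.
type QueryPlan =
  | { kind: "retrieve"; urls: string[] }
  | { kind: "search"; query: string };

// Assumption: only http(s) URLs written out in full are treated as targets.
const URL_PATTERN = /https?:\/\/[^\s<>"']+/g;

function extractUrls(query: string): string[] {
  const matches = query.match(URL_PATTERN) ?? [];
  const urls: string[] = [];
  for (const raw of matches) {
    // Trim trailing punctuation that usually belongs to the sentence, not the URL.
    const candidate = raw.replace(/[.,;:!?)\]]+$/, "");
    try {
      const parsed = new URL(candidate);
      // Skip duplicates so each page is retrieved once.
      if (!urls.includes(parsed.href)) urls.push(parsed.href);
    } catch {
      // Not a parseable URL; ignore it and keep going.
    }
  }
  return urls;
}

function planQuery(query: string): QueryPlan {
  const urls = extractUrls(query);
  return urls.length > 0
    ? { kind: "retrieve", urls }
    : { kind: "search", query };
}
```

A caller would branch on `plan.kind`: fetch the listed pages for `"retrieve"`, and fall back to the normal search tool for `"search"`.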

miurla avatar Apr 28 '24 00:04 miurla

Try this: https://github.com/mendableai/firecrawl

miurla avatar Apr 29 '24 14:04 miurla

Firecrawl was pretty slow when I tested it. Is this fine? I'm sure users would understand as long as we show a spinner.

albertdbio avatar Apr 30 '24 20:04 albertdbio

Ahh, never mind. The crawl API, which scrapes multiple pages, is slow; scraping just one page is decent.

albertdbio avatar Apr 30 '24 20:04 albertdbio

It's a good try! I got the following advice on our Discord. I'm going to try this.

You don't need firecrawl for this, IMHO, can just use Playwright/fetch and then use turndown to convert html to markdown

miurla avatar Apr 30 '24 20:04 miurla

Firecrawl uses Playwright + turndown 😆 We can also use this one: https://jina.ai/reader

I'll give it a shot. I already have the researcher able to refer to the previous sources it pulled; it's just that the types are a bit tricky. I just need to integrate those APIs.

I'll work on this one first and see if I can get back to the video section next week. It's been busy at work.
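For the Jina Reader option mentioned above, a minimal sketch of single-page retrieval might look like this. It assumes the Reader is reached by prefixing the target URL with `https://r.jina.ai/` and that the response body is Markdown text; both should be checked against Jina's documentation. The function names are hypothetical.

```typescript
// Assumption: Jina Reader is used by prefixing the target URL with this base.
const READER_PREFIX = "https://r.jina.ai/";

// Build the Reader URL for a target page, rejecting anything that is not http(s).
function toReaderUrl(target: string): string {
  const parsed = new URL(target); // throws on malformed input
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return READER_PREFIX + parsed.href;
}

// Fetch a single page through the Reader and return the body as text.
async function fetchAsMarkdown(target: string): Promise<string> {
  const response = await fetch(toReaderUrl(target));
  if (!response.ok) {
    throw new Error(`Reader request failed with status ${response.status}`);
  }
  return response.text();
}
```

Keeping the URL construction separate from the network call makes it easy to swap in Firecrawl or a plain fetch + turndown pipeline later.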

albertdbio avatar Apr 30 '24 21:04 albertdbio

Jina seems a tad faster to me, btw, if you want to try it out.

albertdbio avatar Apr 30 '24 21:04 albertdbio

Thank you. I was stuck with the chat history feature. I'll be able to start working on a different task soon.

miurla avatar Apr 30 '24 21:04 miurla

Hey @miurla, I fixed the types related to enabling follow-up conversations and also found a way to enable function calling with Groq, so now I'm moving forward with adding Firecrawl. Planning to push a PR by Sunday!

albertdbio avatar May 02 '24 16:05 albertdbio

@albertdbio Wow, that's great. Does the function calling use Groq's SDK to replace the researcher's model? Looking forward to the PR.

miurla avatar May 02 '24 21:05 miurla

@albertdbio I've merged significant changes, so it's best to develop from main.

miurla avatar May 03 '24 12:05 miurla

@miurla I'm not using the Groq SDK. It turns out the Vercel AI SDK needed to be upgraded to a newer minor version. That version sets content as optional in the OpenAI zod schema. Groq doesn't return content when doing function calls, so validation would throw an invalid response error. I'm going to test it with the inquire agent to verify that it's working. Sounds good, I'll use main!
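To illustrate the failure mode described above, here is a hand-written validator that mimics a schema check. It is not the Vercel AI SDK's actual zod schema; the `AssistantMessage` shape and the `contentRequired` switch are simplified assumptions for the sake of the example.

```typescript
// Simplified, hypothetical message shape; the real OpenAI-compatible schema has more fields.
interface AssistantMessage {
  role: "assistant";
  content?: string | null;
  tool_calls?: { name: string; arguments: string }[];
}

// contentRequired = true models the stricter schema (the content key must be present);
// contentRequired = false models the newer behaviour where content is optional.
function validateMessage(raw: unknown, contentRequired: boolean): AssistantMessage {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Invalid response: not an object");
  }
  const msg = raw as Record<string, unknown>;
  if (msg.role !== "assistant") {
    throw new Error("Invalid response: unexpected role");
  }
  if (contentRequired && !("content" in msg)) {
    throw new Error("Invalid response: content is required");
  }
  const content = msg.content;
  if (content !== undefined && content !== null && typeof content !== "string") {
    throw new Error("Invalid response: content must be a string or null");
  }
  return msg as unknown as AssistantMessage;
}

// A function-call-only reply that omits content entirely, as described for Groq.
const functionCallOnly = {
  role: "assistant",
  tool_calls: [{ name: "search", arguments: "{}" }],
};
```

With `contentRequired` set to `true`, `validateMessage(functionCallOnly, true)` throws; with `false`, the same reply passes and `content` is simply `undefined`.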

albertdbio avatar May 03 '24 15:05 albertdbio

Got caught up with work over the weekend. I'll continue chipping away at it this week. First I'll rebase onto main.

albertdbio avatar May 06 '24 17:05 albertdbio

I'm working on this feature.

miurla avatar May 18 '24 00:05 miurla