
How to use an LLM to parse news?

Open GOOD-N-LCM opened this issue 11 months ago • 1 comments

Thank you for your project, it is very nice! However, when I tried to parse news using Ollama, the project did not meet my needs.

This is my code:

    import os
    import asyncio
    from crawl4ai import AsyncWebCrawler, CacheMode
    from crawl4ai.extraction_strategy import LLMExtractionStrategy
    from pydantic import BaseModel, Field

    class KnowledgeGraph(BaseModel):
        title: str
        content: str

    async def main(url):
        async with AsyncWebCrawler(
            verbose=True,
            user_agent_mode="random",
            user_agent_generator_config={
                "device_type": "mobile",
                "os_type": "android"
            },
        ) as crawler:
            result = await crawler.arun(
                url=url,
                cache_mode=CacheMode.BYPASS,
                remove_overlay_elements=True,
                word_count_threshold=1,
                extraction_strategy=LLMExtractionStrategy(
                    provider="ollama/qwen2.5:14b",
                    schema=KnowledgeGraph.schema(),
                    extraction_type="schema",
                    instruction="""Extract title and content from the given text.
                    """
                ),
                bypass_cache=True,
            )
            print(result.extracted_content)

    if __name__ == "__main__":
        url = 'https://www.bloomberg.com/news/articles/2025-01-07/nvidia-ceo-unveils-more-powerful-graphics-cards-at-ces-event?srnd=homepage-asia'
        asyncio.run(main(url))

GOOD-N-LCM avatar Jan 09 '25 05:01 GOOD-N-LCM

@GOOD-N-LCM, can you please describe the issue you are facing in more detail?

I may be able to assist then.

devatbosch avatar Jan 09 '25 11:01 devatbosch

@GOOD-N-LCM Have you checked the documentation? It is here: https://docs.crawl4ai.com/extraction/llm-strategies/ Please follow the example there, and if you still have any issue, please share it here with more details.

unclecode avatar Jan 13 '25 11:01 unclecode
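
When sharing more details, it may help to show what the extraction actually returned. The following is only a minimal sketch, assuming `result.extracted_content` is a JSON string holding either one object or a list of objects with the `title` and `content` fields from the schema in the question. The helper name `summarize_extraction` and the sample string are hypothetical and are not part of crawl4ai.

```python
import json


def summarize_extraction(extracted_content, required=("title", "content")):
    """Return (valid_items, problems) for a JSON string of extracted items."""
    if not extracted_content:
        return [], ["extracted_content is empty or None"]
    try:
        data = json.loads(extracted_content)
    except json.JSONDecodeError as exc:
        return [], [f"not valid JSON: {exc}"]

    # Accept a single object as well as a list of objects.
    items = data if isinstance(data, list) else [data]
    valid, problems = [], []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {index} is not an object")
            continue
        # A field counts as missing if it is absent or empty.
        missing = [key for key in required if not item.get(key)]
        if missing:
            problems.append(f"item {index} is missing: {', '.join(missing)}")
        else:
            valid.append(item)
    return valid, problems


# Hypothetical sample standing in for result.extracted_content.
sample = '[{"title": "Example headline", "content": "Body text."}, {"title": "No body"}]'
valid, problems = summarize_extraction(sample)
print(valid)
print(problems)
```

Pasting the `problems` list (or the raw string, if it is not valid JSON) into the issue makes it easier to tell whether the page was not fetched, the model returned malformed output, or only some fields were left empty.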