
Why are some of the options in the documentation not actually available?

Open · zahra-teb opened this issue 1 year ago · 7 comments

Hi, amazing crawl4ai team!

Many of the options mentioned in the documentation are not available, such as excluded_tags, exclude_external_links, ... Why?!

zahra-teb · Dec 09 '24 08:12

We need contributions, my smart developer.

lvzhengri · Dec 09 '24 09:12

😅😂 I’m getting about 30 new issues every day now! As you know, this repository became the number one trending repo across all programming languages on GitHub, and honestly, I’m having a lot of fun. That said, I definitely need to get more help. New people are joining and supporting, and I’d love for you to join us too.

Maybe you could help with keeping the documentation consistent with what gets reported. Maintaining documentation is tough: the library is so new and constantly evolving that it's a bit challenging for me to keep up. But you're right, it shouldn't be like this, and I apologize for that. Contributions are definitely needed. It's open source, and I'm committed to keeping it that way. Let's see how far we can take it!

unclecode · Dec 09 '24 13:12

@zahra-teb Btw, these features are available. Would you please share your code snippet with me so I can go through it and help you? Thx

unclecode · Dec 09 '24 13:12

Hi again! I'm sure you guys can take it pretty far!

And this is the code snippet:

```python
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url=url,
        exclude_external_links=True
    )
```

The results still contain the links! Also, as far as I can tell from reading your exciting codebase, there are no parameters named exclude_external_links or excluded_tags. I am using version 0.4.0.

And also, I’d be happy and honored to help! Could you please let me know what kind of contributions you’re looking for?

Thank You!

zahra-teb · Dec 10 '24 11:12

@zahra-teb Thanks for your kind words and interest in supporting Crawl4ai! Common contributions include finding and reporting bugs, submitting pull requests, and more. I suggest checking the existing pull requests to understand how others contribute.

We also have a to-do list of modules that need work: contributors can tackle these, assist with documentation, or take on larger-scale tasks.

So: testing, bug reporting, debugging, adding new features, documentation, etc.

Regarding exclude_external_links, here is the link to that part of the codebase. If you can't find it, let me know, and I'll help. Thanks again!

https://github.com/unclecode/crawl4ai/blob/ded554d3345ca00c038274fc38ff43b28b45cdd8/crawl4ai/content_scraping_strategy.py#L367
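
As discussed in this thread, the intent of exclude_external_links is to drop links that point away from the crawled site. The sketch below is a hypothetical, standard-library-only illustration of that idea: it resolves every <a href> against the page URL and compares hosts. The names LinkCollector and split_links are made up for this example. This is not crawl4ai's implementation (that lives at the line linked above), and the real code may treat subdomains, non-HTTP schemes, and other edge cases differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag, made absolute."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "/docs" against the page URL.
            self.links.append(urljoin(self.page_url, href))


def split_links(page_url: str, html: str) -> dict[str, list[str]]:
    """Hypothetical helper: partition links into internal/external by host.

    Assumption for this sketch: hosts must match exactly, so
    "www.example.com" and "example.com" count as different sites.
    """
    collector = LinkCollector(page_url)
    collector.feed(html)
    page_host = urlparse(page_url).netloc.lower()
    result: dict[str, list[str]] = {"internal": [], "external": []}
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        key = "internal" if parsed.netloc.lower() == page_host else "external"
        result[key].append(link)
    return result


html = (
    '<a href="/docs">Docs</a>'
    '<a href="https://github.com/unclecode/crawl4ai">Repo</a>'
    '<a href="mailto:someone@example.com">Mail</a>'
)
links = split_links("https://example.com/", html)
# With "external links excluded", only links["internal"] would be kept.
print(links)
```

Comparing exact hosts is the simplest possible rule; a real crawler would likely need a policy for subdomains and redirects, which is why checking the linked source is the reliable way to see what the library actually does.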

unclecode · Dec 10 '24 12:12

Thank you so much. Yeah, I got it. Thanks for taking the time to respond.

Good luck!

zahra-teb · Dec 10 '24 20:12

You are very welcome!

unclecode · Dec 13 '24 10:12