Why are some of the options in the documentation not actually available?
Hi, amazing crawl4ai team!
Many of the options mentioned in the documentation don't seem to be available, such as `excluded_tags` and `exclude_external_links`. Why?!
Contributions needed, my smart developers!
😅😂 I’m getting about 30 new issues every day now! As you know, this repository became the number one trending repo across all programming languages on GitHub, and honestly, I’m having a lot of fun. That said, I definitely need to get more help. New people are joining and supporting, and I’d love for you to join us too.
Maybe you could help with consistency between the documentation and the reported issues. Maintaining documentation is tough since the library is so new and constantly evolving; it's a bit challenging for me to keep up. But you're right, it shouldn't be like this, and I apologize for that. Contributions are definitely needed. It's open source, and I'm committed to keeping it that way. Let's see how far we can take it!
@zahra-teb Btw, these features are available. Would you please share your code snippet with me so I can go through it and help you? Thx
Hi again! I'm sure you guys can take it pretty far!
And this is the code snippet:
```python
async with AsyncWebCrawler() as crawler:
    result = await crawler.arun(
        url=url,
        exclude_external_links=True,
    )
```
The results still contain the links! Also, as far as I can tell from reading your exciting codebase, there were no parameters named `exclude_external_links` or `excluded_tags`. I am using version 0.4.0.
And also, I’d be happy and honored to help! Could you please let me know what kind of contributions you’re looking for?
Thank You!
@zahra-teb Thanks for your kind words and interest in supporting Crawl4ai! Common contributions include finding and reporting bugs, submitting pull requests, and more. I suggest you check the existing pull requests to understand how others contribute.
We also have a to-do list with modules that need work, contributors can tackle these, assist with documentation, or take on larger-scale tasks.
So: testing, bug reporting, debugging, adding new features, documentation, and so on.
Regarding `exclude_external_links`, I've shared the link to that part of the codebase below. If you can't find it, let me know and I'll help. Thanks again!
https://github.com/unclecode/crawl4ai/blob/ded554d3345ca00c038274fc38ff43b28b45cdd8/crawl4ai/content_scraping_strategy.py#L367
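For anyone following along, the idea behind `exclude_external_links` is to drop any link whose host differs from the crawled page's host. The sketch below is a simplified, standalone illustration of that concept using only the standard library; it is not crawl4ai's actual implementation (see the linked source above for that), and the function name `filter_external_links` is made up for this example.

```python
from urllib.parse import urljoin, urlparse

def filter_external_links(base_url, hrefs):
    """Keep only links that point to the same host as base_url.

    Conceptual sketch of what an exclude-external-links option does;
    not crawl4ai's real code.
    """
    base_host = urlparse(base_url).netloc
    internal = []
    for href in hrefs:
        # Resolve relative links against the page URL first.
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == base_host:
            internal.append(absolute)
    return internal

links = ["/docs", "https://example.com/about", "https://other.org/page"]
print(filter_external_links("https://example.com", links))
# → ['https://example.com/docs', 'https://example.com/about']
```

If a flag like this appears to have no effect, it's worth checking whether relative links are being resolved before the host comparison, since unresolved relative hrefs have an empty host.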
Thank you so much. Yeah, I got it. Thanks for taking the time to respond.
Good luck!
You are very welcome