Elias Dabbas comments

Results: 22 comments of Elias Dabbas

about this result in serp

As far as I know, no. Here is the full list of available parameters: https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list

about this result in serp

That's a great idea. You can construct the URL you want, crawl it with the `crawl` function, and extract the information you need.

Crawl depth is inconsistent with links

Thanks @antoineeripret. Yes, unfortunately this can be misleading. The depth is computed based on when the crawler discovered the link. This is probably caused by the default...

Crawl depth is inconsistent with links

@antoineeripret Yes, you can, with some network theory:

```python
import advertools as adv
import pandas as pd
import networkx as nx

crawldf = pd.read_json("/Users/me/Desktop/temp/ipr.jsonl", lines=True)
linkdf = adv.crawlytics.links(crawldf)
linkdf.head()
```
...
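The comment above is cut off, so the rest of the author's code is not shown here. As a rough illustration of the general idea only (treating pages as nodes and links as edges, then taking the shortest path from the start page as the depth), here is a minimal sketch using a plain breadth-first search from the standard library rather than networkx. The `link_depths` helper and the edge list are hypothetical and are not part of advertools.

```python
from collections import deque


def link_depths(edges, start):
    """Return the minimum number of clicks from `start` to each reachable URL."""
    # Build an adjacency list: source page -> pages it links to.
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)

    depths = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for linked in graph.get(url, []):
            # The first time BFS reaches a page is via a shortest path.
            if linked not in depths:
                depths[linked] = depths[url] + 1
                queue.append(linked)
    return depths


# Hypothetical internal links as (source page, linked page) pairs.
edges = [
    ("https://example.com/", "https://example.com/a"),
    ("https://example.com/", "https://example.com/b"),
    ("https://example.com/a", "https://example.com/c"),
    ("https://example.com/b", "https://example.com/c"),
    ("https://example.com/c", "https://example.com/"),
]
depths = link_depths(edges, "https://example.com/")
```

A depth computed this way depends only on the link structure, not on the order in which a crawler happened to discover each link.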

Make a screenshot ?

Thanks a lot @caroheymes Great to know you found it useful, and please share any examples you have. It would be great for me to understand how it is being...

invalid escape codes in doc comments

Thanks a lot @stevenh. Yes, you're right. There are a number of rst strings that need escaping. I've created a fix, and should deploy a patch version soon. Will keep...

Feature request for crawl_images function

Thanks @caroheymesitf For now you can select using a regex for matching img src URL. This is not exactly XPath but can help in many cases. What do you want...
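To illustrate what selecting images by a regex on the `src` URL might look like, here is a small standalone sketch. It does not call advertools; the `filter_img_srcs` helper, the URLs, and the pattern are all made up for the example, so check the `crawl_images` documentation for how the regex is actually supplied.

```python
import re


def filter_img_srcs(srcs, pattern):
    """Keep only the image URLs that match the given regular expression."""
    regex = re.compile(pattern)
    return [src for src in srcs if regex.search(src)]


# Hypothetical img src values found on a page.
srcs = [
    "https://example.com/static/logo.svg",
    "https://example.com/products/shoe-1.jpg",
    "https://example.com/products/shoe-2.png",
    "https://example.com/ads/banner.jpg",
]

# Keep only JPG/PNG files under the /products/ path.
product_imgs = filter_img_srcs(srcs, r"/products/.+\.(jpg|png)$")
```

This works when the wanted images share a URL pattern (a folder, a file extension, a naming scheme). It cannot express "images inside this page element", which is what an XPath would give you.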

Feature request for crawl_images function

JS: This is going to depend on how each website does it, and will need a deeper look, because I don't think there is one way that can easily be...

Feature request for crawl_images function

@caroheymesitf Thanks for the code! I think this:

```python
if self.xpath is not None:
    img_src = response.xpath(self.xpath).getall()
else:
    img_src = response.xpath("//img/@src").getall()
```

should be modified to

```python
response.xpath(f'{USER_SUPPLIED_XPATH}//*//img/@src').getall()
```

Get...
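The idea of scoping the image search to a user-supplied container can be tried out without Scrapy. The sketch below is an assumption-laden stand-in: it uses the standard library's `xml.etree.ElementTree`, which supports only a limited XPath subset (no `/@src`), so the attribute is read in Python instead. The `img_srcs` helper and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_IMG_PATH = ".//img"  # every image on the page


def img_srcs(root, container_path=None):
    """Return img src values, optionally limited to a container element."""
    if container_path is None:
        path = DEFAULT_IMG_PATH
    else:
        # Prefix the image search with the user-supplied container path.
        path = f"{container_path}//img"
    return [img.get("src") for img in root.findall(path) if img.get("src")]


# Hypothetical, well-formed sample page.
page = ET.fromstring("""
<html><body>
  <header><img src="/logo.png"/></header>
  <div id="gallery">
    <img src="/one.jpg"/>
    <p><img src="/two.jpg"/></p>
  </div>
</body></html>
""")

all_srcs = img_srcs(page)
gallery_srcs = img_srcs(page, ".//div[@id='gallery']")
```

Note that this sketch appends `//img`, which also matches images that are direct children of the container. In full XPath, a `//*//img` suffix only matches images that have at least one element between them and the container.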