Elias Dabbas

22 comments by Elias Dabbas

As far as I know, no. Here is the full list of available parameters: https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list

That's a great idea. You can construct the URL you want, crawl it with the `crawl` function, and extract the information you need.
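A minimal sketch of that workflow. The base URL, the query parameters, and the helper names `build_url` and `crawl_and_extract` are hypothetical; only `adv.crawl` (with its `follow_links` and `xpath_selectors` parameters) comes from advertools.

```python
from urllib.parse import urlencode


def build_url(base_url, **params):
    """Append URL-encoded query parameters to base_url (hypothetical helper)."""
    return f"{base_url}?{urlencode(params)}"


def crawl_and_extract(url, output_file="output.jsonl", xpath_selectors=None):
    """Crawl a single URL with advertools and load the results (hypothetical helper).

    xpath_selectors maps new column names to XPath expressions,
    e.g. {"titles": "//h3/text()"} (example selectors only).
    """
    # Imported here so build_url can be used without advertools installed
    import advertools as adv
    import pandas as pd

    # follow_links=False: fetch only this URL instead of crawling outward
    adv.crawl(url, output_file, follow_links=False, xpath_selectors=xpath_selectors)
    # The crawl output is a jsonlines file: one JSON object per line
    return pd.read_json(output_file, lines=True)


url = build_url("https://example.com/search", q="running shoes", page=2)
print(url)  # https://example.com/search?q=running+shoes&page=2
```

Building the URL with `urlencode` keeps spaces and special characters correctly escaped, so the same helper works for any combination of parameters you want to crawl.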

Assuming this was resolved.

Thanks @antoineeripret. Yes, unfortunately this might be misleading. The value is computed based on when the crawler discovered the link. This is probably caused by the default...
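If the discrepancy comes from the order in which pages are discovered (an assumption here, since the comment above is cut off before naming the default), Scrapy's FAQ describes switching from its default depth-first order to breadth-first order with the three settings below, which `adv.crawl` accepts through `custom_settings`. `BFO_SETTINGS` and `crawl_breadth_first` are hypothetical names used for illustration.

```python
# Settings from Scrapy's FAQ for crawling in breadth-first order
# (assumption: the default crawl order is what skews the reported values)
BFO_SETTINGS = {
    "DEPTH_PRIORITY": 1,
    "SCHEDULER_DISK_QUEUE": "scrapy.squeues.PickleFifoDiskQueue",
    "SCHEDULER_MEMORY_QUEUE": "scrapy.squeues.FifoMemoryQueue",
}


def crawl_breadth_first(start_url, output_file="site_crawl.jsonl"):
    """Crawl a site with advertools, asking Scrapy to process shallower pages first."""
    # Imported here so the settings can be inspected without advertools installed
    import advertools as adv

    adv.crawl(
        start_url,
        output_file,
        follow_links=True,
        custom_settings=BFO_SETTINGS,
    )
```

Because requests run concurrently, this only makes the crawl order approximately breadth-first; computing shortest paths from the link graph after the crawl gives exact values.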

@antoineeripret Yes, you can, with some network theory:

```python
import advertools as adv
import pandas as pd
import networkx as nx

# Load the crawl output file (one JSON object per line)
crawldf = pd.read_json("/Users/me/Desktop/temp/ipr.jsonl", lines=True)
# One row per link found on each crawled page
linkdf = adv.crawlytics.links(crawldf)
linkdf.head()
```

...
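The network-theory idea is to treat pages as nodes and links as directed edges, then take the shortest path from the start page as the "true" depth, regardless of when the crawler happened to find each page. A minimal sketch using a plain breadth-first search, assuming `linkdf` has `url` (the linking page) and `link` (the linked URL) columns; `link_depths` is a hypothetical helper. With networkx, `nx.single_source_shortest_path_length(G, start_url)` on `G = nx.from_pandas_edgelist(linkdf, "url", "link", create_using=nx.DiGraph)` should give the same result.

```python
from collections import defaultdict, deque


def link_depths(edges, start_url):
    """Return the minimum number of clicks from start_url to each reachable URL.

    edges: iterable of (source_url, target_url) pairs,
    e.g. zip(linkdf["url"], linkdf["link"]).
    """
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)

    depths = {start_url: 0}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for linked in graph[page]:
            # In BFS, the first time a page is reached is via a shortest path
            if linked not in depths:
                depths[linked] = depths[page] + 1
                queue.append(linked)
    return depths


edges = [
    ("https://example.com/", "https://example.com/a"),
    ("https://example.com/", "https://example.com/b"),
    ("https://example.com/a", "https://example.com/a/deep"),
    ("https://example.com/b", "https://example.com/a/deep"),
    ("https://example.com/a/deep", "https://example.com/"),
]
print(link_depths(edges, "https://example.com/"))
```

Here `/a/deep` gets depth 2 no matter which of its two parents the crawler visited first, and the link back to the home page does not change the home page's depth of 0.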

Thanks a lot @caroheymes! Great to know you found it useful. Please share any examples you have; it would help me understand how it is being...

Thanks a lot @stevenh. Yes, you're right: there are several reST strings that need escaping. I've created a fix and should deploy a patch version soon. Will keep...

Thanks @caroheymesitf. For now, you can select images using a regex that matches the img src URL. This is not exactly XPath, but it can help in many cases. What do you want...
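The regex approach amounts to keeping only the image URLs whose `src` matches a pattern. A minimal sketch of that filtering step with Python's `re` module; `filter_img_src`, the sample URLs, and the `/products/` pattern are hypothetical, and this illustrates the idea rather than the library's own implementation.

```python
import re


def filter_img_src(img_srcs, pattern):
    """Keep only the image URLs that match the regex pattern (hypothetical helper)."""
    regex = re.compile(pattern)
    # search() matches anywhere in the URL, not only at the start
    return [src for src in img_srcs if regex.search(src)]


srcs = [
    "https://example.com/products/shoe-1.jpg",
    "https://example.com/static/logo.svg",
    "https://cdn.example.com/products/shoe-2.webp",
]
print(filter_img_src(srcs, r"/products/"))
# ['https://example.com/products/shoe-1.jpg', 'https://cdn.example.com/products/shoe-2.webp']
```

This works well when a site organizes images by URL path (product images, CDN folders), but it cannot distinguish images by their position on the page, which is what XPath would add.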

JS: This will depend on how each website implements it and needs a deeper look, because I don't think there is one way that can easily be...

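@caroheymesitf Thanks for the code! I think this:

```python
if self.xpath is not None:
    img_src = response.xpath(self.xpath).getall()
else:
    img_src = response.xpath("//img/@src").getall()
```

should be modified to:

```python
# Scope image extraction to the element(s) matched by the user's XPath;
# //img (rather than //*//img) also matches images that are direct children
img_src = response.xpath(f"{USER_SUPPLIED_XPATH}//img/@src").getall()
```

Get...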