python-sitemap
Mini website crawler to make a sitemap from a website.
Would it be possible to restrict the search to a certain path? A bad example would be to restrict a search to `http://google.com/maps/` and ignore results which are in other...
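A minimal sketch of how such a path restriction could be checked, assuming the crawler sees each candidate URL before queueing it. The helper name `in_scope` is hypothetical and is not part of python-sitemap.

```python
from urllib.parse import urlsplit


def in_scope(url: str, root: str) -> bool:
    """Return True if `url` is on the same host as `root` and under its path."""
    u, r = urlsplit(url), urlsplit(root)
    if (u.scheme, u.netloc) != (r.scheme, r.netloc):
        return False
    # Compare on a trailing-slash boundary so that /maps does not match /mapsearch.
    base = r.path if r.path.endswith("/") else r.path + "/"
    return u.path == r.path or u.path.startswith(base)


root = "http://google.com/maps/"
print(in_scope("http://google.com/maps/place", root))  # under the restricted path
print(in_scope("http://google.com/search", root))      # outside the restricted path
```

URLs that fail the check would simply be skipped instead of being added to the crawl queue.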
If `http://domain/dir/page1.html` contains a link to `page2.html`, the parser interprets this as `http://domain/page2.html`; the correct result is `http://domain/dir/page2.html`. Furthermore, on a page containing references to the upper directories (`..`), these are changed...
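For reference, the standard library's `urllib.parse.urljoin` resolves relative links against the page they appear on, including `..` segments. The sketch below only illustrates the expected resolution; it is not the project's code, and `page3.html` and `page4.html` are made-up names.

```python
from urllib.parse import urljoin

page = "http://domain/dir/page1.html"
hrefs = ["page2.html", "../page3.html", "/page4.html"]

# Resolve every href relative to the page that contains it.
resolved = [urljoin(page, href) for href in hrefs]

for href, url in zip(hrefs, resolved):
    print(f"{href!r:18} -> {url}")
# 'page2.html'       -> http://domain/dir/page2.html
# '../page3.html'    -> http://domain/page3.html
# '/page4.html'      -> http://domain/page4.html
```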
I took a peek at your source code. One source of crawling issues is that you currently define in the code **not_parseable_ressources**. Instead, if you define parseable resources and limit...
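A rough sketch of the allow-list idea, assuming the decision is made from the URL's file extension. The name `PARSEABLE_EXTENSIONS`, the helper `is_parseable`, and the extensions listed are all assumptions chosen for illustration, not the project's actual configuration.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Hypothetical allow-list: only these extensions (or no extension) get parsed.
PARSEABLE_EXTENSIONS = {"", ".html", ".htm", ".php"}


def is_parseable(url: str) -> bool:
    """Return True if the URL path has an extension on the allow-list."""
    extension = splitext(urlsplit(url).path)[1].lower()
    return extension in PARSEABLE_EXTENSIONS


print(is_parseable("https://example.com/dir/"))          # no extension
print(is_parseable("https://example.com/index.html?x=1"))
print(is_parseable("https://example.com/brochure.PDF"))
```

With an allow-list, an unknown file type is skipped by default rather than fetched and parsed; checking the response's `Content-Type` header would be another way to make the same decision.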
Command: `python3 ~/sitemap/python-sitemap-master/main.py --domain https://www.forum.2globalnomads.info --image --output sitemap.xml --verbose` Your fix > --drop "sid=[a-z0-9]{32}" Although it is not likely that people will run this script on phpbb3 forums as there...
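To illustrate what a drop pattern such as `sid=[a-z0-9]{32}` does to a URL, here is a small sketch that assumes each pattern is removed with a plain regular-expression substitution. The helper `drop_patterns` and the `viewtopic.php?t=1` URL are hypothetical; the tool's real behaviour may differ.

```python
import re


def drop_patterns(url: str, patterns: list[str]) -> str:
    """Remove every match of each regular expression from the URL."""
    for pattern in patterns:
        url = re.sub(pattern, "", url)
    return url


session_id = "0123456789abcdef" * 2  # 32 characters, like a phpBB session id
url = f"https://www.forum.2globalnomads.info/viewtopic.php?t=1&sid={session_id}"

cleaned = drop_patterns(url, [r"sid=[a-z0-9]{32}"])
print(cleaned)  # https://www.forum.2globalnomads.info/viewtopic.php?t=1&
```

Note that a plain substitution leaves the separator behind (the trailing `&` here), so a pattern like `&?sid=[a-z0-9]{32}` may give cleaner URLs.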
Command: `python3 ~/sitemap/python-sitemap-master/main.py --domain https://www.2globalnomads.info --image --output sitemap.xml --report` Missing page: > https://www.2globalnomads.info/suomi-fi/
Command: `python3 ~/sitemap/python-sitemap-master/main.py --domain https://www.xetnet.fi --image --output sitemap.xml --verbose` Output: > INFO:root:Start the crawling process > INFO:root:Crawling #1: https://www.xetnet.fi > INFO:root:Crawling #2: https://www.xetnet.fi/category/ror/ > INFO:root:Crawling #3: https://www.xetnet.fi/wordpress-asennus-webhotelliin-2/ > INFO:root:Crawling #4:...
Adding videos to site would work the same way as images and if you make ALT/TITLE and FIGURECAPTION, the same code would work with \ as well. So far I...
Take from TITLE and/or ALT and from FIGCAPTION tags if they are present.
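One possible way to collect such captions, sketched with the standard library's `html.parser`. The class name `CaptionParser` and the precedence (TITLE first, then ALT, then the FIGCAPTION of the enclosing `<figure>`) are assumptions made for this example.

```python
from html.parser import HTMLParser


class CaptionParser(HTMLParser):
    """Collect [src, caption] pairs for <img> tags."""

    def __init__(self):
        super().__init__()
        self.images = []            # list of [src, caption]
        self._in_figcaption = False
        self._figure_start = None   # index of the first image in the open <figure>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "figure":
            self._figure_start = len(self.images)
        elif tag == "figcaption":
            self._in_figcaption = True
        elif tag == "img" and attributes.get("src"):
            caption = attributes.get("title") or attributes.get("alt") or ""
            self.images.append([attributes["src"], caption])

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_figcaption = False
        elif tag == "figure":
            self._figure_start = None

    def handle_data(self, data):
        # Use figcaption text for images in the same figure that have no caption yet.
        text = data.strip()
        if self._in_figcaption and self._figure_start is not None and text:
            for image in self.images[self._figure_start:]:
                if not image[1]:
                    image[1] = text


html = (
    '<figure><img src="a.png"><figcaption>A cat</figcaption></figure>'
    '<img src="b.png" alt="B" title="Bee">'
)
parser = CaptionParser()
parser.feed(html)
print(parser.images)  # [['a.png', 'A cat'], ['b.png', 'Bee']]
```

This sketch only handles a `<figcaption>` that follows its image inside the same `<figure>`; a caption placed before the image would need an extra pass.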
Dear Creator, Thank you very much for creating this. Is there a way to add hreflang tags automatically? Take care.
For a site that has a `robots.txt` like:
```robots.txt
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
Disallow: /login
```
it would be nice if there was an option to...
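The request above is cut off, so the following is only one plausible reading: evaluating `robots.txt` under a chosen user agent. The sketch uses the standard library's `urllib.robotparser`; the `example.com` URL and the helper `build_parser` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
Disallow: /login
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
page = "https://example.com/some/page"  # hypothetical URL

# The same page is blocked for the generic agent but allowed for Googlebot.
print(parser.can_fetch("*", page))
print(parser.can_fetch("Googlebot", page))
```

Be aware that the standard-library parser checks rules in file order, so with `Allow: /` listed first it may not treat `/login` the way crawlers using longest-match precedence do.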