middleman-sitemap
Add the possibility to exclude a page from being included
Not all pages should be included in the sitemap, e.g. the thank-you page. For this reason it would be great to have a tag in the YAML frontmatter that is recognised as a way to exclude a page.
Yes, it would be great to have the option to exclude pages!
Here's what that could look like: https://github.com/statonjr/middleman-sitemap/pull/5
It should probably be something like
Sitemap-ignore: true
Just to avoid any confusion with additional plugins, but also to make it 100% clear to the person using MM.
I agree with @andreamoro!
Also, it would be great to be able to exclude entire directories in config.rb. Currently I am using my own helper for that:
def in_sitemap?(page)
  page.path =~ /\.html/ && !page.data.noindex && !(/api/.match(page.path))
end
@schurig have you already implemented something that works on top of the plugin?
It would be great if you could share all of the code, as I need it for a project of mine but am short on time at present.
Unfortunately not. I'm not using any plugin at the moment. The reason is this issue here - I really need to exclude pages and directories. But I can share the entire code that I'm using at the moment:
# sitemap.xml.builder
xml.instruct!
xml.urlset 'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9' do
  sitemap.resources.select { |page| in_sitemap?(page) }.each do |page|
    xml.url do
      # Each <url> takes a single <loc> holding the absolute URL.
      # site_url is not defined in this snippet; it is assumed to be set elsewhere.
      xml.loc site_url + page.url
      xml.lastmod Date.today.to_time.iso8601
      xml.changefreq page.data.changefreq || 'monthly'
      xml.priority page.data.priority || '0.9'
    end
  end
end
# config.rb
require 'builder'
helpers do
  def in_sitemap?(page)
    page.path =~ /\.html/ && !page.data.noindex && !(/api/.match(page.path))
  end
end
# Gemfile
gem 'builder'
Hope that helps! :)
@schurig thanks for the code. I believe your solution works out of the box and doesn't really require the plugin. Unless I'm missing something?
@andreamoro almost! It unfortunately doesn't generate a sitemap.xml.gz file.
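The missing sitemap.xml.gz could be produced with Ruby's standard zlib library once the XML file has been built. This is only a minimal sketch: the method name gzip_sitemap and the build/sitemap.xml path are assumptions for illustration, not part of the plugin or of the code shared above.

```ruby
require 'zlib'

# Hypothetical helper: writes a gzip-compressed copy of `source` to `target`
# and returns the target path. The default path is an assumption; point it
# at wherever your build actually writes sitemap.xml.
def gzip_sitemap(source = 'build/sitemap.xml', target = "#{source}.gz")
  Zlib::GzipWriter.open(target) do |gz|
    gz.write(File.binread(source))
  end
  target
end
```

It would need to run after the build has written sitemap.xml, for example from an after_build hook in config.rb if your Middleman version provides one.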
@andreamoro I was concerned about the frontmatter options colliding as well. Actually, I think it would be best to just namespace them all, like this:
---
sitemap:
  changefreq: weekly
  priority: 0.3
  ignore: true
---
That way, all options are accessible under the sitemap namespace.
Since this would be a breaking change, it would probably be best to release it with a new major version, so people who are updating minor/patch versions don't get hosed when all their frontmatter options suddenly stop working.
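Reading the namespaced options could look roughly like the sketch below. It assumes the frontmatter is available as a plain Hash with string keys (Middleman's page.data is a richer object), and the method name sitemap_options and the default values are hypothetical, chosen only for illustration.

```ruby
# Illustrative defaults; these values are assumptions, not the plugin's.
SITEMAP_DEFAULTS = {
  'changefreq' => 'monthly',
  'priority'   => 0.5,
  'ignore'     => false
}.freeze

# Hypothetical helper: returns the options found under the `sitemap:` key,
# filling in defaults for anything the page does not set.
def sitemap_options(frontmatter)
  namespaced = frontmatter.fetch('sitemap', nil) || {}
  SITEMAP_DEFAULTS.merge(namespaced)
end

options = sitemap_options(
  'title'   => 'Thank you',
  'sitemap' => { 'changefreq' => 'weekly', 'priority' => 0.3, 'ignore' => true }
)
puts options['ignore'] # prints "true"
```

Because everything lives under one key, unrelated frontmatter such as title cannot collide with sitemap settings.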
@bentoncreation that makes absolute sense, and it leaves room to expand the project. E.g. if you want to include images in the sitemap, adding something like the following would let them be easily parsed and appended to the page's entry.
sitemap:
  images:
    img:
      loc: http://www.....
      caption: bla bla
      title: this is the title of image 1
    img:
      loc: http://www.....
      caption: bla bla
      title: this is the title of image 2
@andreamoro Yeah, totally!
So we have to wait for @statonjr to code this bit :)
@schurig I was thinking about how you might ignore whole directories and I'm wondering if this makes sense. In your config, have an ignored_paths option, like so:
activate :sitemap do |sitemap|
  sitemap.hostname = "http://www.mysite.com"
  sitemap.ignored_paths = %W(
    /private
    /stuff
  )
end
And then, when getting pages (in my proposed private get_pages method), filter out those that match anything found in ignored_paths.
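The filtering step described above might be sketched as follows. The signature of get_pages, the ignored_path? helper, and the SitemapPage struct are all assumptions made for illustration; the real plugin works with Middleman resources rather than this stand-in.

```ruby
# Stand-in for a Middleman resource; only the path matters here.
SitemapPage = Struct.new(:path)

# True when `path` lies inside one of the ignored directories.
# Matching is done on whole path segments, so "/private" does not
# accidentally exclude "/privateer/index.html".
def ignored_path?(path, ignored_paths)
  normalized = "/#{path}".squeeze('/')
  ignored_paths.any? do |prefix|
    dir = prefix.chomp('/')
    normalized == dir || normalized.start_with?("#{dir}/")
  end
end

# Hypothetical version of the proposed get_pages: drops ignored pages.
def get_pages(pages, ignored_paths)
  pages.reject { |page| ignored_path?(page.path, ignored_paths) }
end

pages = %w[index.html private/notes.html stuff/a.html privateer/index.html]
        .map { |path| SitemapPage.new(path) }
kept = get_pages(pages, %w[/private /stuff]).map(&:path)
puts kept.inspect # prints ["index.html", "privateer/index.html"]
```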
@bentoncreation sounds good! But what about single pages? I think there are situations where you want to exclude pages without writing
sitemap:
  ignore: true
into them, for example with a pattern in the config:
activate :sitemap do |sitemap|
  sitemap.hostname = "http://www.mysite.com"
  sitemap.ignore = %r{^/api/contact_form\.php}
end
@schurig What kind of situations are you thinking of? I think your .php file example would already be excluded because the sitemap builder is only looking at .html files.
@bentoncreation oh, you're right! However, I think it would still be good to let users decide whether to manage their ignores in config.rb or in each individual file. But for now, writing it in the file will be good enough, I think.
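If both mechanisms were supported, the check might combine them as in this sketch. It assumes frontmatter is a plain Hash with string keys and that the config-level option is a single Regexp; the names ThreadPage and excluded_from_sitemap? are hypothetical and not part of the plugin.

```ruby
# Stand-in for a Middleman resource with a path and its frontmatter.
ThreadPage = Struct.new(:path, :data)

# A page is excluded when its own frontmatter sets sitemap.ignore,
# or when the optional config-level pattern matches its path.
def excluded_from_sitemap?(page, ignore_pattern = nil)
  frontmatter = page.data.fetch('sitemap', nil) || {}
  return true if frontmatter['ignore']

  !ignore_pattern.nil? && ignore_pattern.match?("/#{page.path}")
end

thanks = ThreadPage.new('thank-you.html', 'sitemap' => { 'ignore' => true })
draft  = ThreadPage.new('drafts/post.html', {})
home   = ThreadPage.new('index.html', {})

puts excluded_from_sitemap?(thanks)               # prints "true"
puts excluded_from_sitemap?(draft, %r{^/drafts/}) # prints "true"
puts excluded_from_sitemap?(home, %r{^/drafts/})  # prints "false"
```

Checking the frontmatter first means a per-page setting can only add exclusions, so the two sources never contradict each other.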
@schurig Yeah, I could see that. I wouldn't normally think it was a good idea to have multiple ways to set the same option, but maybe it's not a big deal in this case. If I get my other pull request accepted, I may look at adding this concept as well.
I believe there should not be one method of excluding a page that clashes with another. But that's just my opinion.