middleman-sitemap

Add the possibility to exclude a page from being included

Open andreamoro opened this issue 10 years ago • 18 comments

Not all pages should be included in the sitemap, e.g. the thank-you page. For this reason, it would be great to have a tag in the YAML frontmatter that is recognised as a way to prevent such inclusion.

andreamoro avatar Feb 08 '15 21:02 andreamoro

Yes, it would be great to have the option to exclude pages!

gitviola avatar Feb 09 '15 15:02 gitviola

Here's what that could look like: https://github.com/statonjr/middleman-sitemap/pull/5

jeremysmithco avatar Feb 24 '15 20:02 jeremysmithco

It should probably be something like Sitemap-ignore: true

Just to avoid any confusion with additional plugins, but also to make it 100% clear to the person using MM.

andreamoro avatar Feb 25 '15 05:02 andreamoro

I agree with @andreamoro!

Also it would be great to exclude entire directories in the config.rb. Currently I am using my own helper for that:

def in_sitemap?(page)
  page.path =~ /\.html/ && !page.data.noindex && page.path !~ /api/
end

gitviola avatar Feb 25 '15 09:02 gitviola

@schurig have you already done some implementation to work on top of the plugin?

It would be great if you could share the whole code, as I need it for a project of mine but am short on time at present.

andreamoro avatar Feb 25 '15 09:02 andreamoro

Unfortunately not. I'm not using any plugin at the moment. The reason is this issue here - I really need to exclude pages and directories. But I can share the entire code that I'm using at the moment:

# sitemap.xml.builder

xml.instruct!
xml.urlset 'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9' do
  sitemap.resources.select { |page| in_sitemap?(page) }.each do |page|
    xml.url do
      # A <url> entry takes a single <loc>: the absolute URL of the page.
      xml.loc site_url + page.url
      xml.lastmod Date.today.to_time.iso8601
      xml.changefreq page.data.changefreq || 'monthly'
      xml.priority page.data.priority || '0.9'
    end
  end
end

# config.rb

require 'builder'

helpers do
  def in_sitemap?(page)
    page.path =~ /\.html/ && !page.data.noindex && page.path !~ /api/
  end
end

# Gemfile

gem 'builder'

Hope that helps! :)

gitviola avatar Feb 25 '15 10:02 gitviola

@schurig thanks for the code. I believe your solution does what it says out of the box and really doesn't require the use of the plugin. Unless I'm missing something?

andreamoro avatar Feb 25 '15 10:02 andreamoro

@andreamoro almost! It unfortunately doesn't generate a sitemap.xml.gz file.

gitviola avatar Feb 25 '15 10:02 gitviola
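
The missing sitemap.xml.gz could be produced from the generated sitemap.xml with Ruby's standard zlib library. What follows is only a minimal sketch: the helper name write_gzipped_copy is hypothetical, and the code is not taken from the plugin.

```ruby
require 'zlib'

# Hypothetical helper (not part of middleman-sitemap): writes a
# gzip-compressed copy of an existing file next to it and returns
# the path of the new .gz file.
def write_gzipped_copy(xml_path)
  gz_path = "#{xml_path}.gz"
  Zlib::GzipWriter.open(gz_path) do |gz|
    gz.write(File.read(xml_path))
  end
  gz_path
end
```

One plausible place to call such a helper is an after_build block in config.rb, pointed at the sitemap.xml in the build directory; the exact path depends on the project setup.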

@andreamoro I was concerned about the frontmatter options colliding as well. Actually, I think it would be best to just namespace them all, like this:

---
sitemap:
  changefreq: weekly
  priority: 0.3
  ignore: true
---

That way, all options are accessible from the sitemap namespace.

Since this would be a breaking change, it would probably be best to release it with a new major version, so people who are updating minor/patch versions don't get hosed when all their frontmatter options suddenly stop working.

jeremysmithco avatar Feb 25 '15 13:02 jeremysmithco
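
One way to soften the breaking change would be to read the namespaced key first and fall back to the old top-level key. The sketch below assumes the frontmatter is available as a plain Hash with string keys; the helper name sitemap_option is hypothetical and not part of the plugin.

```ruby
# Hypothetical helper: looks up a sitemap option in page frontmatter,
# preferring the namespaced form (sitemap: { key: ... }) and falling
# back to the legacy top-level key.
def sitemap_option(frontmatter, key, default = nil)
  namespaced = frontmatter['sitemap']
  return namespaced[key] if namespaced.is_a?(Hash) && namespaced.key?(key)
  return frontmatter[key] if frontmatter.key?(key)

  default
end

# Assumed sample data: one page using the old keys, one using the namespace.
legacy_page     = { 'changefreq' => 'daily', 'priority' => 0.8 }
namespaced_page = { 'sitemap' => { 'changefreq' => 'weekly', 'priority' => 0.3, 'ignore' => true } }

puts sitemap_option(legacy_page, 'changefreq')             # legacy key still honoured
puts sitemap_option(namespaced_page, 'changefreq')         # namespaced key wins
puts sitemap_option(legacy_page, 'ignore', false).inspect  # default when unset
```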

@bentoncreation that makes absolute sense, and it leaves room for expanding the project. E.g. assuming you want to include images in the sitemap, by adding something like the following they can be easily parsed and appended to the page entry.

sitemap:
  images:
    - loc: http://www.....
      caption: bla bla
      title: this is the title of image 1
    - loc: http://www.....
      caption: bla bla
      title: this is the title of image 2

andreamoro avatar Feb 25 '15 13:02 andreamoro
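
A rough sketch of how such image entries might be parsed and turned into sitemap markup. It assumes the images are given as a YAML list (repeated keys inside one YAML mapping cannot hold several values), uses example.com placeholder URLs, and emits only the image:loc element of the image sitemap extension (namespace http://www.google.com/schemas/sitemap-image/1.1). The helper name image_entries is hypothetical.

```ruby
require 'yaml'
require 'cgi'

# Assumed sample frontmatter; the URLs are placeholders.
FRONTMATTER = <<~YAML
  sitemap:
    images:
      - loc: https://example.com/one.png
        title: this is the title of image 1
      - loc: https://example.com/two.png
        title: this is the title of image 2
YAML

# Hypothetical helper: returns one <image:image> fragment per listed image.
# Only the location is emitted; caption and title are ignored in this sketch.
def image_entries(frontmatter)
  images = frontmatter.dig('sitemap', 'images') || []
  images.map do |image|
    "<image:image><image:loc>#{CGI.escapeHTML(image['loc'])}</image:loc></image:image>"
  end
end

puts image_entries(YAML.safe_load(FRONTMATTER))
```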

@andreamoro Yeah, totally!

jeremysmithco avatar Feb 25 '15 13:02 jeremysmithco

So we have to wait for @statonjr to code this bit :)

andreamoro avatar Feb 25 '15 13:02 andreamoro

@schurig I was thinking about how you might ignore whole directories and I'm wondering if this makes sense. In your config, have an ignored_paths option, like so:

activate :sitemap do |sitemap|
  sitemap.hostname = "http://www.mysite.com"
  sitemap.ignored_paths = %W(
    /private
    /stuff
  )
end

And then, when getting pages (in my proposed private get_pages method), filter out those that match anything found in ignored_paths.

jeremysmithco avatar Feb 25 '15 13:02 jeremysmithco
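
The filtering step described above could look roughly like this. It is a sketch under assumptions: Page is a stand-in for a Middleman resource that only exposes url, and ignored_path? and filter_pages are hypothetical names rather than the proposed get_pages method itself.

```ruby
# Hypothetical stand-in for a Middleman resource; only `url` is used here.
Page = Struct.new(:url)

# True when the URL equals an ignored path or lies beneath it.
# Matching on "prefix/" keeps /stuffing.html from being caught by /stuff.
def ignored_path?(url, ignored_paths)
  ignored_paths.any? do |prefix|
    url == prefix || url.start_with?("#{prefix.chomp('/')}/")
  end
end

# Drops every page whose URL falls under one of the ignored paths.
def filter_pages(pages, ignored_paths)
  pages.reject { |page| ignored_path?(page.url, ignored_paths) }
end

pages = [
  Page.new('/index.html'),
  Page.new('/private/notes.html'),
  Page.new('/stuff/a.html'),
  Page.new('/stuffing.html')
]

kept = filter_pages(pages, %w(/private /stuff))
puts kept.map(&:url).inspect # prints ["/index.html", "/stuffing.html"]
```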

@bentoncreation sounds good! But what about single pages? I think there are situations where you want to exclude pages without writing

sitemap:
  ignore: true

into them.

activate :sitemap do |sitemap|
  sitemap.hostname = "http://www.mysite.com"
  sitemap.ignore = %r{^/api/contact_form\.php}
end

gitviola avatar Feb 25 '15 15:02 gitviola

@schurig What kind of situations are you thinking of? I think your .php file example would already be excluded because the sitemap builder is only looking at .html files.

jeremysmithco avatar Feb 25 '15 19:02 jeremysmithco

@bentoncreation oh, you're right! However, I think it would still be good to let the user decide whether he wants to go into his config.rb or in each of the files to see and manage his ignores. But for now we will be good with writing it in the file I think.

gitviola avatar Feb 25 '15 19:02 gitviola

@schurig Yeah, I could see that. I wouldn't normally think it was a good idea to have multiple ways to set the same option, but maybe it's not a big deal in this case. If I get my other pull request accepted I may look at adding this concept as well.

jeremysmithco avatar Feb 26 '15 04:02 jeremysmithco
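
If both mechanisms were supported, a single predicate could consult the frontmatter flag first and the config patterns second. This is only an illustration of that idea: Resource is a stand-in struct, the frontmatter is assumed to be a plain Hash with string keys, and excluded? is a hypothetical name.

```ruby
# Hypothetical stand-in for a Middleman resource with a URL and frontmatter.
Resource = Struct.new(:url, :data)

# A page is excluded when its frontmatter sets sitemap.ignore to true,
# or when its URL matches any pattern from the config. Patterns may be
# Regexps or plain strings (strings are treated as URL prefixes).
def excluded?(page, ignore_patterns)
  return true if page.data.dig('sitemap', 'ignore') == true

  ignore_patterns.any? do |pattern|
    pattern.is_a?(Regexp) ? pattern.match?(page.url) : page.url.start_with?(pattern)
  end
end

patterns = ['/private', %r{\A/api/}]
thanks   = Resource.new('/thanks.html', { 'sitemap' => { 'ignore' => true } })
home     = Resource.new('/index.html', {})

puts excluded?(thanks, patterns) # prints true
puts excluded?(home, patterns)   # prints false
```

Treating the two sources as a logical OR means neither can re-include a page the other excludes, which sidesteps the question of which setting should win.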

I believe there should not be two ways of removing a page that can clash with each other. But that's just my opinion.

andreamoro avatar Feb 26 '15 09:02 andreamoro