sitemap_generator

Multi-lingual site sitemap generation

Open dmitry opened this issue 14 years ago • 8 comments

Hi,

I have a multilingual site where the languages are separated by top-level domain (TLD), e.g. www.sitename.com and www.sitename.es.

Right now there is no way to generate sitemaps for different domain names (hosts). When will it be possible to do that? Any suggestions?

P.S. Actually, I only need the ability to generate the sitemap files from rake tasks; everything else I can do through the nginx config (pointing the different hostnames at the different sitemaps).

Thanks, Dmitry

dmitry avatar Apr 02 '10 09:04 dmitry

Hi Dmitry,

It sounds like this could be useful, but let me see if I understand correctly what you want. You can obviously set the default host to whatever you want in your sitemap config, and you can also set the host when you add a link.

So couldn't you just do something like:

sitemap.add artists_path, :host => 'www.sitename.es'
sitemap.add artists_path, :host => 'www.sitename.com'

I assume that all these domains are being served by a single Rails instance, otherwise you would just have different sitemap configs for each Rails instance.

The effect would be that each site's sitemap contains links for all language versions, but that doesn't seem like a bad thing.

How do you detect which TLD you're operating under?

Cheers, Karl

kjvarga avatar Apr 02 '10 18:04 kjvarga

Hi Karl,

If I add the :host option to the add method, all those links will be in one file, but as far as I know one host per sitemap file is friendlier to search engines.

See http://sitemaps.org/protocol.php under the "Sitemaps & Cross Submits" header.

Yes, all the domains are served by a single Rails instance.

On the different TLDs I only need to change the language, or remove/add blocks, but everything depends on the language. So I'm doing something like:

HOSTS = {
  'www.hostname.ru' => :ru,
  'www.hostname.com' => :en,
  'www.hostname.es' => :es,
  'www.hostname.de' => :de
}

def set_locale
  locale = HOSTS[request.host]
  I18n.locale = (locale || :ru) # the fallback is only for development or a domain that is not configured
end

I'm thinking about generating the sitemaps into the /sitemaps/en/, /sitemaps/ru/ paths, and then using the nginx config or Rails middleware + X-Accel-Redirect to rewrite paths to the correct one depending on the domain.

What do you think?
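As a rough sketch of that layout, the per-locale output directories can be derived from the same HOSTS hash. The helper name sitemap_targets and the public/sitemaps root are assumptions made for illustration, not part of sitemap_generator:

```ruby
# Host-to-locale mapping, as in the controller code above.
HOSTS = {
  'www.hostname.ru'  => :ru,
  'www.hostname.com' => :en,
  'www.hostname.es'  => :es,
  'www.hostname.de'  => :de
}

# Hypothetical helper: one target per host, each with its own directory,
# so that every sitemap file only contains links for a single host.
def sitemap_targets(hosts, root: 'public/sitemaps')
  hosts.map do |host, locale|
    { host: host, locale: locale, directory: File.join(root, locale.to_s) }
  end
end

# A rake task could loop over the targets and write one sitemap set per host.
sitemap_targets(HOSTS).each do |target|
  puts "#{target[:host]} -> #{target[:directory]}/sitemap_index.xml.gz"
end
```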

Thanks, Dmitry

dmitry avatar Apr 03 '10 14:04 dmitry

OK, now I understand what you are looking for. And thanks for the link.

I do have a patch to put sitemaps into a subdirectory, so that could come in useful here (though in normal circumstances it's a bad idea to do so, as I now know :).

As for directing the robot to the correct sitemaps, what about some Rack middleware to perform the rewrite? rack-rewrite supports regular expression-based rules for serving a file, so it would be easy to serve, for example, public/ru/sitemap_index.xml.gz if the robot requests www.hostname.ru/sitemap_index.xml.gz.

This would be a far simpler setup.
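For illustration, here is a minimal hand-rolled sketch of the rewrite idea. It is plain Ruby following the Rack call(env) convention and does not use rack-rewrite. The class name SitemapHostRewrite and the host-to-directory hash are assumptions, and it would need to sit in front of whatever serves static files for the prefixed path to resolve:

```ruby
# Hypothetical Rack-style middleware: prefixes sitemap requests with a
# per-host directory, e.g. /sitemap_index.xml.gz -> /ru/sitemap_index.xml.gz.
class SitemapHostRewrite
  # Matches /sitemap.xml, /sitemap_index.xml.gz, /sitemap1.xml.gz, ...
  SITEMAP_PATH = %r{\A/sitemap[^/]*\.xml(\.gz)?\z}

  def initialize(app, host_directories)
    @app = app
    @host_directories = host_directories
  end

  def call(env)
    # SERVER_NAME is the requested host (without port) in the Rack environment.
    directory = @host_directories[env['SERVER_NAME']]
    path = env['PATH_INFO'].to_s

    if directory && path.match?(SITEMAP_PATH)
      # Copy the env so the caller's hash is not mutated.
      env = env.merge('PATH_INFO' => "/#{directory}#{path}")
    end

    @app.call(env)
  end
end

# Stand-in downstream app that echoes the path it received.
echo_app = ->(env) { [200, { 'content-type' => 'text/plain' }, [env['PATH_INFO']]] }

directories = { 'www.hostname.ru' => :ru, 'www.hostname.es' => :es }
middleware = SitemapHostRewrite.new(echo_app, directories)

_status, _headers, body = middleware.call(
  'SERVER_NAME' => 'www.hostname.ru',
  'PATH_INFO'   => '/sitemap_index.xml.gz'
)
puts body.first # prints "/ru/sitemap_index.xml.gz"
```

Requests for other paths, or from hosts that are not in the hash, pass through unchanged.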

Cheers, Karl

kjvarga avatar Apr 07 '10 22:04 kjvarga

I would implement some mechanism to define the desired rewrite rules, so the user would not have to do so themselves.

The Rack middleware would be an optional include should the user need this kind of setup.

Configuration could be as simple as:

# key is string or regular expression, value is symbol
SitemapGenerator.hosts = {
  'hostname.ru' => :ru,
  'hostname.com' => :en,
  'hostname.es' => :es,
  'hostname.de' => :de
}

SitemapGenerator.add_links(:hosts => [:ru, :de]) do |sitemap|
  # links only for ru/de TLDs
end

SitemapGenerator.add_links do |sitemap|
  # links for all TLDs
end

Sitemap files would be written to the directory given by the value, e.g. public/ru/sitemap*.xml.gz

The Rack middleware "rewrites" requests for sitemap files based on the matching host in the SitemapGenerator.hosts hash. Matching is done against the REQUEST_URI.

In the more common case of a single host this would also support putting sitemaps into a subdirectory and serving them transparently from that subdirectory. The config in this case could be something like:

SitemapGenerator.hosts = { 'hostname.com' => :sitemaps, }
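The lookup behind such a hosts hash, with keys that are either strings or regular expressions, might be sketched as below. The function sitemap_directory_for is hypothetical, and treating a string key as matching the bare domain plus its subdomains (so 'hostname.ru' also covers 'www.hostname.ru') is an assumption about the intended behaviour:

```ruby
# Hypothetical lookup: returns the directory symbol for the first key that
# matches the host, or nil when nothing matches.
def sitemap_directory_for(host, hosts)
  hosts.each do |pattern, directory|
    matched =
      if pattern.is_a?(Regexp)
        pattern.match?(host)
      else
        # Assumption: a string key matches the domain itself and any subdomain.
        host == pattern || host.end_with?(".#{pattern}")
      end
    return directory if matched
  end
  nil
end

hosts = {
  'hostname.ru'                  => :ru,
  /\Ahostname\.(com|net)\z/      => :en
}

puts sitemap_directory_for('www.hostname.ru', hosts).inspect # prints ":ru"
puts sitemap_directory_for('hostname.net', hosts).inspect    # prints ":en"
puts sitemap_directory_for('example.org', hosts).inspect     # prints "nil"
```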

kjvarga avatar Apr 07 '10 22:04 kjvarga

Hi Karl,

Yes. I agree, that will be superb!

You can use http://github.com/jtrupiano/rack-rewrite instead of writing your own middleware, using the send_file argument (x_send_file is only supported by the Apache server). Maybe I should write a patch for this middleware to support nginx, so that it's possible to use the x-send-file directive there too?

Thanks, Dmitry

dmitry avatar Apr 10 '10 13:04 dmitry

Hi Dmitry,

With the latest version, 0.3.2, you should be able to do what you want now. Your sitemap config would consist of SitemapGenerator::LinkSet.new(...) calls. I haven't tested it out myself yet... I'm working on an easier API atm... maybe wait for that :)

Cheers, Karl

kjvarga avatar May 25 '10 19:05 kjvarga

Hi Karl,

Is it possible to use 0.3.2 with Rails 2.3?

Thanks, Dmitry

dmitry avatar May 26 '10 11:05 dmitry

Yes

kjvarga avatar May 26 '10 17:05 kjvarga