statik

Split output file in chunks

Open · tombreit opened this issue 6 years ago · 1 comment

In order to generate a valid Google sitemap file, I need to split the file into chunks of a limited size (no more than 50,000 items each):

Break up large sitemaps into smaller sitemaps to prevent your server from being overloaded if Google requests your sitemap frequently. A sitemap file can't contain more than 50,000 URLs and must be no larger than 50 MB uncompressed. (Source: https://support.google.com/webmasters/answer/183668?hl=en#general-guidelines)
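
A minimal sketch of the splitting step in plain Python, assuming the items are already loaded into a sequence (for example the result of `session.query(Photo).all()`). The helper name `chunk` and the constant `MAX_URLS_PER_SITEMAP` are hypothetical. The sketch only enforces the 50,000-URL limit, not the 50 MB limit, which would require measuring the rendered output.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Per-file URL limit from the Google guideline quoted above (hypothetical constant name).
MAX_URLS_PER_SITEMAP = 50_000


def chunk(items: Sequence[T], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        # Each slice becomes the content of one sitemap file.
        yield list(items[start:start + size])
```

For example, `[len(c) for c in chunk(range(120_000))]` gives `[50000, 50000, 20000]`, i.e. three sitemap files.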

Currently, my simple view renders a single "all-in" sitemap file that holds more than 50,000 items:

path: /google_image_sitemap.xml
template: google_image_sitemap.xml.jinja2
context:
  dynamic:
    photos: session.query(Photo).all()

I'm not aware of an elegant, "statik-esque" way of generating these chunks and linking them from a (small) sitemap index file. Any ideas?

tombreit · Jul 22 '18, 20:07
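
One way to produce the index file, sketched with Python's standard `xml.etree.ElementTree` rather than a statik view. The chunk file naming scheme (`google_image_sitemap_<n>.xml`), the base URL `https://example.com`, and both function names are assumptions for illustration; the element names follow the sitemaps.org index format (`sitemapindex`, `sitemap`, `loc`).

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(n_chunks: int, base_url: str = "https://example.com") -> List[str]:
    """Hypothetical naming scheme: one file per chunk, numbered from 1."""
    return [f"{base_url}/google_image_sitemap_{n}.xml" for n in range(1, n_chunks + 1)]


def sitemap_index_xml(urls: List[str]) -> str:
    """Render a sitemap index with one <sitemap><loc> entry per chunk file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(root, "sitemap")
        # ElementTree escapes characters such as & when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Calling `sitemap_index_xml(chunk_urls(3))` returns an index with three `<loc>` entries; writing it and the chunk files to disk is left out here.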

How often do those items change? If they're permanent once created, you could simply create sitemap files that group your items by date (i.e. sort all items by creation date) and split them into chunks (e.g. of 1,000 items each). Then generate your sitemap index file to point to all of these chunk files.
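
The grouping described above can be sketched as follows. `Photo` here is a hypothetical stand-in dataclass with `id` and `created` fields, not the actual ORM model, and `group_by_creation_date` is a made-up helper name. Sorting on `(created, id)` is an assumption that keeps the order deterministic when two items share a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

CHUNK_SIZE = 1_000  # chunk size suggested in the reply above


@dataclass(frozen=True)
class Photo:  # hypothetical stand-in for the real model
    id: int
    created: datetime


def group_by_creation_date(photos: List[Photo], chunk_size: int = CHUNK_SIZE) -> Dict[int, List[Photo]]:
    """Map 1-based chunk numbers to photos, oldest first."""
    ordered = sorted(photos, key=lambda p: (p.created, p.id))
    groups: Dict[int, List[Photo]] = {}
    for position, photo in enumerate(ordered):
        # An item's chunk depends only on its position in the sorted order.
        groups.setdefault(position // chunk_size + 1, []).append(photo)
    return groups
```

With 2,500 photos and the default chunk size, this yields chunks 1 and 2 with 1,000 items each and chunk 3 with 500.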

You could perhaps do this by creating a simple view for your sitemap index file, and then a complex view for your "chunk" sitemap files. The key is making sure you sort in such a way that, when you create new items, they only get appended to the sorted list in your sitemap - not inserted somewhere in between.
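
To check the append-only property, a small sketch can compare chunk contents before and after new items arrive. Integer keys stand in for creation timestamps (an assumption for illustration), and `chunks`/`changed_chunks` are hypothetical helpers. Items that sort after everything existing only touch the last chunk(s), while an item that sorts in between changes its own chunk and every later one.

```python
from typing import List, Optional, Sequence


def chunks(keys: Sequence[int], size: int) -> List[List[int]]:
    """Sort the keys and cut them into consecutive chunks of `size`."""
    ordered = sorted(keys)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def changed_chunks(before: List[List[int]], after: List[List[int]]) -> List[int]:
    """Return 1-based numbers of chunks that differ, including added or removed ones."""
    changed = []
    for n in range(max(len(before), len(after))):
        old: Optional[List[int]] = before[n] if n < len(before) else None
        new: Optional[List[int]] = after[n] if n < len(after) else None
        if old != new:
            changed.append(n + 1)
    return changed
```

Here, `changed_chunks(chunks(range(1, 2501), 1000), chunks(range(1, 2701), 1000))` returns `[3]`, whereas adding an older key `0` returns `[1, 2, 3]`, so every sitemap file would be regenerated.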

thanethomson · Aug 05 '18, 19:08