<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://wiki.nginx.org/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://wiki.nginx.org/index.php?title=EmbeddedPerlSitemapsProxy&amp;feed=atom&amp;action=history</id>
		<title>EmbeddedPerlSitemapsProxy - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://wiki.nginx.org/index.php?title=EmbeddedPerlSitemapsProxy&amp;feed=atom&amp;action=history"/>
		<link rel="alternate" type="text/html" href="http://wiki.nginx.org/index.php?title=EmbeddedPerlSitemapsProxy&amp;action=history"/>
		<updated>2013-05-19T03:05:49Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.19.0</generator>

	<entry>
		<id>http://wiki.nginx.org/index.php?title=EmbeddedPerlSitemapsProxy&amp;diff=269&amp;oldid=prev</id>
		<title>Kolbyjack: moved NginxEmbeddedPerlSitemapsProxy to EmbeddedPerlSitemapsProxy:&amp;#32;Removing Nginx prefix from page titles</title>
		<link rel="alternate" type="text/html" href="http://wiki.nginx.org/index.php?title=EmbeddedPerlSitemapsProxy&amp;diff=269&amp;oldid=prev"/>
				<updated>2010-09-22T17:52:39Z</updated>
		
		<summary type="html">&lt;p&gt;moved &lt;a href=&quot;/NginxEmbeddedPerlSitemapsProxy&quot; class=&quot;mw-redirect&quot; title=&quot;NginxEmbeddedPerlSitemapsProxy&quot;&gt;NginxEmbeddedPerlSitemapsProxy&lt;/a&gt; to &lt;a href=&quot;/EmbeddedPerlSitemapsProxy&quot; title=&quot;EmbeddedPerlSitemapsProxy&quot;&gt;EmbeddedPerlSitemapsProxy&lt;/a&gt;: Removing Nginx prefix from page titles&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;We run a CMS hosting business. &lt;br /&gt;
&lt;br /&gt;
Each website is hosted on its own domain, and each one generates its sitemap.xml dynamically when it is requested.&lt;br /&gt;
&lt;br /&gt;
Sitemaps are a way to feed your URLs to the search engines; see http://www.sitemaps.org/&lt;br /&gt;
&lt;br /&gt;
Normally it is the job of the site owner or webmaster to submit these sites to the search engines. Some do and some don't.&lt;br /&gt;
&lt;br /&gt;
We wanted to serve all these sitemaps dynamically, and here is how it is done with nginx and its embedded Perl module.&lt;br /&gt;
&lt;br /&gt;
Note: There might be other, easier ways of doing this, but IANASEO.&lt;br /&gt;
&lt;br /&gt;
Goals: &lt;br /&gt;
1. A central server that lists a master-map of all domains and lets the search engines spider them.&lt;br /&gt;
2. Cross-domain submission. Domains in our master-map should allow the central server to serve their sitemaps.&lt;br /&gt;
&lt;br /&gt;
Changes to robots.txt (which is also generated by a dynamic script):&lt;br /&gt;
sitemap: http://sitemaps.ourdomain.com/domain-name.com-sitemap.xml&lt;br /&gt;
&lt;br /&gt;
So the robots.txt looks something like this&lt;br /&gt;
&amp;lt;geshi lang=&amp;quot;nginx&amp;quot;&amp;gt;&lt;br /&gt;
User-agent: *&lt;br /&gt;
Disallow: /cgi-bin/&lt;br /&gt;
Disallow: /tmp/&lt;br /&gt;
Disallow: /cache/&lt;br /&gt;
Disallow: /class/&lt;br /&gt;
Disallow: /images/&lt;br /&gt;
Disallow: /include/&lt;br /&gt;
Disallow: /install/&lt;br /&gt;
Disallow: /kernel/&lt;br /&gt;
Disallow: /language/&lt;br /&gt;
Disallow: /templates_c/&lt;br /&gt;
Disallow: /themes/&lt;br /&gt;
Disallow: /uploads/&lt;br /&gt;
sitemap: http://sitemaps.worldsoft-cms.info/ispman.net-sitemap.xml&lt;br /&gt;
&amp;lt;/geshi&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The domain-name.com is of course replaced with the correct name.&lt;br /&gt;
This sends all sitemap requests to a central server running nginx.&lt;br /&gt;
&lt;br /&gt;
nginx.conf (relevant parts only)&lt;br /&gt;
&amp;lt;geshi lang=&amp;quot;nginx&amp;quot;&amp;gt;&lt;br /&gt;
http {&lt;br /&gt;
  include       mime.types;&lt;br /&gt;
  default_type  application/octet-stream;&lt;br /&gt;
&lt;br /&gt;
  perl_modules lib;&lt;br /&gt;
  perl_require Sitemap.pm;&lt;br /&gt;
&lt;br /&gt;
  keepalive_timeout  65;&lt;br /&gt;
&lt;br /&gt;
  server {&lt;br /&gt;
    listen       8090;&lt;br /&gt;
    server_name  sitemaps.worldsoft-cms.info;&lt;br /&gt;
&lt;br /&gt;
    location / {&lt;br /&gt;
      root   html;&lt;br /&gt;
      index  index.html index.htm;&lt;br /&gt;
      if (!-f $request_filename) {&lt;br /&gt;
        rewrite ^/(.*)-sitemap\.xml$ /sitemap/$1 last;&lt;br /&gt;
        # If the request matches somethingsomething-sitemap.xml and no cached file exists,&lt;br /&gt;
        # rewrite it internally to /sitemap/somethingsomething&lt;br /&gt;
        # where somethingsomething is a domain name&lt;br /&gt;
      }&lt;br /&gt;
    }&lt;br /&gt;
&lt;br /&gt;
    location /sitemap {&lt;br /&gt;
      perl Sitemap::handler;&lt;br /&gt;
    }&lt;br /&gt;
  }&lt;br /&gt;
}&lt;br /&gt;
&amp;lt;/geshi&amp;gt;&lt;br /&gt;
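&lt;br /&gt;
The if/rewrite pair above can probably also be expressed with try_files. What follows is only a sketch, assuming an nginx build that supports try_files and PCRE named captures; the capture name domain is an arbitrary choice for illustration. It would replace the if block and the /sitemap location inside the server block.&lt;br /&gt;
&amp;lt;geshi lang=&amp;quot;nginx&amp;quot;&amp;gt;&lt;br /&gt;
    # assumption: regex location added next to &amp;quot;location /&amp;quot;&lt;br /&gt;
    location ~ ^/(?&amp;lt;domain&amp;gt;[^/]+)-sitemap\.xml$ {&lt;br /&gt;
      root   html;&lt;br /&gt;
      # serve the cached file if it exists, otherwise&lt;br /&gt;
      # redirect internally to /sitemap/the-domain&lt;br /&gt;
      try_files $uri /sitemap/$domain;&lt;br /&gt;
    }&lt;br /&gt;
&lt;br /&gt;
    location /sitemap/ {&lt;br /&gt;
      # only reachable through the internal redirect above&lt;br /&gt;
      internal;&lt;br /&gt;
      perl Sitemap::handler;&lt;br /&gt;
    }&lt;br /&gt;
&amp;lt;/geshi&amp;gt;&lt;br /&gt;
With the internal directive, a client asking for /sitemap/something directly gets a 404, so the handler is only reached through a name of the form something-sitemap.xml.&lt;br /&gt;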
&lt;br /&gt;
lib/Sitemap.pm&lt;br /&gt;
&amp;lt;geshi lang=&amp;quot;perl&amp;quot;&amp;gt;&lt;br /&gt;
package Sitemap;&lt;br /&gt;
use strict;&lt;br /&gt;
use warnings;&lt;br /&gt;
use nginx;&lt;br /&gt;
use LWP::Simple;&lt;br /&gt;
&lt;br /&gt;
our $basedir=&amp;quot;/usr/local/sitemapnginx/html&amp;quot;;&lt;br /&gt;
&lt;br /&gt;
sub handler {&lt;br /&gt;
  my $r=shift;&lt;br /&gt;
  my $uri=$r-&amp;gt;uri;&lt;br /&gt;
  $uri=~ s!^/*sitemap/*!!;&lt;br /&gt;
  $uri=~ s!/.*!!;&lt;br /&gt;
  # now $uri has just the domain name such as nginx.com&lt;br /&gt;
&lt;br /&gt;
  # refuse anything that does not look like a host name&lt;br /&gt;
  return 404 unless $uri =~ m/^[A-Za-z0-9][A-Za-z0-9.-]*\z/;&lt;br /&gt;
&lt;br /&gt;
  my $sitemap_url=&amp;quot;http://$uri/sitemap.xml&amp;quot;;&lt;br /&gt;
  # Get the sitemap from something like http://ispman.net/sitemap.xml (this is dynamic and fresh)&lt;br /&gt;
&lt;br /&gt;
  my $sitemap_data=get($sitemap_url);&lt;br /&gt;
  # if the fetch failed or the result does not include this string, return 404 Not found.&lt;br /&gt;
  return 404 if !defined($sitemap_data) || $sitemap_data !~ m/urlset/;&lt;br /&gt;
&lt;br /&gt;
  # if found, then cache it.&lt;br /&gt;
  my $sitemap_file=&amp;quot;$basedir/$uri-sitemap.xml&amp;quot;;&lt;br /&gt;
  open(my $fh, &amp;quot;&amp;gt;&amp;quot;, $sitemap_file) or return 500;&lt;br /&gt;
  print $fh $sitemap_data;&lt;br /&gt;
  close($fh);&lt;br /&gt;
  $r-&amp;gt;send_http_header(&amp;quot;application/xml&amp;quot;);&lt;br /&gt;
  # return the cached file&lt;br /&gt;
  $r-&amp;gt;sendfile($sitemap_file);&lt;br /&gt;
  $r-&amp;gt;flush;&lt;br /&gt;
  return OK;&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
1;&lt;br /&gt;
&amp;lt;/geshi&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Example master-map&lt;br /&gt;
&amp;lt;geshi lang=&amp;quot;xml&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;?xml version='1.0' encoding='UTF-8'?&amp;gt;&lt;br /&gt;
&amp;lt;sitemapindex xmlns=&amp;quot;http://www.sitemaps.org/schemas/sitemap/0.9&amp;quot;&lt;br /&gt;
	xmlns:xsi=&amp;quot;http://www.w3.org/2001/XMLSchema-instance&amp;quot;&lt;br /&gt;
	xsi:schemaLocation=&amp;quot;http://www.sitemaps.org/schemas/sitemap/0.9&lt;br /&gt;
	http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd&amp;quot;&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain0.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain1.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain2.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain3.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain4.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain5.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain6.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain7.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain8.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
&amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;http://sitemaps.worldsoft-cms.info/demo-domain9.de-sitemap.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
...&lt;br /&gt;
...&lt;br /&gt;
... thousands of lines later ...&lt;br /&gt;
&amp;lt;/sitemapindex&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/geshi&amp;gt;&lt;/div&gt;</summary>
		<author><name>Kolbyjack</name></author>	</entry>

	</feed>