Multiple Websites and Robots.txt? No Problem!


Well, I should have included a disclaimer: this developer tip requires IIS 7 and the URL Rewrite module. If you're unfamiliar with the rewrite module, you can find a brief introduction here.

I ran into this problem when creating multiple websites within a single Kentico installation, and I had a tough time finding the right solution for my application. I did find a suggestion in the Kentico forums to add a robots.txt file as a file type in CMS Desk and use custom URL extensions; more details of that solution can be found here. But I also found that if your website isn't running under Integrated mode, custom URL extensions get a bit messy, so I decided to look further into the rewrite module.

The rewrite module is incredibly easy to use: you configure it entirely in the web.config, using regular expressions to match incoming requests. It's very powerful, but for my solution the implementation was simple. The following is an example from my sandbox website, Kentico Angler. For each additional website, you just add an additional rule.

...
<system.webServer>
    <rewrite>
      <rules>
        <!-- When the request host is www.kenticoangler.com, serve that site's own robots file -->
        <rule name="Kentico Angler Robots" enabled="true" stopProcessing="true">
          <match url="^robots\.txt$" />
          <conditions>
            <add input="{HTTP_HOST}" pattern="www\.kenticoangler\.com$" />
          </conditions>
          <action type="Rewrite" url="kenticoangler-robots.txt" appendQueryString="false" />
        </rule>
      </rules>
    </rewrite>
</system.webServer>
...
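Following the "one additional rule per additional website" idea, a rule for a second site in the same installation would look something like the sketch below. The www.example.com host and example-robots.txt file name are placeholders rather than part of my actual setup; it's the same pattern with only the host condition and rewrite target swapped out.

<rule name="Example Site Robots" enabled="true" stopProcessing="true">
  <match url="^robots\.txt$" />
  <conditions>
    <add input="{HTTP_HOST}" pattern="www\.example\.com$" />
  </conditions>
  <action type="Rewrite" url="example-robots.txt" appendQueryString="false" />
</rule>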

In all of my searching for other solutions, I also found that you can specify your sitemap within your robots.txt file. Now that I can create a separate robots.txt file for each website, this becomes a very easy implementation.

For KenticoAngler.com I just needed to add the line:

Sitemap: http://www.kenticoangler.com/sitemap.xml

to the top of my kenticoangler-robots.txt file. Now all I need to do to keep that nice .xml extension is add another rule to the web.config:

...
<system.webServer>
    <rewrite>
      <rules>
        <!-- When the request host is www.kenticoangler.com, serve that site's own robots file -->
        <rule name="Kentico Angler Robots" enabled="true" stopProcessing="true">
          <match url="^robots\.txt$" />
          <conditions>
            <add input="{HTTP_HOST}" pattern="www\.kenticoangler\.com$" />
          </conditions>
          <action type="Rewrite" url="kenticoangler-robots.txt" appendQueryString="false" />
        </rule>
        <!-- Keep the friendly /sitemap.xml URL but serve the generated Kentico sitemap page -->
        <rule name="Site Sitemap" enabled="true" stopProcessing="true">
          <match url="^sitemap\.xml$" />
          <conditions>
            <add input="{HTTP_HOST}" pattern="www\.kenticoangler\.com$" />
          </conditions>
          <action type="Rewrite" url="/CMSPages/googlesitemap.aspx" appendQueryString="false" />
        </rule>
      </rules>
    </rewrite>
</system.webServer>
...
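The per-site robots file itself is just plain robots.txt syntax. Here's a sketch of what kenticoangler-robots.txt might contain; only the Sitemap line is from my actual file, and the User-agent and Disallow entries are placeholders you'd adjust for whatever you want crawled:

Sitemap: http://www.kenticoangler.com/sitemap.xml

User-agent: *
Disallow: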

Adding the sitemap rule also gives me the option of creating a separate sitemap ASPX file for each website. For example, I can copy the current ASPX file "/CMSPages/googlesitemap.aspx" to create "/CMSPages/kenticoAnglerSitemap.aspx". Within that file, I can specify different Document Types to be included in each generated sitemap:

<cms:GoogleSitemap runat="server" ID="googleSitemap"
    ClassNames="CMS.MenuItem;KenticoAngler.Fish;KenticoAngler.FishingLocation"
    TransformationName="CMS.Root.GoogleSiteMap"
    CacheMinutes="0"
    OrderBy="NodeLevel, NodeOrder, NodeName" />

Now I just need to point the sitemap rule in the web.config at the new page, and presto!
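The change is just the rewrite target of the "Site Sitemap" rule, assuming the copied page really does live at /CMSPages/kenticoAnglerSitemap.aspx:

<rule name="Site Sitemap" enabled="true" stopProcessing="true">
  <match url="^sitemap\.xml$" />
  <conditions>
    <add input="{HTTP_HOST}" pattern="www\.kenticoangler\.com$" />
  </conditions>
  <action type="Rewrite" url="/CMSPages/kenticoAnglerSitemap.aspx" appendQueryString="false" />
</rule>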

 
Posted by Jason Ellison on 7/27/2011 4:43:06 PM
  