Any opinions expressed here are my own and not necessarily those of my employer (I'm self-employed).

Oct 6, 2010

Keep ASP.NET error pages out of search engines

In a production environment, users should not be presented with the default ASP.NET error pages. Instead they should be offered clean, understandable error pages that give a sensible explanation of the error, along with suggestions for continuing their journey on the website. Besides usability concerns, it's also an important security practice not to leak application details to those who might tinker with your application!

In ASP.NET, the customErrors configuration element is used to handle error situations. However, the behaviour of the custom errors is somewhat counterintuitive, as you might end up with your error pages indexed by search engines.

How customErrors work
First, a quick example of what a customErrors section might look like in a web.config file (this belongs under system.web):

<customErrors mode="RemoteOnly">
    <error statusCode="404" redirect="/NotFound.aspx"/>
</customErrors>

The mode attribute of customErrors can be set to: Off | RemoteOnly | On. Off means that you'll get the default (and detailed) ASP.NET error pages when something bad happens. RemoteOnly will give the default error pages when the application is accessed from localhost, but will serve your custom pages for requests not originating on the local machine. On will always serve the custom error pages.

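Historically, errors would trigger a redirect to an error page; this is still the default behaviour. The path to the page that triggered the error is included in the aspxerrorpath query string parameter, for example (with a hypothetical page name):

/NotFound.aspx?aspxerrorpath=/DoesNotExist.aspx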

Starting with .NET Framework 3.5 SP1, the customErrors element includes the optional redirectMode attribute. When this is set to "ResponseRewrite", the user is no longer redirected to the error page. Instead, the error page is served "in place" at the address where the error occurred. This can be advantageous, as the user can simply refresh the page to try again instead of being sent away from the page where she's trying to accomplish something.

Though it's important to present a professional-looking error page, there is also important behaviour that is invisible to the average end user: HTTP status codes, which affect how search engines index your site.

HTTP status codes
HTTP status codes are fundamental to the functioning of the web. In short, they're numeric values describing the outcome of a request. The status code is included in the first line of the response from a web server. Here are the most common ones:
  • 200 OK
  • 301 Moved permanently (this is a redirect)
  • 302 Found (this is a temporary redirect, named "Moved Temporarily" in HTTP/1.0)
  • 404 Not found
  • 500 Internal server error
The 200, 301, and 404 status codes have a major impact on how search engines index your site. 200 means that they will index the page at will. 301 means that they should replace the source of the redirect with the destination address of the redirect. 404 means that they will remove the page from their index. Google has an excellent article on the various HTTP status codes, and how they impact the Googlebot crawlers. To inspect HTTP status codes, get a tool such as Fiddler or the Firefox Live HTTP Headers plugin.
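
If you'd rather check a status code from code than from a browser tool, something along these lines should do. This is only a rough sketch, assuming the classic .NET Framework and its HttpWebRequest class; the default URL is a made-up placeholder and the class name StatusCheck is just for illustration.

```csharp
using System;
using System.Net;

class StatusCheck
{
    static void Main(string[] args)
    {
        // Hypothetical address; pass the URL you want to inspect as the first argument.
        string url = args.Length > 0 ? args[0] : "http://localhost/DoesNotExist.aspx";

        var request = (HttpWebRequest)WebRequest.Create(url);
        // Report a 301/302 as-is instead of silently following the redirect.
        request.AllowAutoRedirect = false;

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("{0} {1}", (int)response.StatusCode, response.StatusDescription);
            }
        }
        catch (WebException ex)
        {
            // 4xx and 5xx responses surface as a WebException; the response is still attached.
            var response = ex.Response as HttpWebResponse;
            if (response == null) throw; // no HTTP response at all, e.g. a DNS or connection failure
            using (response)
            {
                Console.WriteLine("{0} {1}", (int)response.StatusCode, response.StatusDescription);
            }
        }
    }
}
```

Automatic redirects are switched off on purpose, so that a redirect to an error page shows up as the 302 it really is rather than as the status of the page it points to.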

When running with customErrors Off, a request for a non-existent .aspx page will yield the default ASP.NET "Server error, file not found" error page, correctly returned with a 404 HTTP status code. This behaviour is important, as the 404 status code tells search engines that the resource does not exist.

However, it all changes when customErrors is set to RemoteOnly or On! You'll see your custom error page served, but the status code in the response will be 200 (with the default redirect behaviour, a 302 redirect comes first and the error page itself then returns 200). This indicates that everything went well! Search engines will consequently index your error page at will, and they will keep returning to the address to check for updates. Why the behaviour changes to return a 200 instead of a 404 is beyond me. In my opinion, it shouldn't. Let's revisit our example, now with the ResponseRewrite functionality included, to show just how counterintuitive this is:

<customErrors mode="RemoteOnly" redirectMode="ResponseRewrite">
    <error statusCode="404" redirect="/NotFound.aspx"/>
</customErrors>

Note that we are specific about the 404 errors, and even refer to the error by its numerical code, but the error page will still be returned with a 200 OK.

Fixing the problem
For the NotFound.aspx error page in our example, the status code can be set programmatically, e.g. in the Page_Load() event handler:

Response.StatusCode = 404;

This will override the status code in the response, and make your "file not found" custom error page behave correctly!
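
For context, here's roughly how that line might sit in the code-behind of the error page. Treat it as a sketch: the class name NotFound is simply assumed to match the NotFound.aspx page from the example, and the TrySkipIisCustomErrors line is an addition that is only relevant if the site runs on IIS 7 or later, where IIS may otherwise swap in its own error page for responses with an error status code.

```csharp
using System;
using System.Web.UI;

// Hypothetical code-behind class for the NotFound.aspx page from the example.
public partial class NotFound : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Override the 200 OK that would otherwise be sent with the custom error page.
        Response.StatusCode = 404;

        // Assumption: hosted on IIS 7 or later. Asks IIS to leave the page body alone
        // instead of replacing it with its own error page because of the 404 status.
        Response.TrySkipIisCustomErrors = true;
    }
}
```

Whether the second line is needed depends on how IIS is configured, so it's worth checking the actual response with one of the tools mentioned earlier.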

As a side note, there are several tricks to keep your regular pages out of the search engine indexes. Google has published several articles on how to keep content out of its index. Check them out!

1 comment:

  1. Thanks for sharing the solution.

    Can we return 404 for path/ alias not found (example: /Path-Not-Exist/ )?


Copyright notice

© André N. Klingsheim and www.dotnetnoob.com, 2009-2015. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to André N. Klingsheim and www.dotnetnoob.com with appropriate and specific direction to the original content.
