My dive into the world of error pages started when I signed up for Google Webmaster Tools. Upon running the web crawl diagnostics, I found out I had nearly 12,000 “not found” pages. Twelve thousand missing pages can’t be good for page ranking!
As it turned out, the culprit was an old Amazon script. I had deleted it from the server a while ago, but while it was live it had let Google index each and every product reference, and Google kept crawling those URLs over and over again. After some searching and reading, I decided to tackle the problem. It was a good opportunity to take care of several issues at once.
AdSense and the 404 page
It is against Google's AdSense terms to display ads on error pages. Getting into trouble and having your account closed is easy to avoid with a few lines of code.
Usually, a theme displays the same layout and sidebar content on every page, and that includes the 404 page if your theme has a custom 404.php file. A custom error page has great advantages over the server’s default error display: it keeps visitors on your site, where you can point them somewhere useful. If your theme doesn’t have a custom 404, you can create one as explained below.
I originally set up my blog with ads in the sidebar, using a simple text widget. Now I needed PHP to express a condition: “if this is a 404 page, don’t display the ad”. That way the rest of the sidebar still shows, keeping the navigation and a consistent layout. A plain text widget doesn’t run PHP, so to the rescue comes the PHP Code Widget, which works like an enhanced text widget: it displays plain text and HTML, and it also executes PHP.
On Bill2me.com I found a simple way to code the condition, and I used it in each ad block created with the PHP Code Widget, for both the AdSense code and the Amazon code:
<?php if (!is_404() ) { ?>
AdSense code goes here.
<?php }
?>
With this, each ad widget knows to display the ads on all pages, except for the error page. Google is taken care of, at least for this issue!
The 404 vs. 410 error page
While researching permanently deleted files and directories, I realized that a 410 status code is better suited to them. A 404 is a generic “not found”: Google can’t tell whether the file is temporarily missing, the URL was mistyped, or something else happened, so it keeps coming back to check. A 410, however, tells the crawler that the file is gone for good.
The .htaccess file
You can have the server return a 410 for specific files and directories by adding a single line to your .htaccess file. I found great examples on dive into mark. The one I used was the simple Redirect method, with the path to my directory:
Redirect gone /example-directory/
With this line in the .htaccess file, the server returns a 410 Gone status for every URL under that directory. This took care of the nearly 12,000 indexed URLs.
For individual files, the principle is the same:
Redirect gone /path/to/file.php
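To make the prefix idea concrete, here is a rough PHP sketch of which request paths the two rules above would catch. The function name path_is_gone and the $gone list are made up for illustration, and the matching is simplified: Apache's own rules about where a prefix may end are stricter than a plain string comparison.

```php
<?php
// Hypothetical helper, not part of Apache or WordPress: it only imitates
// the prefix matching that "Redirect gone" performs on the URL path.
function path_is_gone(string $requestPath, array $gonePrefixes): bool
{
    foreach ($gonePrefixes as $prefix) {
        // strncmp() returns 0 when the first strlen($prefix) bytes are equal.
        if (strncmp($requestPath, $prefix, strlen($prefix)) === 0) {
            return true;
        }
    }
    return false;
}

// The same two paths used in the .htaccess lines above.
$gone = ['/example-directory/', '/path/to/file.php'];

var_dump(path_is_gone('/example-directory/item/123', $gone)); // bool(true)
var_dump(path_is_gone('/blog/hello-world/', $gone));          // bool(false)
```

Anything that starts with one of the listed paths counts as gone, which is why a single line was enough for thousands of URLs.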
Calling your one custom page for all errors, while displaying the correct status in the HTTP headers
The Codex is a great resource for creating a custom 404 page in WordPress. You could create separate error pages, for example a 404.php and a 410.php, but with some code taken from the Codex you can use one custom page for several different errors. To me, this keeps things simpler, and the correct status is still returned in the response headers.
Back to the .htaccess file
To make sure that your server returns your custom error page, and not the default ugly white page with the warning, you put the instructions in your .htaccess file.
ErrorDocument 404 /index.php?error=404
You could create separate files and call them accordingly, but I decided to use just one error page for all (404.php).
ErrorDocument 400 /index.php?error=404
ErrorDocument 401 /index.php?error=404
ErrorDocument 402 /index.php?error=404
ErrorDocument 403 /index.php?error=404
ErrorDocument 404 /index.php?error=404
ErrorDocument 405 /index.php?error=404
ErrorDocument 406 /index.php?error=404
ErrorDocument 407 /index.php?error=404
ErrorDocument 408 /index.php?error=404
ErrorDocument 409 /index.php?error=404
ErrorDocument 410 /index.php?error=404
ErrorDocument 411 /index.php?error=404
ErrorDocument 412 /index.php?error=404
ErrorDocument 413 /index.php?error=404
ErrorDocument 414 /index.php?error=404
ErrorDocument 415 /index.php?error=404
ErrorDocument 416 /index.php?error=404
ErrorDocument 417 /index.php?error=404
ErrorDocument 422 /index.php?error=404
ErrorDocument 423 /index.php?error=404
ErrorDocument 424 /index.php?error=404
ErrorDocument 425 /index.php?error=404
ErrorDocument 426 /index.php?error=404
ErrorDocument 500 /index.php?error=404
ErrorDocument 501 /index.php?error=404
ErrorDocument 502 /index.php?error=404
ErrorDocument 503 /index.php?error=404
ErrorDocument 504 /index.php?error=404
ErrorDocument 505 /index.php?error=404
ErrorDocument 506 /index.php?error=404
ErrorDocument 507 /index.php?error=404
ErrorDocument 510 /index.php?error=404
I have yet to check whether this works with all errors, but I have tested it with the two that matter most, 404 and 410, and it works fine. In fact, you can limit yourself to those two and drop the other lines. My reasoning for including all of them was, “Since I’m at it…”
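If you would rather not type the list by hand, a few lines of PHP can print it for whichever codes you choose. This is only a sketch: error_document_lines is a made-up name, and the default target is simply the one used in the lines above.

```php
<?php
// Hypothetical helper: builds one ErrorDocument directive per status code,
// all pointing at the same target used in this article.
function error_document_lines(array $codes, string $target = '/index.php?error=404'): string
{
    $lines = [];
    foreach ($codes as $code) {
        $lines[] = sprintf('ErrorDocument %d %s', $code, $target);
    }
    return implode("\n", $lines) . "\n";
}

// The two codes tested in this article.
echo error_document_lines([404, 410]);
```

Run it from the command line and paste the output into your .htaccess file.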
The custom 404, catch all, page
If you want to test my error page, go ahead and try a broken link, which will return a 404 “Not Found” error, and a page that no longer exists, which will return a 410 “Gone” error. Both call the same error page, with the proper status header in each case.
In order to display the whole range of errors on the same error page, I have taken the code from the Codex Advanced 404.php Example, with just some minor changes.
- On the line that starts with
// begin the output buffer to send headers and response
I had to remove the ob_start(); call, because it caused a misconfiguration on my site. Without it, everything works fine. I guess it depends on the plugins you are using.
- On the part that prints out the page, starting with
<?php get_header();?>
I continued with the layout of my theme, and added helpful links and text, based on the 404 page I already had.
If you don’t have a custom 404 file, you can adapt this to your theme by copying your theme’s index.php and renaming the copy 404.php. Take out the whole loop and add some pertinent text in its place. Then merge the result with the Advanced Example code, so that the error page matches your theme.
The part of the code that displays your page, along with the status code, is:
<?php get_header();?>
<div id="content">
<div class="post">
<h1><?php echo esc_html( "$AA_STATUS_CODE $AA_REASON_PHRASE" ); ?></h1>
<?php if(function_exists('aa_google_404')) aa_google_404(); ?>
</div>
</div>
<?php get_sidebar(); ?>
<?php get_footer();
?>
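For the curious, here is a simplified sketch of how a status code and reason phrase like the ones in that heading could be worked out. It is not the Codex code. The names REASON_PHRASES, resolve_error_status and send_error_status are made up, the phrase table is cut short, and it assumes Apache passes the original status along as REDIRECT_STATUS when it hands the request to the error document.

```php
<?php
// Hypothetical names throughout; this is not the Codex code itself.
// Only a handful of reason phrases are listed to keep the sketch short.
const REASON_PHRASES = [
    400 => 'Bad Request',
    401 => 'Unauthorized',
    403 => 'Forbidden',
    404 => 'Not Found',
    410 => 'Gone',
    500 => 'Internal Server Error',
    503 => 'Service Unavailable',
];

/**
 * Returns [status code, reason phrase] for the current request.
 * Assumes Apache exposes the original status as REDIRECT_STATUS.
 */
function resolve_error_status(array $server): array
{
    $code = isset($server['REDIRECT_STATUS']) ? (int) $server['REDIRECT_STATUS'] : 404;
    if (!array_key_exists($code, REASON_PHRASES)) {
        $code = 404; // Assumption: unknown or missing codes fall back to 404.
    }
    return [$code, REASON_PHRASES[$code]];
}

/** Sends the status line; call it before any output is written. */
function send_error_status(int $code, string $phrase): void
{
    header(sprintf('HTTP/1.1 %d %s', $code, $phrase), true, $code);
}

// Example with a fake server array, standing in for $_SERVER.
[$code, $phrase] = resolve_error_status(['REDIRECT_STATUS' => '410']);
echo "$code $phrase\n"; // 410 Gone
```

On a live page you would pass $_SERVER instead of the fake array and send the status before get_header() prints anything. Inside WordPress, status_header() does a similar job to the send_error_status function here.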
This way, you end up with a custom error page that keeps visitors on your site and handles several kinds of errors without AdSense ads, plus a line in your .htaccess file that tells Google when a file has been permanently deleted.