How not to use the canonical tag…
I’ve had an interesting, if not exactly enjoyable, time transferring a popular site onto a new host. If you have ever changed hosts, you know it can be a frustrating, patience-consuming task. This is especially true when the site has been built up over many years.
Upon completing the move, we soon learned that there were many more 404 errors than there should have been. The old host did not provide .htaccess support *cough* Yahoo! *cough*, but happened to rewrite many of the URIs in a way that was not happening in the new environment, hence the 404s. For example, /index.html did not exist, but there were about 60 external links pointing at this exact page, which had existed in a prior incarnation. The old server saw that /index.html did not exist, then requested index.php instead, and redirected to /. This was, in retrospect, the desired behavior.
Something I failed to consider is that different hosts look for index documents in different orders. The new host looked for /index.html first, before moving on to other options. Because I didn’t immediately realize this, it ended up causing big problems.
This line in .htaccess caused an infinite loop:
Redirect 301 /index.html http://www.mywebsite.com/
As it turns out, even if the /index.html document doesn’t exist, the very act of redirecting from it makes the server think that it does. Because the server now thought /index.html existed, requesting the website root caused the browser to quickly crash. My solution was to instead redirect straight to the /index.php file.
Redirect 301 /index.html http://www.mywebsite.com/index.php
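To make the difference between those two redirect lines easier to see, here is a toy Python model of the behavior described above. It is only a sketch of what I observed, not how Apache is actually implemented: the function names, the use of bare paths instead of full URLs, and the rule that a redirect on an index candidate gets passed straight to the client are all assumptions made for illustration.

```python
def respond(path, redirects, index_order, existing_files):
    """Toy server: return ("redirect", target), ("serve", file) or ("404", path)."""
    if path in redirects:
        return ("redirect", redirects[path])
    if path == "/":
        # Try each index document in the host's configured order.
        for name in index_order:
            candidate = "/" + name
            # Assumption: a redirect rule on a candidate wins even if
            # no such file exists on disk.
            if candidate in redirects:
                return ("redirect", redirects[candidate])
            if candidate in existing_files:
                return ("serve", candidate)
    if path in existing_files:
        return ("serve", path)
    return ("404", path)


def follow(path, redirects, index_order, existing_files):
    """Follow redirects the way a browser would, stopping on a repeat."""
    visited = []
    while path not in visited:
        visited.append(path)
        kind, target = respond(path, redirects, index_order, existing_files)
        if kind != "redirect":
            return (kind, target)
        path = target
    return ("loop", path)


files = {"/index.php"}                # /index.html does not exist
order = ["index.html", "index.php"]   # the new host's lookup order

print(follow("/", {"/index.html": "/"}, order, files))           # ('loop', '/')
print(follow("/", {"/index.html": "/index.php"}, order, files))  # ('serve', '/index.php')
```

In this model the first rule loops because the root resolves to /index.html, which redirects back to the root. The second rule breaks the loop, but notice that the root now ends up at /index.php, which matters for what comes next.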
I then, in my infinite wisdom, added this tag to the top of /index.php:
<link rel="canonical" href="http://www.mywebsite.com/" />
The home page of the website was soon removed from the search index. After all, everything was redirecting to /index.php, a totally separate URI, and that page now said it was nothing but a duplicate of a web address that no longer served a page of its own. That is why this post is titled:
How not to use the <link rel="canonical"> tag!
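A small sanity check along these lines would have caught the mistake before the search engines did. This is a minimal sketch using only Python's standard library. The helper names and the hand-written redirect table are hypothetical, and the entry sending the root to /index.php is my assumption about what the server was effectively doing at the time. The idea is that a canonical URL should be a final destination, not something that redirects somewhere else.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.href is None:
            self.href = attributes.get("href")


def canonical_of(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.href


def final_url(url, redirects):
    """Follow the redirect table until a URL stops redirecting (or repeats)."""
    seen = set()
    while url in redirects and url not in seen:
        seen.add(url)
        url = redirects[url]
    return url


def canonical_is_consistent(html, redirects):
    """True only if the page names a canonical URL that does not redirect away."""
    target = canonical_of(html)
    return target is not None and final_url(target, redirects) == target


# Hypothetical table mirroring the setup in this post.
redirects = {
    "http://www.mywebsite.com/index.html": "http://www.mywebsite.com/index.php",
    "http://www.mywebsite.com/": "http://www.mywebsite.com/index.php",  # assumed
}
page = '<link rel="canonical" href="http://www.mywebsite.com/" />'

print(canonical_is_consistent(page, redirects))  # False
```

The check fails because the canonical address itself redirects to /index.php, so the page is pointing search engines at a URL that only ever points back.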