Irish Times Where Art Thou?
Following on from my previous post about the new Irish Times website, I’ve been watching how Google gets on with the migration from ireland.com to irishtimes.com.
Could They Make It Any More Difficult?
The first thing I noticed when irishtimes.com went live was that www.irish-times.com was a mirror. It was even showing up in the SERP for [irish times], and you could clearly see the dupe content issues from the pages listed. Thankfully that domain has now been properly redirected.
But perhaps more worrying is the fact that Google is still returning Ireland.com in the #1 position for [irish times]:
Image of Google results for [irish times]
Am I surprised? Well, not really, given the gratuitous use of META refreshes and 302 redirects on Ireland.com. As I mentioned in my previous post, a request for http://www.ireland.com results in:
http://www.ireland.com/
<html>
<head><meta HTTP-EQUIV="Refresh" CONTENT="0;URL=index.jsp"></meta></head>
<body></body>
</html>
http://www.ireland.com/index.jsp
HTTP/1.1 302 Moved Temporarily
Date: Mon, 30 Jun 2008 07:04:11 GMT
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA date=200710221139)/Tomcat-5.5
Pragma: no-cache
Cache-Control: no-store
Expires: 0
Location: http://www.ireland.com/home/landing.ie
Content-Type: text/html
Content-Length: 0
Via: 1.1 www.ireland.com
Keep-Alive: timeout=10, max=99
Connection: Keep-Alive
http://www.ireland.com/home/landing.ie
HTTP/1.1 302 Moved Temporarily
Date: Mon, 30 Jun 2008 07:04:13 GMT
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA date=200710221139)/Tomcat-5.5
Pragma: no-cache
Cache-Control: no-store
Expires: 0
Location: http://www.ireland.com/home/landing.ie?pid=0
Content-Language: en-GB
Content-Length: 0
Via: 1.1 www.ireland.com
Keep-Alive: timeout=10, max=98
Connection: Keep-Alive
Content-Type: text/illegal
The only surprise is that Google has gotten as far as it has. The snippet displayed for ireland.com appears to be the META Description, but that title certainly isn’t from the page (very often a good sign that Google knows the page is there, but cannot crawl it). And why might Google still believe that ireland.com is the Irish Times? At a guess I’d say it’s anchor-text related (I don’t think the old ireland.com homepage had that page title). If you check the Google Directory you can see that the DMOZ anchor is in fact ‘Irish Times’, so I’m sure that’s a factor here. Again I’m guessing, but I imagine those refreshes and redirects are seen as temporary, so Google remains to be convinced that ireland.com is not the Irish Times. This is probably a good example of where requesting a change to DMOZ would be a wise step. Fixing the canonical URL issue would also be advisable given the issues I’ll look at shortly.
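For anyone wanting to spot this kind of META refresh without eyeballing the source, here is a minimal Python sketch. It only parses a response body that has already been fetched; the function and class names are made up for illustration, and the sample body is the one returned for http://www.ireland.com/ above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Response body copied from the http://www.ireland.com/ request shown above.
HOMEPAGE_BODY = """<html>
<head><meta HTTP-EQUIV="Refresh" CONTENT="0;URL=index.jsp"></meta></head>
<body></body>
</html>"""


class MetaRefreshFinder(HTMLParser):
    """Collects the target of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # CONTENT looks like "0;URL=index.jsp": a delay, then the target.
        _delay, _, rest = attr.get("content", "").partition(";")
        key, _, url = rest.strip().partition("=")
        if key.strip().lower() == "url" and url.strip():
            self.targets.append(url.strip())


def meta_refresh_target(base_url, html):
    """Return the absolute URL a META refresh points at, or None."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if not finder.targets:
        return None
    return urljoin(base_url, finder.targets[0])


if __name__ == "__main__":
    print(meta_refresh_target("http://www.ireland.com/", HOMEPAGE_BODY))
```

A crawler has to parse the body to discover this hop at all, which is one reason a server-side 301 is the safer way to signal a permanent move.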
The Plot Thickens…
Whenever I think Google is confused about a page I always check the cached version. Here’s Google’s cache of ireland.com:
Images of Google’s cache of www.ireland.com
The initial links are hidden page anchors for accessibility usage. That’s fair game, and it’s perfectly acceptable to hide those links for visual browsers. But the next links contained in h1s?
Now don’t get me wrong – I’m a big fan of using text alternatives when text content is image-based. I often add in a text node containing the text portrayed in the image and hide that node away. Google doesn’t mind this as long as the text content is representative of what’s in the image. But I think it’s rather risky for a site like ireland.com to add hidden text to their page that is not also rendered in one form or another on the page. Here’s the mark-up:
<h1 class="magic">
<a title="Return to the homepage" href="/home/landing.ie" class="home-logo"><span>Homepage</span></a>
<a href="/home/landing.ie"><span>Information for Ireland or abroad, travel, entertainment listings, sports news, games, puzzles, recipes, TV listings and more</span></a>
</h1>
The ‘Homepage’ link might pass a manual review, but I doubt the “Information for Ireland or abroad, travel, entertainment listings, sports news, games, puzzles, recipes, TV listings and more” would. It’s nowhere on the page, and is basically hidden text. No idea why they have it there TBH, but I think it’s a little risky personally.
More Horrible Redirects
I always wonder where some of the URI constructions come from these days. You can always tell when no one from the SEO side is consulted when you see URLs like:
http://ireland.com/home/Looking_for_cheap_flights_Try_our_Find_it_fast/maxiview.ie?mx_ext_UNCLASSIFIED_uuid=/travelnow/landing.ie?afs=false
That URL comes from the Most Read list on the homepage:
Most Read items on Ireland.com
Perhaps worse still is the server response after clicking on one of those URLs:
http://ireland.com/home/Looking_for_cheap_flights_Try_our_Find_it_fast/maxiview.ie?mx_ext_UNCLASSIFIED_uuid=/travelnow/landing.ie?afs=false
HTTP/1.1 302 Moved Temporarily
Date: Tue, 15 Jul 2008 08:05:26 GMT
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA date=200710221139)/Tomcat-5.5
Pragma: No-cache
Cache-Control: no-cache, no-store
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Location: http://ireland.com/home/travelnow/landing.ie?afs=false
Content-Type: text/html;charset=ISO-8859-1
Content-Language: en-GB
Content-Length: 0
Via: 1.1 www.ireland.com
Keep-Alive: timeout=10, max=100
Connection: Keep-Alive
Which of course doesn’t resolve to content, but instead:
http://ireland.com/home/travelnow/landing.ie?afs=false
HTTP/1.1 302 Moved Temporarily
Date: Tue, 15 Jul 2008 08:05:28 GMT
Server: Apache-Coyote/1.1
X-Powered-By: Servlet 2.4; JBoss-4.2.2.GA (build: SVNTag=JBoss_4_2_2_GA date=200710221139)/Tomcat-5.5
Pragma: no-cache
Cache-Control: no-store
Expires: 0
Location: http://ireland.com/home/travelnow/landing.ie?pid=64&afs=false
Content-Language: en-GB
Content-Length: 0
Via: 1.1 www.ireland.com
Keep-Alive: timeout=10, max=99
Connection: Keep-Alive
Content-Type: text/illegal
Really, enough already – either NOFOLLOW those links, or use the correct URLs…
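To make the hop count concrete, here is a small Python sketch that walks a redirect chain from recorded responses rather than live requests. The status codes and Location values are copied from the two header dumps above; the `walk_chain` helper and the `RECORDED` table are hypothetical names used only for this illustration.

```python
from urllib.parse import urljoin

START = ("http://ireland.com/home/Looking_for_cheap_flights_Try_our_Find_it_fast/"
         "maxiview.ie?mx_ext_UNCLASSIFIED_uuid=/travelnow/landing.ie?afs=false")

# URL -> (status, Location header), as recorded in the dumps above.
RECORDED = {
    START: (302, "http://ireland.com/home/travelnow/landing.ie?afs=false"),
    "http://ireland.com/home/travelnow/landing.ie?afs=false":
        (302, "http://ireland.com/home/travelnow/landing.ie?pid=64&afs=false"),
}

REDIRECT_CODES = {301, 302, 303, 307, 308}


def walk_chain(start, responses, max_hops=10):
    """Follow recorded redirects; return (hops, last_url_reached)."""
    hops = []
    seen = set()
    url = start
    while url in responses and url not in seen and len(hops) < max_hops:
        seen.add(url)  # guards against redirect loops
        status, location = responses[url]
        if status not in REDIRECT_CODES or not location:
            break
        next_url = urljoin(url, location)  # Location may be relative
        hops.append((url, status, next_url))
        url = next_url
    return hops, url


if __name__ == "__main__":
    hops, last = walk_chain(START, RECORDED)
    for source, status, target in hops:
        print(status, source, "->", target)
    print("chain stops (as far as recorded) at", last)
```

The walk stops at the last URL for which a response was recorded here, so it says nothing about what that final URL actually serves. What it does show is two consecutive temporary redirects before any content is reached.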
What Else Don’t I Like?
I could spend a long time going through what is undoubtedly a very large site. I did take a look at what Google has indexed, and how Google appears to be dealing with new content. I always knew that dealing with Google would be a serious task for such a large site. But I’m not convinced by whatever redirect strategy they are using. I found some very odd migration of old content to the new irishtimes.com site, complete with the old theme. I’m also seeing jsessionid variables indexed by Google: http://www.ireland.com/goingout/Bruce_Springsteen_[ ...]jsessionid=3FF79D689D230AA75EAD8956CE97DA9A?[ ...]affiliatewindow.com_uuid=23106443_irretailaffiliat.
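One common way to keep session IDs out of the index is to normalise URLs before they are linked to or redirected. The Python sketch below assumes the servlet-style `;jsessionid=` path parameter that Tomcat and JBoss append when rewriting URLs; the indexed URL above is elided just before `jsessionid`, so that form is an assumption, and the example URL in the code is hypothetical rather than a real ireland.com address.

```python
import re

# Matches ";jsessionid=<token>" up to the next "?", "#", ";" or "/".
_JSESSIONID = re.compile(r";jsessionid=[^?#;/]*", re.IGNORECASE)


def strip_jsessionid(url):
    """Remove a servlet-style ;jsessionid=... path parameter from a URL."""
    return _JSESSIONID.sub("", url)


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    example = ("http://www.example.com/goingout/page.ie"
               ";jsessionid=3FF79D689D230AA75EAD8956CE97DA9A?pid=1")
    print(strip_jsessionid(example))
```

The same normalised URL is also what a canonical 301 should point at, so that every session-stamped variant collapses to one indexable address.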
Conclusion
While this post started out as another look at irishtimes.com, it quickly became apparent that its predecessor has quite a few SEO flaws. I’ve looked at a few above, and I could go on, but I think the pattern is clear – SEO seems to have taken a back seat when it came to ireland.com. Large site SEO is more about crawlability and internal navigation, and very often good internal linking and architecture lets you push around site authority and pick off less competitive and long-tail keywords very effectively.
I think ireland.com is a great example of why baking in SEO at the design and development stage of any large site is essential.
Good analysis Richard. It also looks like they should update their robots.txt. Half the pages listed in it don’t exist in the first place. I couldn’t find a sitemap, not in the normal locations anyway.
Paul
Comment by paul - blackdog.ie — July 15, 2008 @ 10:48 am
Hi Richard, Thanks for the great insight into the workings of the irishtimes website. Are you pitching for this job? Your insights are excellent but your tone is a little patronizing..
Comment by giuseppe — July 15, 2008 @ 9:11 pm
Hi Giuseppe
My tone is my tone. If you read my other posts you’ll see that’s just how I write.
I’m not pitching for any job – I often write about large well-funded sites because I have high expectations given the budgets involved.
Thanks for commenting
Richard
Comment by Richard Hearne — July 16, 2008 @ 2:27 am
hmmm last comment didn’t go through
Good analysis Richard. I looked at their robots.txt, and it needs to be updated, 1/2 the pages on it don’t exist anymore. I couldn’t find an XML sitemap at any of the usual locations either. It is funny to see such large sites with the canonical URL problem not fixed too.
Paul
Comment by paul - blackdog.ie — July 16, 2008 @ 9:24 am
Lol. I like this bit from giuseppe reply: “Are you pitching for this job?”…
At the start I thought the same thing, but I know Richard and it is not like him to look for a job this way…
Comment by Louie Eire-Web Design — July 16, 2008 @ 1:54 pm
Thanks Louie
Weird thing is I don’t honestly know why I do these posts. I’m just so conditioned to look at websites from an SEO perspective now. Thankfully I don’t really have any need to pitch for work (x’s fingers that doesn’t put the mockers on me now). And I suppose in a way I am being somewhat patronising when I highlight some failings, but I hope that some of what I write might actually be useful to those sites I write about (in the main that is).
Rgds
Richard
Comment by Richard Hearne — July 16, 2008 @ 2:14 pm
Great postings, not patronising and very helpful to those website owners
Comment by Vaughan Belhamine — July 16, 2008 @ 9:17 pm
Your use of the Google cached copy got me thinking — I wonder if you could use http://www.archive.org in the same way?
Comment by Paul Burani, Clicksharp Marketing — July 21, 2008 @ 5:48 pm
Hi Vaughan – I hope so
Hi Paul – the archive.org cache can be very useful for picking issues from the past. I’ve seen it used to find sites that previously sold links and the like. Google will often point you to the archive of a page, although we all know that they have a much more powerful archive at their disposal.
Thanks to both for commenting
Richard
Comment by Richard Hearne — July 22, 2008 @ 6:51 am
Interesting post Richard. I checked Ireland.com just now but it gives me a maintenance notice..
“Homepage Home Page
Hello! We’re currently working hard on a few improvements.
We know know that you’re really keen to have a look around our shiny new website (we love shiny new things too), so check back in a little while. We think it’ll be worth the wait.
If you need to contact the ireland.com team, then you can email us at info@digitalworx.ie
If, on the other hand, you’re after The Irish Times, they’ve moved. You can now visit them at http://www.irishtimes.com”
Can’t be good for SEO either, taking a production instance down in such an old yellow hat busy at work under construction way?
Comment by Warren — July 28, 2008 @ 10:13 pm