A Bit of Background
Dundee is the 4th largest city in Scotland. The city is historically famous for Jute, Jam and Journalism and more recently renowned as a hub for gaming, with the likes of Lemmings and Grand Theft Auto being born here.
The city is currently undergoing a £1bn waterfront regeneration project, one of the biggest of its kind ever to take place in Western Europe, which includes the highly anticipated £45m V&A Museum.
It’s hoped the project will create almost 10,000 new jobs in the area, attract an influx of new businesses and tourists, and boost the city’s economy, which has declined over the past few decades.
4 years ago the city underwent a re-branding exercise, costing almost £75,000, which included a new logo, slogan and website www.dundee.com intended to be a portal for the city, supporting tourism and portraying all the city has to offer.
This website is likely to be more important than ever as the regeneration project is realised; but there’s a problem. The site’s had some kind of penalty and is suffering diminished visibility as a result. The purpose of this audit is to identify the problem and its contributing factors, and determine the main action points needed to get the site back on track.
We love Dundee, with myself and many others in our team being born and bred here, and we’re excited to see how the city flourishes after the regeneration and to be based just minutes from the development.
Hopefully this audit will help the council get the site back on track for the exciting times the city has ahead and play a small part in the boost our city deserves. Similarly, with so many more Panda penalties in 2014 hopefully this audit will help others with a Panda recovery by following some of these steps.
Step 1: Penalty Identification
We don’t have access to the site’s analytics data because we have no involvement with the website, but we can run the site through the Penalty Indicator tool we recently built in conjunction with FE International. This shows us historic estimated traffic levels, automatically overlaid with Google Panda and Penguin algorithm updates (represented by blue and red vertical bars), to help ascertain what might have caused the problem.
Below you can see that prior to April 2013 the website was receiving an estimated 2500 organic visits per month which dropped down to around 300 per month and has never recovered.
Such a large month-on-month drop is indicative of a big root problem rather than, for example, the site slowly being overtaken by its competitors, which would show a more gradual decline.
To eliminate a concern over seasonal fluctuation we can look back as far as the site being launched which confirms a slow and steady increase in organic search engine traffic before this unnatural drop, which is even more apparent on a graph over a greater time period.
The penalty happened smack in between the announced Panda update in March and Penguin 2.0 in May of 2013. The two were very close to one another, which can make a monthly traffic view misleading or inconclusive.
We know that the Panda update in March was the last one pushed out manually; since then Google has stopped announcing the updates, as they are rolled out slowly across the month – this can often make identification tricky. We do, however, know that Penguin 2.0 was released on May 22nd.
If it was a Penguin problem then the traffic for May would be mainly unaffected because it didn’t hit until May 22nd, so at the most the site would have lost about a week’s traffic, and the drop would be more evident in June on the graph. Whilst this doesn’t entirely rule out Penguin, it does appear more likely that Panda is the culprit. We will analyse the backlink profile to help confirm this hypothesis, and check to see if there are any links that might be a cause for concern in the future.
Whilst we are fairly certain that Panda has caused the site’s drop in visibility, it is worth noting that the site could have received a manual penalty from Google entirely unrelated to any algorithmic updates – but without access to Google Webmaster Tools we’ll work on the assumption that it was an algorithmic penalty and see what we uncover.
Step 2: On Site Technical Assessment (Panda Spotting)
Since we’re fairly certain there’s a Panda problem, we can get stuck into Google’s Panda recovery guidelines and start checking for the low quality signals the site is giving off.
Given that we might be dealing with duplicate pages or some kind of site misconfiguration, I like to start by establishing the number of pages on the site – this points us towards the biggest problems and rules out others.
So there’s about 52,900 pages indexed in Google.
Next we can eliminate any pages from the results which don’t include www by using -inurl:www in our query.
This lets us see whether the non-www version of the site has accidentally been indexed, and which other sub domains are in the index.
So there’s about 9,590 indexed pages which don’t live on the main site.
This search reveals a lot of pages on the sub domain of http://mobile.dundee.com
If we also remove them we can see it leaves just a single page:
We now know there’s no non-www version of the site indexed which helps narrow things down.
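For reference, the narrowing above boils down to three successive Google queries (the -inurl:mobile exclusion is one way of filtering out the mobile sub domain; a -site: exclusion works too):

```
site:dundee.com
site:dundee.com -inurl:www
site:dundee.com -inurl:www -inurl:mobile
```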
So far we’ve got:
52,900 pages indexed in Google.
9,590 pages of those live on sub domains.
Only 1 page on those sub domains is not on mobile.dundee.com
Before we look at what Google has indexed on the main site let’s quickly check the content on the sub domains.
The single page on an isolated sub domain is http://events.dundee.com/banks/work/awake/video/awake-part-1.html
The page doesn’t resolve, doesn’t return a valid HTTP response, and has no Google cache. I’m not sure at this point why it’s indexed, but the webmaster could remove it from Google via Google Webmaster Tools (GWMT).
The Mobile Sub Domain
As we identified earlier, there is a mobile version of the site, on mobile.dundee.com. When I view this domain on mobile, however, it appears to look identical to the desktop site.
There doesn’t seem to be an error detecting the user agent – testing other user agents or browsing directly to this sub domain on desktop yields precisely the same content and layout as the desktop version.
This means Dundee.com has a ‘mobile’ site which is identical to the desktop site.
Whilst we’d certainly recommend developing a responsive site or a proper mobile version, in the age of full mobile browsers you can sometimes get away with one site to serve all devices. This would be a sensible fix at least for the short term, as there is precisely zero user value in having the mobile sub domain.
In fact, it is probably hurting them – their crawl budget is being wasted whilst they are running the risk of facing a penalty for duplicate content.
Let’s dig into the remaining 40k-ish pages indexed in Google from the main site to ascertain whether any are duplicate content or would be considered ‘thin’ pages.
XML Sitemap
XML Sitemaps provide a guide for crawlers which allows them to efficiently crawl a site’s pages. A sitemap should include every page the site wants search engines to crawl and index. Dundee.com’s sitemap currently looks like this (as of 04/03):
The sitemap is valid in that it correctly implements the Sitemap protocol (i.e. it is syntactically correct).
We can see it contains 5529 URLs.
Given we already know there’s over 40k pages from the main part of the website indexed in Google it’s clear that either:
- The sitemap’s not listing all the pages the site owners want it to (as there are 35k more)
- The robots file or meta tags aren’t telling Google not to index the other 35k pages, and Google’s doing its job properly
At this point it’s beneficial to do a site crawl using a tool like Screaming Frog; otherwise you’ll potentially need to do a lot of advanced searches and scraping within Google to establish all the pages the site contains and see which directories or sub folders are causing the biggest problems.
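As an aside, the heart of any such crawl is straightforward to sketch: fetch a page, extract its links, queue the new ones, repeat. Here’s a minimal, illustrative version of the link-extraction step in Python using only the standard library (the sample markup is invented, not taken from the site):

```python
# Minimal sketch of the link-extraction step at the heart of a site crawl:
# collect every <a href> on a page and resolve it to an absolute URL.
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects absolute URLs from every <a href> on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

# Example with a made-up page snippet:
sample = '<a href="/events/gig.html">Gig</a> <a href="http://mobile.dundee.com/">Mobile</a>'
print(extract_links(sample, "http://www.dundee.com/"))
# ['http://www.dundee.com/events/gig.html', 'http://mobile.dundee.com/']
```

A real crawler would add fetching, a visited set and politeness delays on top, but the extraction logic above is the core of it.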
From a quick glance down the results of our crawl I can see lots of pages within the /events/ folder.
This isn’t necessarily a problem. They could all be perfectly valid pages with unique & substantial content, so we’ll need to look into some of the results to find out more.
A Google search shows the pages in the /events/ folder are accounting for 39,600 of the indexed pages in Google.
Remember we discovered:
52,900 pages indexed in Google.
9,590 pages of those live on sub domains.
1 page of those sub domains is not mobile.dundee.com
That means the main site has 52,900 – 9,590 = 43,310 pages
Of which we now know the /events/ pages make up the majority (39,600).
Jumping onto one of the event listings, which is representative of many of the others, there’s an image, around 100 words describing the event and some links to external sites about the event.
This in itself is fairly thin content, which in moderation might not be a problem. But given the volume of these pages on the site, and the fact that they are never archived, the problem is amassing over time.
Doing a search for some of the descriptive text also reveals something ugly in Google:
I’ve checked around 30 other events and the same thing has happened every time. The listing content has been added to Dundee.com AND to the council website, dundeecity.gov.uk AND sometimes to the venue website (in this case the city’s Caird Hall) AND sometimes by the event organiser to their own site.
Typically the council site is given priority and Dundee.com hidden within the ‘omitted’ results.
That’s a lot of duplication if this has occurred for every events item indexed in Google (which remember was around 39,600).
It’s very possible that Dundee.com and the Dundee council website run off the same database rather than the information being inserted manually into each site.
Ideally each site would have its own unique content for each event, or they would combine resources and use just 1 site as the event portal, preventing this duplicate content issue.
The latter would be advisable as it would save a lot of resources creating and inputting unique content, prevent cannibalisation of traffic to whichever is deemed the preferred portal and bolster links and authority for the chosen site.
It’s likely at this point that this is the cause of the penalty due to the volume of duplication that’s been discovered, but for the sake of due diligence and to ensure we provide a full solution we’ll take a look at some of the other problems the site’s experiencing.
More Duplicate Content
There are 1,520 news pages indexed on the site.
Most news items are duplicated in Google across various sites, indicating that the Dundee.com site is being scraped repeatedly, that it’s publishing press releases which are shared around multiple publishers, or possibly that it’s just copying content from 3rd party sites without permission.
So we now have 39,600 + 1,520 = 41,120 pages on the site suffering duplicate content issues.
Robots.txt
The robots.txt file informs search engines about which pages (and sections of a site) they can, and cannot, access. The Dundee.com site does currently have a robots.txt file and it looks like this:
User-agent: *
Crawl-delay: 10

# Directories
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /profiles/
Disallow: /scripts/
Disallow: /sites/
Disallow: /themes/

# Files
Disallow: /CHANGELOG.txt
Disallow: /cron.php
Disallow: /INSTALL.mysql.txt
Disallow: /INSTALL.pgsql.txt
Disallow: /install.php
Disallow: /INSTALL.txt
Disallow: /LICENSE.txt
Disallow: /MAINTAINERS.txt
Disallow: /update.php
Disallow: /UPGRADE.txt
Disallow: /xmlrpc.php

# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/

# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/
To debug this file we need to figure out whether it’s doing either of two things it shouldn’t:
- Telling Google not to crawl important pages (i.e. stuff that should be in the index)
- Failing to disallow secret/duplicate/thin pages (i.e. stuff that should not be in the index)
The file’s disallow directives do not accidentally restrict access to any of the site’s most important pages, which is good, and they also prevent access to some areas we wouldn’t want indexed, which is also good.
More Duplicate (User Generated) Content
We found a User Generated Content section on the site, called ‘Dundee & Me’, which allows local companies to showcase their businesses and their work.
Not only will some of these likely add even more duplicate content to the site, from users copying and pasting their personal and company bios, but it also allows companies to tag each entry.
Whilst I applaud the notion of ‘Dundee & Me’ – a nice community area where anyone can create a profile and share things – the reality is a bit of a mess. Some postings do include a lot of unique content and clear community spirit, whilst others have taken advantage of the poor moderation to insert spammy links:
Even when this area of the site is used as intended – such as this nice collection of public art – the multitude of tag pages created result in hundreds of pages of very thin content.
In short, this is user generated content gone wrong. When it isn’t being spammed, this site is giving rise to a raft of thin content – which is damaging for SEO and user experience.
If we then look at the structure of the URLs (under http://www.dundee.com/work/):
There are 334 pages currently in Google’s index matching inurl:/tag/. All of these pages have basically no content on the page, and unoptimised titles and meta descriptions – they do NOT want these indexed. So although they have a clear robots.txt file, they’ve missed some key disallow commands for the UGC section of the site.
They could add Disallow: /work/tag/ or, as a more thorough solution, also add <meta name="robots" content="noindex"> to all the tag pages.
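As a quick sanity check on the suggested rule, Python’s standard-library robots.txt parser can confirm that Disallow: /work/tag/ would block the tag pages without touching legitimate profile URLs (the trimmed robots.txt body and example URLs below are hypothetical):

```python
# Verify a proposed Disallow: /work/tag/ rule blocks thin tag pages
# while leaving normal profile pages crawlable.
from urllib.robotparser import RobotFileParser

proposed_rules = """\
User-agent: *
Disallow: /admin/
Disallow: /search/
Disallow: /work/tag/
"""

rp = RobotFileParser()
rp.parse(proposed_rules.splitlines())

print(rp.can_fetch("*", "http://www.dundee.com/work/tag/art"))       # False (blocked)
print(rp.can_fetch("*", "http://www.dundee.com/work/some-profile"))  # True (crawlable)
```

Note that a robots.txt disallow only stops crawling – tag pages already in the index would still need the meta noindex tag (or a removal request) to actually drop out of Google’s results.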
Site Speed
The slower a site, the less user friendly it will be. Google have said a site’s speed directly affects its ranking potential, but there are also hundreds of studies showing it greatly affects conversion rates, so it’s worth tackling for both reasons.
The homepage time to first byte, or TTFB, is currently 0.780 seconds. For static content, a TTFB of more than a few hundred milliseconds suggests bottlenecks on the server – for example, an HTTP server that can’t handle requests fast enough. Many servers will accept a connection and then hold it until they are ready to process it, and this backlog of requests can slow down the response time. Dundee.com’s Google PageSpeed Insights results also show that the site could be a lot faster.
The page speed could be vastly improved by making a few alterations such as:
- Optimize images – Properly formatting and compressing images can save many bytes of data.
- Eliminate render-blocking – The page has 7 blocking script resources and 20 blocking CSS resources. This causes a delay in rendering your page.
- Enable compression – Compressing resources with gzip or deflate can reduce the number of bytes sent over the network.
- Minify CSS – Compacting CSS code can save many bytes of data and speed up download and parse times.
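To illustrate the ‘Enable compression’ point above, here’s a tiny Python sketch showing the kind of saving gzip gives on repetitive HTML (the markup is made up; real pages full of repeated template code compress similarly well):

```python
# Demonstrate the byte saving gzip gives on repetitive HTML.
import gzip

html = ('<div class="listing"><h2>Event</h2><p>Details...</p></div>' * 200).encode()
compressed = gzip.compress(html)

print(len(html), "bytes uncompressed")
print(len(compressed), "bytes gzipped")
```

On a real server this is usually a one-line configuration change (e.g. enabling mod_deflate on Apache) rather than something done in application code.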
Step 3: On Site Ranking Factors
This part of the audit identifies and evaluates the numerous characteristics of Dundee.com’s pages (e.g., HTML markup, URL composition and more) which will directly influence the site’s search engine rankings.
These factors are unrelated to the traffic drop we’re investigating, but they’ll play an important role once the site recovers, and reviewing them is part of any normal audit, so we’ve included our findings.
Heading Tags
Whilst most pages on the site are fine, ironically one of the most important pages – the home page – has multiple H1 tags, when only 1 should be used per page.
There are 3 instances of the H1 tag in the page template which surround visible text on the page, but there are also 9 more in use – 1 on each of the rotating slides in the carousel. These additional 9 are not visible to users but are not hidden from Google, so as far as Google is concerned there are 12 H1 tags on the page.
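Counting H1s is easy to automate. A minimal Python sketch using the standard-library HTML parser (the sample markup here is invented, not Dundee.com’s actual template):

```python
# Count <h1> start tags in a page's HTML.
from html.parser import HTMLParser

class H1Counter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.count += 1

def count_h1(html):
    counter = H1Counter()
    counter.feed(html)
    return counter.count

sample = "<h1>Main heading</h1><div class='slide'><h1>Slide title</h1></div>"
print(count_h1(sample))  # 2
```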
On every sub page of the site, the search box and the right hand side widgets – which are identical on almost every page – are wrapped in H2 tags. While not as concerning as the H1 issue above, there’s no value at all in these being present instead of H2s being used correctly throughout the main body content, and it gives Google the wrong signals regarding importance.
URLs
Many of Dundee.com’s URLs are needlessly long, while others are too short.
They often include unnecessary information, making them harder to type in, identify or share, such as this one below:
The above is actually more symbolic of the site’s poor architecture and planning than purely a URL problem but nevertheless it would be far easier for the business above to share their profile if the URL was just: http://www.dundee.com/jsmithdesigns/
Some URLs lack description through being too short, such as:
instead of the more descriptive and accurate
When Dundee.com was put through validator.w3.org there were 46 errors and 11 warnings found. Whilst many of the problems are minor and will cause little issue, it does highlight some of the poor code polluting and bloating the site, like the empty H3 and H4 tags shown below.
It’s quite possible these have been caused by whoever edited the site rather than by a problem with the theme or template, but a better CMS interface or input form would prevent this kind of problem:
We found no evidence of Twitter Card or Open Graph markup, which would be beneficial for the content being shared socially.
We found no evidence of rel="next" and rel="prev" being used on the paginated archive pages, like the news section, nor any canonical tags in place – which could be causing additional problems on top of the much larger ones already identified on the site.
HTTP Status Codes
404 pages: There are around 50 pages on the site returning 404 (not found) errors. This is not uncommon for a site of this size or age, but internal links to these pages should be updated to point to live pages or, in case the now-deleted pages had any links pointing to them, the 404 URLs should be 301 redirected to live, relevant resources.
403 pages: The site’s default ‘not found’ page returns an HTTP header response of 403 instead of the 404 it should return.
Domain issue: Whilst the main site lives on www.dundee.com, it’s advisable that the non-www version of the domain 301 redirects to the www version. Currently the non-www version just reports an error, so anyone typing the shorter URL directly into an address bar will not reach the site. It’s also likely that over time other websites will mistakenly link to the broken URL, creating a dead end for those users and causing the site to miss out on links.
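Assuming the site runs on Apache with mod_rewrite – which we haven’t verified – the redirect could be as simple as this hypothetical .htaccess snippet (equivalent rules exist for other servers):

```apache
# Hypothetical .htaccess snippet: 301 redirect dundee.com -> www.dundee.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^dundee\.com$ [NC]
RewriteRule ^(.*)$ http://www.dundee.com/$1 [R=301,L]
```

Since the non-www host currently just errors, a DNS record for the bare domain would also need to be in place before any server-side redirect can fire.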
Title Tags
Each page should have a unique title that effectively summarises the content for users and search engines and includes any search terms you are actively targeting.
Although the cut-off length is actually measured in pixels, it’s easier to work in text characters, so aim to keep titles below 70 characters.
There are a number of problems with the title tags on the site.
Some top level landing pages, like the news page shown below, have nothing unique in the title tag and only contain the website URL and city tagline which is a duplicate of other pages on the site and not useful to users or search engines. At the very least ‘Latest News’ would be helpful to include.
There are lots of auto-generated pages whose title tags far exceed the 70 character limit, but they are not key pages and not worth concern.
There’s wasted opportunity on other top level pages too – like using only the word ‘Stay’ on the accommodation listings page instead of something more useful like ‘Recommended places to stay during your visit to Dundee’, which would improve both relevance and click-through rates on the listing.
Meta Descriptions
While meta descriptions don’t influence a site’s ranking, they are extremely important for introducing your site to users via the search engine results page. None of Dundee.com’s pages have meta descriptions, which is a missed opportunity as it impacts the click-through rate the site receives.
Google automatically generates meta descriptions if none are included, which is fine on deeper pages on the site but a wasted opportunity on key pages, as illustrated below where a form submission URL is promoted instead of important information.
It would be advisable that unique meta descriptions are added to the key landing pages on the site to give the users a clearer idea of what content is on the page.
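Checks like these are easy to script across a crawl export. A minimal Python sketch (the 70-character threshold follows the discussion above; the example page data is invented):

```python
# Flag common title/meta issues for a single page's data.
def audit_page(title, meta_description, max_title_len=70):
    issues = []
    if not title.strip():
        issues.append("missing title")
    elif len(title) > max_title_len:
        issues.append("title over %d characters" % max_title_len)
    if not meta_description.strip():
        issues.append("missing meta description")
    return issues

# Example: a short generic title and no meta description.
print(audit_page("Stay | Dundee - One City, Many Discoveries", ""))
# ['missing meta description']
```

Run over every row of a Screaming Frog export, this gives a quick worklist of pages needing attention.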
Step 4: Backlink Analysis
We’ll now take a look at the link profile for Dundee.com to establish whether an algorithmic penalty, Penguin or a manual/unnatural links penalty is likely to have also affected the site.
From a raw ahrefs link count we can see that there are over 47k links to the site as a whole, however these links are only coming from a total of 329 domain sources.
This obviously means at least 1 of those sites is linking to Dundee.com multiple times which is an immediate concern given the volume of links involved.
Not surprisingly most people are linking to the site with the anchor text Dundee or Dundee.com
The site clearly has some good quality links from Government and Academic sites such as gov.uk, Scotland.gov.uk and the local council home page.
There are also some really good press links from sites such as itv.com, bbc.co.uk and theguardian.com due to the city’s recent bid to become the 2017 City of Culture.
But there are clearly some strange ratios here with over 47,000 links coming from just 329 domains.
A large proportion of links, 41,790, are sitewide – equivalent to more than 88% of the site’s total links.
35,594 are coming from just 2 sitewide links on the Dundee City Council website.
Both links are in the sidebar. 1 is a link promoting the Dundee.com website and the other is for a webcam link so you can watch the citizens of Dundee go about their day to day lives.
Not intentionally spammy, but not optimal or recommended.
The next biggest sitewide link (2719 backlinks) is coming from the University of Dundee.
Which can be found in the footer of some of the internal staff and department pages.
And another sitewide link from Local Councillor Fraser MacPherson
Unsurprisingly the most linked to pages on the site are the home page and the webcam page.
The webcam page accounts for almost 76% of all links to the site and the homepage has 23% – these two pages alone have 99% of the external backlinks to the site.
Looking closer at the external links, we can see that just 82 of the 7,500 pages in ahrefs have any links pointing to them – i.e. around 1% of pages are responsible for all the external links to the site.
We know from crawling the site earlier that there are in fact many more pages than this, so really just 82 of the 50k-odd indexed pages have links to them – less than 0.2% of all pages are responsible for all the external links to the site.
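For clarity, the ratios above can be sanity-checked with a few lines of Python (figures taken from earlier in the audit; the main-site page count is the 52,900 indexed pages minus the 9,590 on sub domains):

```python
# Sanity-checking the link concentration figures quoted above.
total_links = 47000       # approximate raw ahrefs link count
sitewide_links = 41790    # links identified as sitewide
linked_pages = 82         # pages with at least 1 external link
main_site_pages = 52900 - 9590  # = 43,310 pages on the main site

print(round(sitewide_links / total_links * 100))       # 89 (% of links that are sitewide)
print(round(linked_pages / main_site_pages * 100, 2))  # 0.19 (% of pages holding all the links)
```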
We have used a log scale here so unfortunately you can’t see the 7000+ pages with 0 links.
Looking at the MajesticSEO data we can see that the site has a very high number of links with a very good Trust flow to Citation flow ratio.
There are very few links with a Trust Flow or Citation Flow of 0 and this gives me little concern.
We can also see there are 3 pages returning a 404 which have external links pointing to them, which I would recommend are redirected with a 301 status code to another relevant page on the site.
Have these sitewide links also caused a penalty?
It seems very irregular for a site of this type to have been involved in any stereotypical large-scale link building (e.g. article marketing, social bookmarking etc.), and the link profile isn’t suggestive of this activity.
As many SEOs will already know, however, site-wide sidebar and footer links have been a real issue for websites, and we have written about these types of links previously and given recommendations on the best way to handle them.
There’s no anchor text manipulation evident in the sitewide links, and they are from sites with an obvious affiliation.
Without access to Google Analytics & the historical ranking reports for dundee.com it is difficult to conclude whether the drop in Search visibility in Q2 2013 is also related to a Google Penguin filter.
From the link audit you can clearly see there are a very high proportion of pages on dundee.com with no external links at all, however this ratio would get corrected somewhat if the various Panda issues were thoroughly addressed, which would likely reduce the page count on the site quite significantly.
Overall the external links to the site give little cause for concern but I would be focusing on obtaining links from a wider number of sources and to pages other than the home page.
Step 5: Usability
In general the site is pretty straightforward to use and understand, though there are some issues that probably reduce conversions and repeat visits.
Whilst the search function actually works quite nicely, the search bar itself has a couple of issues – mainly borne out of incomplete cross-browser testing. Chrome, for instance, displays the magnifying glass ‘search symbol’ on a new line, rather than after the search box as intended (other browsers display correctly).
Also, despite the fact that the magnifying glass is a CTA to activate a search, it doesn’t look sufficiently like a button, and doesn’t even use a pointer cursor on rollover – so unless you click the icon you’d never know it did anything.
Dundee.com utilises a right hand sidebar, which takes up about a third of the horizontal screen ‘real estate’, and in many cases offers a nice way to provide links to contextually relevant content.
However there are many other pages where this sidebar is entirely unpopulated, which is both a poor use of space and a poor user experience – resulting in barren looking pages like this:
It would be quite straightforward to implement a ‘default’ sidebar view that is populated with latest news stories or similar.
The fonts used on the site are nice (Helvetica > Arial > Sans Serif) but the text is almost illegible, it’s so small.
We’d normally use a font size of 14px as standard on any site and might be prepared to go down as far as 12px for certain fonts but at 11px most users will struggle to read large chunks of the website which will reduce the time on site, social sharing, engagement, conversions and so on (which doesn’t help the Panda issue either!).
Ironically, the accessibility page on the site is a perfect example: http://www.dundee.com/accessibility.html
Breadcrumbs & On States
The pages suffer from not including a breadcrumb trail to indicate where you are on the site – which, if marked up properly, would also help Google present that information in its search results.
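If a breadcrumb trail were added, marking it up with schema.org’s BreadcrumbList – shown here as a hypothetical JSON-LD snippet with invented page names – is one way to let Google display the trail in its results:

```html
<!-- Hypothetical breadcrumb markup for an events page -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {"@type": "ListItem", "position": 1, "name": "Home",
     "item": "http://www.dundee.com/"},
    {"@type": "ListItem", "position": 2, "name": "Events",
     "item": "http://www.dundee.com/events/"}
  ]
}
</script>
```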
Additionally, the top level navigation doesn’t reflect your current location, meaning in the deeper areas of the site it’s easy for users to feel lost.
Not Found (404) Page
The site’s not found page is completely empty of all content bar the header. It would make more sense for the site to promote the fact the page wasn’t found and lead the users to possible areas of interest. At the moment it might seem as if you’ve arrived on a broken/old site which isn’t in use.
Step 6: Promotability
It feels like there’s a largely wasted opportunity to make more use of the ‘ambassadors’ the site has amassed. They range from local business owners to inventors, well known musicians, hugely successful entrepreneurs and international celebrities.
The homepage promotes the most recent advocates of the city rather than the most famous ones which might seem fair but is also most likely far less effective.
The submissions and interviews from the ambassadors are very hard to read due to the font size, so they probably won’t be read or shared as much, and it’s a shame the site doesn’t leverage rich media like video to convey the messages here instead of text alone. Transcripts could have been used to accompany the videos to prevent thin page content and give the search engines something to work with.
With so many of Dundee’s ambassador’s not only being hugely active in promoting the city but also being very comfortable in front of the camera I wonder if this might have been quite easily achieved under the umbrella of a project of this size.
I’m fairly certain having welcome videos from some of the ambassadors like:
- Actor Brian Cox famous for films like X-men and the Bourne trilogy
- Presenter and national treasure Lorraine Kelly
- Chris van der Kuyl, the entrepreneur now responsible for the Minecraft adaptation on Xbox & PlayStation
- Ged Grimes of Danny Wilson who wrote ‘Mary’s Prayer’ & now Simple Minds bass player
- Tom Simpson, a member of the band Snow Patrol
would be of great benefit to the site, and of interest to users who are unfamiliar with the city but familiar with its ambassadors through their commercial successes.
Summary
This audit has uncovered a number of problems with site structure and content which appear to have caused Dundee.com’s significant traffic drop. The most important observations and recommendations are as follows:
- Dundee.com appears to have suffered a Panda Penalty in April 2013.
- This has been caused primarily by the 39,600 event listings which are exactly duplicated on the Dundee City Council website. They may be manually added to both websites, which is a huge waste of time and resources, or both fed from the same database.
- It would be advisable for each site to have its own unique content for the same event, or for Dundee to choose 1 website to be the events portal for the city’s activities.
- The UGC section of the site is creating hundreds of tag pages which are just thin content and should be noindexed from the search engines on the pages themselves. They are also being abused by companies inserting spammy links and probably require some level of moderation.
- Despite there being a mobile sub domain indexed in Google containing around 10k pages, these are just duplicates of the main version of the site, further adding to the site pollution.
- The site has the majority of its links from sitewide links on related websites. Whilst these are relevant and useful links, there is a small risk that this could hurt the site in the future, so seeking to diversify the link profile would be prudent.
- The site could much better leverage the city’s ambassadors, who are very passionate about Dundee, and use rich media to engage users and increase sharing and conversions.
Overall we hope this audit sheds some light on the issues with Dundee.com and the root causes affecting Panda-hit websites. Whilst a lot of effort has clearly gone into producing something Dundonians can be proud of, the significant technical issues with the site are causing a sad lack of visibility. Hopefully this audit can go some way to addressing this problem and become part of the solution.