Showing posts with label search engine optimisation. Show all posts

Sunday, 29 January 2012

Word Clouds & Spiders

A couple of weeks ago I posted a two-line blog that gave an example of a word cloud together with a link to the original source text. Since then, coincidentally, word clouds have featured as graphic devices at a number of conferences I have attended and in more than a few pieces of promotional literature thrust under my nose! It seems that word clouds are becoming trendy again. Whether the fashion will last is, perhaps, questionable; but word clouds have uses beyond trendy graphics. So my original word cloud blog has been deleted and replaced. For those who know, it is worth remembering that word clouds were also known as "tag clouds"; for those who don't, here's a heads-up.

Put simply, tag clouds, word clouds or "weighted lists" are a way of visualising the prominence or otherwise of words in a body of text. There are many uses of word clouds, including technical applications to do with web site navigation. But at the most basic and most useful level, word clouds can be used as a simple test of web site content at a time when content is ever more important for search engine optimisation [SEO] - see my August 2011 blog on Google Panda Guidelines.
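For the technically minded, the "weighted list" behind a word cloud boils down to counting words. Here is a minimal Python sketch; the short stop-word list is my own illustrative assumption, not any standard.

```python
# A minimal sketch of the "weighted list" behind a word cloud:
# count word frequencies, ignoring a few common stop words.
import re
from collections import Counter

# Illustrative stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "for"}

def weighted_list(text, top=10):
    """Return the most prominent words in `text` with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)

sample = ("Word clouds visualise the prominence of words in a body of text. "
          "Prominent words appear larger; rare words appear smaller.")
print(weighted_list(sample, top=3))
```

A cloud generator then simply draws each word at a size proportional to its count.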

Along Came a Spider...
Search engines use "spiders" to trawl web sites and rank them in various ways. Many of the ranking criteria revolve around the quality of the textual content, including the prominence of keywords that are relevant to the subject matter and might be used for Internet searches. I have always argued that rich content is a prerequisite for Internet effectiveness, whether for high search engine rankings or (far more importantly in my view) for engaging and retaining the interest of visitors. Word clouds give a snapshot of the texture of written material and, within limits, can indicate where it might be adjusted to increase its attractiveness to spiders.
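To illustrate what a spider "sees", here is a rough Python sketch that strips the HTML markup from a page and counts occurrences of a keyword in the visible text. The sample page is invented for illustration; real spiders do a great deal more.

```python
# A rough sketch of the text a spider might extract from a page:
# strip the HTML markup, keep the visible words, count a chosen keyword.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, ignoring the tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        return " ".join(self.chunks)

# An invented sample page, for illustration only.
page = ("<html><body><h1>Destination Marketing</h1>"
        "<p>Marketing a destination needs rich content.</p></body></html>")

extractor = TextExtractor()
extractor.feed(page)
visible = extractor.text()
print(visible.lower().count("marketing"))
```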

A Word Cloud based on the first section of my "Destination Marketing Revisited" blog.
To view the original content, Click Here.
A Word of Warning
Strong subject-orientated copy will, in the majority of cases, produce spider-friendly content automatically. If your word cloud gives predominance to unimportant words, or if the most desirable keywords are under-represented, it is obviously possible to fine-tune the text by reducing the occurrence of irrelevant words or increasing the frequency of the most significant. But it is dangerous to sacrifice readability to search engine optimisation - stiff and contrived content will lose readers very quickly; and search engines are not the only way of driving traffic to web sites, as regular readers of these blogs will appreciate. Finally, search engines are adept at spotting content that repeats keywords excessively.
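For those who like to quantify things, a crude keyword-density check can flag over-repetition before a search engine does. A Python sketch; the 3% threshold is my own illustrative figure, not a published limit.

```python
# A simple keyword-density check, as a rough guard against over-repetition.
# The 3% threshold below is an illustrative assumption, not a published limit.
import re

def keyword_density(text, keyword):
    """Fraction of the words in `text` that are `keyword`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

copy = ("Franchise marketing works best when the copy reads naturally "
        "and the franchise message is clear.")
density = keyword_density(copy, "franchise")
print(f"{density:.1%}")
if density > 0.03:
    print("Keyword may be over-used; consider rewording.")
```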

A Word Cloud based on the "Marketing" page of the BFA web site.
To view the original content, Click Here.

Create Your Own Word Cloud
There are a good number of web sites that offer free word cloud creation, as a Google search will reveal. The examples shown here were made using the Wordle site: click on the "Create" tab to begin. A number of graphic styles are available. Converting the result into a reproducible format is a little complicated, but the best route seems to be:

1) Click on the "Open in Window" button to generate an applet showing the cloud.
2) Make sure the applet is the active window and take a screenshot (hold Alt and press Print Screen). This copies the applet to your clipboard.
3) Open a Word document and press Ctrl + V to paste in the screenshot.
4) Click on the pasted image and copy it (Ctrl + C), then open Paint (select All Programs from your Start button, then go to Accessories > Paint).
5) In Paint, go to Edit > Paste, then Save. Enter a file name, select a destination (say, My Pictures in My Documents) and save as JPEG.
6) Crop the image to remove the screen edges.
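If you would rather skip the screenshot gymnastics altogether, the weighted list can also be turned into a rudimentary HTML tag cloud programmatically. A Python sketch, standard library only; the 10pt-to-36pt scaling is an arbitrary choice of mine.

```python
# A rudimentary HTML tag cloud: font size scaled to word frequency.
# The 10pt-36pt range is an arbitrary illustrative choice.
import re
from collections import Counter

def html_tag_cloud(text, top=20, min_pt=10, max_pt=36):
    """Return an HTML paragraph with words sized by frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top)
    if not counts:
        return "<p></p>"
    peak = counts[0][1]  # highest frequency, used to scale sizes
    spans = []
    for word, n in sorted(counts):  # alphabetical order, as many clouds use
        size = min_pt + (max_pt - min_pt) * n / peak
        spans.append(f'<span style="font-size:{size:.0f}pt">{word}</span>')
    return "<p>" + " ".join(spans) + "</p>"

text = "content content content marketing marketing spiders"
print(html_tag_cloud(text))
```

Save the output to an .html file and open it in a browser to see the cloud.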

Friday, 12 August 2011

Google Panda Guidelines: Web content has never been more important.

Earlier this year Google began rolling out its latest Panda algorithm, part of its stated aim to differentiate between 'high quality' and 'low quality' web sites. The following guidelines were published by Google in May 2011. To read in full, Click Here.

What counts as a high-quality site?
Our site quality algorithms are aimed at helping people find "high-quality" sites by reducing the rankings of low-quality content. The recent "Panda" change tackles the difficult task of algorithmically assessing website quality. Taking a step back, we wanted to explain some of the ideas and research that drive the development of our algorithms.
Below are some questions that one could use to assess the "quality" of a page or an article. These are the kinds of questions we ask ourselves as we write algorithms that attempt to assess site quality. Think of it as our take at encoding what we think our users want.
Of course, we aren't disclosing the actual ranking signals used in our algorithms because we don't want folks to game our search results; but if you want to step into Google's mindset, the questions below provide some guidance on how we've been looking at the issue:
  • Would you trust the information presented in this article?
  • Is this article written by an expert or enthusiast who knows the topic well, or is it more shallow in nature?
  • Does the site have duplicate, overlapping, or redundant articles on the same or similar topics with slightly different keyword variations?
  • Would you be comfortable giving your credit card information to this site?
  • Does this article have spelling, stylistic, or factual errors?
  • Are the topics driven by genuine interests of readers of the site, or does the site generate content by attempting to guess what might rank well in search engines?
  • Does the article provide original content or information, original reporting, original research, or original analysis?
  • Does the page provide substantial value when compared to other pages in search results?
  • How much quality control is done on content?
  • Does the article describe both sides of a story?
  • Is the site a recognized authority on its topic?
  • Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?
  • Was the article edited well, or does it appear sloppy or hastily produced?
  • For a health related query, would you trust information from this site?
  • Would you recognize this site as an authoritative source when mentioned by name?
  • Does this article provide a complete or comprehensive description of the topic?
  • Does this article contain insightful analysis or interesting information that is beyond obvious?
  • Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
  • Does this article have an excessive amount of ads that distract from or interfere with the main content?
  • Would you expect to see this article in a printed magazine, encyclopedia or book?
  • Are the articles short, unsubstantial, or otherwise lacking in helpful specifics?
  • Are the pages produced with great care and attention to detail vs. less attention to detail?
  • Would users complain when they see pages from this site?
Writing an algorithm to assess page or site quality is a much harder task, but we hope the questions above give some insight into how we try to write algorithms that distinguish higher-quality sites from lower-quality sites.