Readability, Uniqueness, Hackability and Meaning in URL Design
Note: This article was written for a previous version of the site and is likely out-of-date.
Introduction
URL design is arguably one of the most important areas of website development. Not only do URLs generally have huge visual priority in web browsers, but they're also shown in search result listings and used for matching search terms. Then there are the usability factors: what does a jumble of seemingly meaningless query strings and numerical database keys tell your users about where they are in the site? This is just the tip of the iceberg when it comes to reasons for investing resources in decent URL schemas, yet only in the last couple of years have we seen the emergence of web development frameworks that truly put an emphasis on them. In the slower-to-upgrade enterprise world, most sites still run on frameworks that produce some truly ugly URLs.
The next problem is the URLs people choose once they decide to switch away from query-parameter-infested schemas to ones that use readable URL segments to identify pages. Simply put, converting your parameters into friendly pieces of text isn't enough. You may get improved search rankings from your newfound keywords, but all you've really done is disguise the problem, and search rankings should never be your primary aim in URL design.
4 Principles of URL design
Having read a number of articles on the subject and explored my own views on the matter, I've come up with what I believe to be the 4 most important aspects of URL design:
URLs must be Readable
Just by reading a URL, a user should be able to make a fairly good guess as to what they will find if they visit it. Titles (converted to a readable slug format so that those nasty %20 escapes aren't visible everywhere) should be used instead of numerical ids, and the URL should make it clear where in the overall site structure the resource sits.
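As a rough illustration of converting titles into slugs, here is a minimal Python sketch. The function names slugify and article_url, and the /section/year/slug layout, are hypothetical choices made for this example rather than a scheme the article prescribes. It also assumes that dropping accents and non-ASCII characters is acceptable; a site with titles written entirely in non-Latin scripts would need a different rule.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Convert a human-readable title into a lowercase, hyphen-separated slug."""
    # Decompose accented characters (e.g. "é" becomes "e" plus a combining
    # accent), then drop anything that cannot be represented in ASCII.
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of spaces and punctuation into a single hyphen,
    # so the URL never needs %20-style escapes.
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", ascii_only).strip("-")
    return slug.lower()


def article_url(section: str, year: int, title: str) -> str:
    """Build a readable URL that also shows where the page sits in the site.

    The /section/year/slug structure is a hypothetical example layout.
    """
    return f"/{section}/{year}/{slugify(title)}"


if __name__ == "__main__":
    print(article_url("reviews", 2007, "Café Review: 10% Off!"))
```

Run as a script, this prints /reviews/2007/cafe-review-10-off: the title is readable, and the section and year segments tell the user where in the site the page lives.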
Pages with unique content must have Unique URLs
A lot of factors come into play here. The first is that the same content shouldn't have more than one URL: choose your preferred URL for each page on your site, and if a second is required, implement it simply as a permanent redirect. A search engine will follow the redirect and index only the canonical URL. This applies to the www subdomain, which the majority of websites have set up as optional. Use Apache's mod_rewrite (or equivalent) to add a permanent redirect that forwards the request to the non-www URL (or the other way round if you really want the www part). Localised pages should also include the locale somewhere in the URL so that the preferred localised version can be bookmarked (not everybody has their browser configured to use their preferred locale). If a page has unique content, it must not rely on sessions or cookies to load that content; otherwise it will be invisible to search engines. Brad Fults discusses multilingual URL design in “Designing URLs for Multilingual Web Sites”; his conclusions may not match your own, but the article does an excellent job of explaining some of your options.
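The www rule above is usually written as a mod_rewrite rule, but the same decision can be made at the application level. The sketch below is a hypothetical Python equivalent (the function name canonical_redirect and its parameters are assumptions, not part of any framework): it assumes the non-www host is canonical and that the site is served over HTTPS, and it returns a 301 (Moved Permanently) status with the target location, or None when the request is already on the canonical host.

```python
from typing import Optional, Tuple


def canonical_redirect(host: str, path: str, query: str = "",
                       scheme: str = "https") -> Optional[Tuple[int, str]]:
    """Return (status, location) if the request should be redirected, else None.

    Assumes the bare (non-www) host is the canonical one.
    """
    # Hostnames are case-insensitive, and the Host header may carry a port.
    hostname, sep, port = host.partition(":")
    hostname = hostname.lower()
    if not hostname.startswith("www."):
        return None  # already on the canonical host; serve the page
    canonical_host = hostname[len("www."):] + sep + port
    location = f"{scheme}://{canonical_host}{path}"
    if query:
        location += "?" + query
    # 301 tells browsers and search engines the move is permanent, so only
    # the target URL gets indexed.
    return 301, location
```

Keeping the path and query string intact matters: a redirect that sent every www request to the home page would break bookmarks and inbound links instead of merely canonicalising them.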
URLs must be Hackable
This follows on from readability and the idea that a URL is not only a location but also a map, much akin to breadcrumb navigation (“Breadcrumb Pattern at Yahoo”). One of the main features of successful breadcrumb navigation is that it doesn't represent the route you took to get to the page but rather the route home or to other pages. Every URL is constructed from a number of segments separated by forward slashes. For a URL to be hackable, the user must be able to repeatedly remove the last segment and still arrive at a valid page that makes sense within the context of the URL. The user must also be able to swap in alternative segments that make sense, like changing “2007” to “2006” or “reviews” to “news”. This sounds straightforward enough, but the biggest stumbling block is introducing new URL segments to resolve namespace conflicts (what if someone gives an article a title that conflicts with an existing URL?). This is a bad thing because it leaves you with a URL (everything up to and including your new segment) that has no page behind it. The solutions range from refactoring your URL schema to programmatically preventing new content from using existing URLs.
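Both halves of this principle can be checked mechanically. The Python sketch below is illustrative only: the function names are hypothetical, and valid_pages stands in for whatever router or database actually knows which URLs have a page behind them. ancestor_paths lists the URLs a user reaches by repeatedly chopping off the last segment, missing_ancestors reports any of those that would be dead ends, and slug_is_available takes the "prevent new content from using existing URLs" approach to namespace conflicts.

```python
from typing import List, Set


def ancestor_paths(path: str) -> List[str]:
    """Return every URL reached by repeatedly removing the last path segment."""
    segments = [s for s in path.split("/") if s]
    # Index 0 gives the root "/", and each later index adds one more segment;
    # the full path itself is excluded.
    return ["/" + "/".join(segments[:i]) for i in range(len(segments))]


def missing_ancestors(path: str, valid_pages: Set[str]) -> List[str]:
    """Return the ancestors of `path` that have no page behind them."""
    return [p for p in ancestor_paths(path) if p not in valid_pages]


def slug_is_available(parent: str, slug: str, valid_pages: Set[str]) -> bool:
    """Reject a new slug whose URL would clash with an existing page or subtree."""
    candidate = parent.rstrip("/") + "/" + slug
    return not any(p == candidate or p.startswith(candidate + "/")
                   for p in valid_pages)


if __name__ == "__main__":
    # Hypothetical set of URLs that currently have a page behind them.
    pages = {"/", "/reviews", "/reviews/2007", "/reviews/2007/some-film",
             "/reviews/2007/archive"}
    print(missing_ancestors("/reviews/2007/some-film", pages))  # []
    print(slug_is_available("/reviews/2007", "archive", pages))  # False
```

Rejecting the clashing slug at publishing time, rather than adding an extra disambiguating segment, keeps every prefix of every URL pointing at a real page.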
URLs must be Meaningful
Put simply, every single part of your URL should mean something not only to the user but also to your system. Let's say that instead of just using your article title you're using the numerical id as well: your system may not actually care what the text is as long as the id is right. This violates the idea of unique URLs, because every single article now effectively has unlimited possible URLs. Similarly, keywords shouldn't be stuffed into the URL if they have no actual effect on which page the system returns. If a user gets any part of a URL wrong, they should be served a 404 page, which ideally would list the URLs the user may have been looking for. Removing useless parts also ensures that your URL is only as long as it needs to be; long URLs almost always become unclickable when put in emails, because email clients break the line before running the algorithm that converts URLs into clickable links.
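One way to make every segment count is to resolve a request only on an exact match of the whole path, so a wrong slug, an extra id or a stuffed keyword all produce a 404. The Python sketch below assumes a hypothetical in-memory PAGES dictionary and a hypothetical resolve function; using the standard library's difflib.get_close_matches to fill the 404 page with suggestions is one possible technique, swapped in here for illustration rather than taken from the article.

```python
import difflib
from typing import Dict, List, Tuple, Union

# Hypothetical stand-in for the site's content store: every valid URL maps to
# the page it serves, and no other path resolves.
PAGES: Dict[str, str] = {
    "/reviews/2007/blade-runner-final-cut": "Review: Blade Runner (The Final Cut)",
    "/reviews/2007/ratatouille": "Review: Ratatouille",
    "/news/2007/festival-line-up": "Festival line-up announced",
}


def resolve(path: str) -> Tuple[int, Union[str, List[str]]]:
    """Serve a page only if every segment of the path is exactly right."""
    if path in PAGES:
        return 200, PAGES[path]
    # Any wrong or extra segment is a 404, never a silent best guess.
    # The 404 body lists the closest valid URLs instead (best match first).
    suggestions = difflib.get_close_matches(path, list(PAGES), n=3, cutoff=0.6)
    return 404, suggestions
```

Contrast this with a lookup that parses only a leading id and ignores the rest of the segment: there, every misspelling would still return a 200, and each article would quietly gain an unlimited number of URLs.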
Conclusion
The 4 principles outlined above represent my condensed findings on how ideal URL schemas should be constructed. In reality it's not quite as simple as following 4 relatively straightforward guidelines (and there are probably a number of factors I've overlooked that may not fit within the 4 areas), but in my opinion URL design is easily important enough to justify the work that may go into programming a system that allows good design. In my experience, some web development frameworks make it unreasonably difficult to develop decent URLs, whilst others can even make it enjoyable; this ease of URL development should be considered an important factor in framework choice.
I have deliberately refrained from mentioning SEO directly until now. I have no objections to SEO, but I feel that if any design decision is made purely for SEO purposes, you risk adversely affecting the user experience. However, applied correctly, these principles also go hand-in-hand with optimising your URLs for search engines. In fact, you can find the majority of what I've said in various SEO articles, just with different motivations behind the decisions. My point was to emphasise the importance of proper URL design and to highlight that even if your site is so successful that you don't need to worry about SEO, you still need to worry about the user experience and therefore about URLs.
Originally published on .