Especially as a content management system grows to hold a large amount of content, it would be nice to be able to do structured link checking. One of the problems with link checking in general is what to do with the reports once you get them. Of course, for a very small site you can easily scan the entire site with tools like LinkScan ($) and Xenu's Link Sleuth (free, but ads are put in the reports), or even monitor 404 requests and use single-page tools like the LinkChecker Firefox extension. But with large sites you can end up with reports so long that it's hard to know where to even start fixing links. This is especially true for CMS-driven sites: the same bad link may appear in only one piece of content that is displayed throughout the site. Or you could wind up linking from lots of content items to a url (possibly outside your control) that changes.
I envision getting a report with a list of the bad links, where a user (with appropriate global rights) could enter the correct new link, which would then be reflected in all content items (or left menus or other components surrounding the content) that used that link. This list could be prioritized by the cumulative page views of the pages containing that bad link, or by the number of pages containing it. Another approach might be to provide a prioritized list of content items that have bad links (preferably linking directly to the edit mode of each content item). At any rate, note that we're not talking about pages here but content items or links -- the user can quickly take action that will correct links on multiple pages. A long list of pages (specific urls) with bad links is confusing and, more importantly, isn't as quickly actionable.
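To make the prioritization idea a bit more concrete, here is a minimal sketch in Python. It assumes the link checker can tell you, for each bad link it finds, which page it was on, which content item supplied the link, and how many views that page gets; the `BrokenLinkHit` record, its field names, and the sample URLs are all made up for illustration and don't come from any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokenLinkHit:
    """One row of a page-level link report (hypothetical structure)."""
    page_url: str      # page where the checker found the bad link
    content_item: str  # CMS content item that actually contains the link
    bad_url: str       # the broken link target
    page_views: int    # views of page_url over some reporting period


def prioritize_bad_links(hits):
    """Collapse page-level hits into one row per bad link.

    Rows are ranked by cumulative page views, then by number of pages.
    """
    pages = defaultdict(set)
    items = defaultdict(set)
    views = defaultdict(int)
    for hit in hits:
        # Count each page's views only once per bad link, even if several
        # content items on that page carry the same link.
        if hit.page_url not in pages[hit.bad_url]:
            views[hit.bad_url] += hit.page_views
        pages[hit.bad_url].add(hit.page_url)
        items[hit.bad_url].add(hit.content_item)

    rows = [
        {
            "bad_url": url,
            "page_views": views[url],
            "page_count": len(pages[url]),
            "content_items": sorted(items[url]),
        }
        for url in pages
    ]
    rows.sort(key=lambda r: (-r["page_views"], -r["page_count"], r["bad_url"]))
    return rows


if __name__ == "__main__":
    sample = [
        BrokenLinkHit("/a", "footer", "http://old.example.com/x", 100),
        BrokenLinkHit("/b", "footer", "http://old.example.com/x", 50),
        BrokenLinkHit("/c", "article-7", "http://gone.example.org/y", 120),
    ]
    for row in prioritize_bad_links(sample):
        print(row)
```

Swapping the first two parts of the sort key would rank by number of pages instead of page views. The `content_items` column is what would link through to edit mode in a real CMS.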
Here is how normal link checking reports look and how more useful reports might look:
| Before / Existing Reports (where do you start with a report like this, where content items may drive multiple pages?) | Report indicating bad links where the user can immediately correct them (and apply the correction everywhere) | Report indicating which content items have the bad links (content items linkable to edit them directly) |
| --- | --- | --- |
| Etc. | Etc. | Etc. |
One possible way to implement this is to change all the urls into some kind of logical link in your CMS. Assuming your CMS stores straight HTML rather than a more structured format, any url the user enters could be automatically changed to a macro. (If users could put a hard link directly into the HTML without the system changing it, most would probably just skip the logical linking, even if there was an option for creating a logical link.) For example if the user put in this HTML:
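As a rough sketch of how the macro idea could work (in Python, under assumptions of my own): the `{{link:N}}` macro syntax, the `LinkRepository` class, and the sample urls below are all hypothetical, and a real CMS would want a proper HTML parser rather than the regular expression used here, which only handles double-quoted `href` attributes.

```python
import re


class LinkRepository:
    """Hypothetical central store mapping link ids to their current urls."""

    def __init__(self):
        self._id_by_url = {}  # every url ever seen for a link -> link id
        self._url_by_id = {}  # link id -> current (correct) url

    def register(self, url):
        """Return the id for url, creating a new entry if it is unknown."""
        if url not in self._id_by_url:
            link_id = len(self._url_by_id) + 1
            self._id_by_url[url] = link_id
            self._url_by_id[link_id] = url
        return self._id_by_url[url]

    def update(self, link_id, new_url):
        """Correct a link once; every content item using the id follows."""
        self._url_by_id[link_id] = new_url
        self._id_by_url.setdefault(new_url, link_id)

    def resolve(self, link_id):
        return self._url_by_id[link_id]


# Match href values that are not already macros.
HREF_RE = re.compile(r'href="(?!\{\{link:)([^"]+)"')
MACRO_RE = re.compile(r"\{\{link:(\d+)\}\}")


def to_macros(html, repo):
    """On save: swap each hard-coded url for a macro pointing at the repository."""
    return HREF_RE.sub(
        lambda m: 'href="{{link:%d}}"' % repo.register(m.group(1)), html
    )


def render(stored_html, repo):
    """On display: expand each macro to the link's current url."""
    return MACRO_RE.sub(lambda m: repo.resolve(int(m.group(1))), stored_html)


if __name__ == "__main__":
    repo = LinkRepository()
    stored = to_macros('<a href="http://example.com/old">Report</a>', repo)
    print(stored)
    repo.update(1, "http://example.com/new")
    print(render(stored, repo))
```

The point of the design is that the user still types an ordinary url, the system quietly does the conversion on save, and a broken link only has to be corrected once in the repository.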
Related items that a link repository might help with:
Of course, this would add complexity (and possible failure points) to a CMS. Do you think it would be worth it?
New sites with dynamic, interactive functionality using data from different sources and allowing the user to interact with the data are exciting to see (examples: geo.worldbank.org and carma.org). But how do we unleash this functionality so that non-programmers can create interaction like this? We have content management systems that allow more people to easily add content to sites. But I think we should be driving toward an environment where users can a) take data from a variety of sources and b) create interactive sites based on this data. Maps are the most prominent example, but interactive tables are also important. Let's review where we are now:
Here are the types of interactive functionality that I think we should be allowing non-programmers (let's call these folks "Interaction Publishers", riffing off the role of "Content Publisher") to create:
Sounds nice -- but how would this be possible? One possible step is for institutions to expose their data in a consistent manner (at least with each institution exposing its own data consistently). This would involve something of a meta-API, where you are consistent about:
Some potential ways of inching toward the goal of the non-developer Interaction Publisher easily being able to publish dynamic, interactive features would be:
Here's a little chart displaying some of the ideas in this post (also see pdf version):
I'd really like your comments on this post. Specifically:
There are already various sites comparing features of content management systems (for example the CMS matrix), so this post aims to help set a framework for selecting a Content Management System (CMS). Aside from standard things to keep in mind when selecting a technology, there are some particularly important items for setting the tone of your CMS selection:
These are some of the particular factors to consider when selecting a CMS:
With everything on the web moving so fast now (who knows when Web 3.0 will be the next thing we're all moving to), consider moving to a CMS environment that will allow quick innovation and new functionality. Some specific approaches to this:
You know when it's time to move into a new house or apartment, when you look at the stuff you need to move and think "Why in the world do I have this bread machine? I haven't used this in years and I forgot I even had it." Or you dread moving your old clunker of a TV, thinking of the new fancy flat-panel TVs? Well, it's the same thing with migrating to a new system, for instance into a new content management system. Only it's harder. When you're moving and you're pressed for time, you may just start tossing stuff into boxes to be moved, even when you know you don't totally want all the stuff (one reason: you'll need to negotiate with a spouse about getting rid of something, and there's no time for that). This isn't that big a deal, since it's just moving more of the same stuff. Or, if you have a huge sectional couch that won't fit in your new place, then perhaps you can just sell it to the next homeowner. When you're moving content, you have all sorts of extra things to think about including:
In the end, a lot of this has to do with the web of information that's involved in the content of a web site. And this isn't counting the types of technical issues that would come up with any technical migration (differences in size limits for fields, encoding differences, etc.). Of course there's the issue of why you even have all this stuff to move in the first place (and the more stuff you have the more hassle it is to move). This blog entry has focused on why it's difficult to move all this content, but of course one of the morals of the story is to have less stuff in the first place. In the case of the web this would involve better governance of what goes on the web, and clearly defining what the focus of your web site should be. Hopefully, just like when moving houses, any discussion of moving content would also include discussing what stuff you need in the first place. Unlike with houses, having extra or duplicate stuff on a web site doesn't just inconvenience you; it is a disservice to your users. I'll leave the issue of the old TV and desiring the new flat panel to a future post (on survivorship bias).
Now that I've been using Drupal for a month (Drupal is driving this site), I thought it would be a good time to write up my impressions of this open source CMS. Obviously this is before I know the tool in depth, and also before I become jaded or a zealot. For reference, over the past seven years I've worked with a couple of custom CMSs, driving both very large and small web sites, and I have also used WordPress (not a CMS) for two other personal blogs. Also, my emphasis in using Drupal for this site has been as a blog, so I haven't fully explored all the CMS features. I really like Drupal for many reasons including:
Perhaps above everything else, it just seems that the details have been done very nicely in Drupal. Overall a site in Drupal appears to be easy to administer. These are some items that weren't as smooth: a) I didn't get the Backup module to work quickly enough (it was faster just to use scripts to do it rather than get the module to work), b) I still don't fully understand the file upload/download environment (especially for counting the downloads), c) by default you're in raw HTML editing (yes, you can install a web-based HTML editor, but the ones I've tried so far don't seem very useful), and d) I couldn't quickly get the transparent logo needed for the Garland theme working. A note to people currently working in a large, complex enterprise content management environment: I highly recommend playing with Drupal to get the creative juices flowing and also to work in a less constrained environment. Also, it's nice to work from a clean slate on a new site (although I've already bumped into a place where I wish I had set up the site differently in the first place). But of course working in an enterprise environment has a host of other requirements that have to be dealt with (for example standardized look across sites, security, single sign-on, integration with internal repositories, existing systems, and standardized administration).