Submitted by David Hobbs on 31 August 2009 - 2:02pm

It seems so simple. You've got press releases that are clearly tagged to neighborhood (let's say the two possible neighborhoods are Capitol Hill and Atlas District). The Atlas District page should obviously only have Atlas District news, so you create a section on the Atlas District page that lists the three most recent press releases tagged to it. Your web developer whips something like this up quickly (examples from the excellent local blog Frozen Tropics):
Possible Issues
Seems easy enough, right? Sometimes the straightforward approach may be fine (especially for small sites), but you could wind up with something more like this if you're not careful:

Here are some of the potential issues with larger sites:
Drafts and embargoed material
"this should not appear anywhere, in any channel, until published"
Let's say you're about to post a press release containing the menu for a new restaurant in the Atlas District, and you've agreed to post it after 7pm tonight. You'll be working on a draft beforehand so that it's ready to go at 7:00. Obviously, the press release shouldn't appear until after the approved time. This is a more significant issue than it appears: if you start exposing APIs and other means of sharing your content, the same rules should apply there (rather than developers recreating the rules, and potentially introducing errors, every time).
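One way to keep the publishing rule in a single place is a shared visibility check that every page block, feed, and API calls. The following is only a minimal sketch under assumptions: the `PressRelease` fields, the `"draft"`/`"published"` states, and the function names are hypothetical and not taken from any particular CMS.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class PressRelease:
    title: str
    status: str                               # hypothetical states: "draft" or "published"
    embargo_until: Optional[datetime] = None  # None means there is no embargo


def is_visible(item: PressRelease, now: datetime) -> bool:
    """The one rule shared by page blocks, feeds, and APIs."""
    if item.status != "published":
        return False
    if item.embargo_until is not None and now < item.embargo_until:
        return False
    return True


def visible_items(items: List[PressRelease], now: datetime) -> List[PressRelease]:
    """Filter any list of items through the shared rule."""
    return [item for item in items if is_visible(item, now)]


# The restaurant menu is approved but embargoed until 7pm.
menu = PressRelease("New restaurant menu", "published", datetime(2009, 8, 31, 19, 0))
print(is_visible(menu, datetime(2009, 8, 31, 18, 59)))  # False
print(is_visible(menu, datetime(2009, 8, 31, 19, 0)))   # True
```

Because every channel goes through the same function, a new feed or API cannot accidentally leak a draft by forgetting part of the rule.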
Editorial decisions
"yeah, but I don't want it on my page"
A press release is published that is related to both the Atlas District and Capitol Hill. Perhaps it's about a bicycle race that will result in street closings in Capitol Hill but only affect parking in the Atlas District. The owner of the Atlas District page doesn't think it's significant enough to appear on the Atlas District page. This would be a case where the tagging to Atlas District is correct, but there is a valid editorial decision to not include it on the Atlas District page (perhaps there's another separate event there that should be in the top three). In this case, the press release should not be retagged to remove Atlas District, since for some purposes (such as enterprise search) you will want the correct tag.
Bad Tagging
"this tag is just wrong"
This one is virtually impossible to avoid when dealing with a large group of people submitting content (although see a related metator discussion about ways to improve this). Let's say that a new person who does not know DC very well arrives, and mistakenly tags something to Capitol Hill instead of the Atlas District (perhaps mixing up 401 H St NE and 401 H St SE). Note that this is very different than the editorial decision issue, although at first blush they seem similar. In this case, the tagging is wrong and should be corrected (or, in the case of automated tagging, the rules should be changed).
Multilingual Issues
"don't show me partial results in another language"
A variety of issues can occur when pulling content in many languages, especially when, as is usually the case, different pieces of content are in different languages. You can end up with too little content (if the page only pulls items in a language that has few items), or with unnecessary duplicate content (see Interleaving Languages).
Broadcasted content
"I need this important information on all pages of the site"
If you have a lot of publishers and content, you may sometimes have content that should appear on all pages (broadcasts), regardless of what neighborhood the news is about (let's say a press release about Washington, DC overall and not specific to a neighborhood). What you *don't* want to do (but may indeed do in a crisis if this wasn't planned for) is tag the content to every neighborhood just to make it appear on every page, even though those tags are not correct.
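Both the editorial-decision case and the broadcast case come down to keeping display rules separate from tags. Here is a rough sketch of that idea, assuming hypothetical `broadcast` and `excluded_from` fields that sit alongside the tags; the item titles are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Item:
    title: str
    tags: Set[str]                 # factual tags, kept accurate for search
    broadcast: bool = False        # hypothetical flag: show on every page
    excluded_from: Set[str] = field(default_factory=set)  # editorial opt-outs, per page


def items_for_page(items: List[Item], page: str, limit: int = 3) -> List[Item]:
    """Choose items for a neighborhood page without altering any tags.

    Assumes `items` is already sorted newest first.
    """
    chosen = [
        item for item in items
        if (item.broadcast or page in item.tags) and page not in item.excluded_from
    ]
    return chosen[:limit]


race = Item("Bicycle race street closings", {"Capitol Hill", "Atlas District"},
            excluded_from={"Atlas District"})
citywide = Item("Citywide announcement", set(), broadcast=True)
local = Item("Atlas District event", {"Atlas District"})

print([i.title for i in items_for_page([race, citywide, local], "Atlas District")])
# ['Citywide announcement', 'Atlas District event']
```

The bicycle race keeps its correct Atlas District tag (so search still finds it), and the citywide item never needs to be tagged to every neighborhood.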
Appearance of Timeliness
"a year old press release isn't 'current news'"
If you end up with a lot of automated pages (for instance if you cover 30 different neighborhoods), then it's easy to wind up with the block that says "Current News" that has very old content. In addition, if you are displaying events then events that are far in the future could overwhelm an event happening tomorrow.
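A simple guard against stale "Current News" blocks is to put an age limit on news and to order events by how soon they happen. The sketch below is illustrative only: the 90-day cutoff and the `(title, date)` tuple shape are assumptions, not recommendations from any specific system.

```python
from datetime import date, timedelta


def current_news(releases, today, max_age_days=90, limit=3):
    """Return the newest releases, dropping anything older than the cutoff.

    `releases` is a list of (title, published_date) tuples.
    """
    cutoff = today - timedelta(days=max_age_days)
    fresh = [r for r in releases if cutoff <= r[1] <= today]
    fresh.sort(key=lambda r: r[1], reverse=True)
    return fresh[:limit]


def upcoming_events(events, today, limit=3):
    """Return future events, soonest first, so tomorrow's event leads the list.

    `events` is a list of (title, event_date) tuples.
    """
    future = [e for e in events if e[1] >= today]
    future.sort(key=lambda e: e[1])
    return future[:limit]


today = date(2009, 8, 31)
releases = [("Year-old release", date(2008, 8, 1)), ("Recent release", date(2009, 8, 20))]
events = [("Event next year", date(2010, 6, 1)), ("Event tomorrow", date(2009, 9, 1))]
print(current_news(releases, today))
print(upcoming_events(events, today))
```

If `current_news` returns an empty list, the page can hide the block or relabel it rather than present old material as current.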
What to do about it?
In future blog posts, I hope to cover approaches to avoid these issues, but in closing I thought it would be helpful to list some high-level pointers:
- Clearly articulate how you want your automatic pulls to work, as early in your process as possible.
- Don't think of each block in isolation, but try to implement things in a consistent manner (for instance, by only having page blocks behave in a few different ways).
- Similarly, consider whether developers should have control over all aspects of each block, or whether much of the aggregation should only be available through a consistent API.
- Be mindful of the issues above when designing your page/block behavior and training of those that will be tagging.
As always, please provide any comments at HobbsOnTech or on Twitter at @jdavidhobbs.
Submitted by David Hobbs on 13 August 2009 - 9:48am
After a long hiatus, I plan on focusing my blogging energies back on the Hobbs On Tech blog. Looking at my posts over the last couple years on the Hobbs On Tech and WelchmanPierpoint sites (see full list of articles), I see some themes: CMS Migration, Internal CMS Product Management, and Content Re-use / Large Site Issues. As I reflect on future blog entries, I'm stepping back and thinking about what blog posts have been the most successful in my opinion.
Submitted by David Hobbs on 19 December 2007 - 9:25pm
Especially as a content management system grows to have a large amount of content, it would be nice if you could do structured link checking. One of the problems with link checking in general is what to do with the reports once you get them. Of course, for a very small site you can easily scan an entire site with tools like LinkScan ($) and Xenu Linksleuth (free, but ads are put in the reports) or even monitor 404 requests and use single page tools like the LinkChecker Firefox extension. But with large sites you can end up with reports where it is hard to know where to even start fixing links. This is especially true for CMS-driven sites: the same bad link may appear in only one piece of content that is displayed throughout the site. Or you could wind up linking from lots of content items to a url (possibly outside your control) that changes.
I envision getting a report with a list of the bad links, where a user (with appropriate global rights) could indicate the correct new link, which would get reflected in all content items (or left menus or other components surrounding the content) that used that link. This list could be prioritized by the cumulative page views of the pages that contained that bad link, or by the number of pages that contained that link. Another approach might be to provide a prioritized list of content items that have bad links (preferably directly linkable to edit mode of that content item). At any rate, note that we're not talking about pages here but content items or links -- the user can quickly take action that will correct links on multiple pages. A long list of pages (specific urls) with bad links is confusing and, more importantly, isn't as quickly actionable. Here is how normal link checking reports look and how more useful reports might look:
| Before / Existing Reports (where do you start with a report like this, where content items may drive multiple pages?) | Report indicating bad links where the user can immediately correct them (and apply the correction everywhere) | Report indicating which content items have the bad links (content items linkable to edit them directly) |
| http://badlinkone.com is referenced on http://example-site.com/page1, http://example-site.com/page35, and http://example-site.com/page102. http://badlinktwo.com is referenced on http://example-site.com/page1, http://example-site.com/page1023, http://example-site.com/page2439, http://example-site.com/page5192. Etc. | Etc. | Etc. |
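The kind of report described above (bad links grouped by the content items that contain them, ranked by cumulative page views) might be assembled roughly as follows. This is a sketch under assumptions: the four input mappings are hypothetical structures a CMS and a link checker would need to supply, and the sample numbers are invented.

```python
from collections import defaultdict


def bad_link_report(bad_links, item_links, page_items, page_views):
    """Group bad links by content item and rank them by cumulative page views.

    bad_links:  set of URLs a link checker found to be broken
    item_links: {item_id: [urls used in that content item]}
    page_items: {page_url: [item_ids displayed on that page]}
    page_views: {page_url: number of views}
    """
    report = defaultdict(lambda: {"items": set(), "pages": set(), "views": 0})
    for page, item_ids in page_items.items():
        for item_id in item_ids:
            for url in item_links.get(item_id, []):
                if url not in bad_links:
                    continue
                entry = report[url]
                entry["items"].add(item_id)
                if page not in entry["pages"]:  # count each page's views once
                    entry["pages"].add(page)
                    entry["views"] += page_views.get(page, 0)
    return sorted(report.items(), key=lambda kv: kv[1]["views"], reverse=True)


bad = {"http://badlinkone.com", "http://badlinktwo.com"}
item_links = {"left-menu": ["http://badlinkone.com"], "story-7": ["http://badlinktwo.com"]}
page_items = {"/page1": ["left-menu", "story-7"], "/page35": ["left-menu"], "/page102": ["left-menu"]}
page_views = {"/page1": 100, "/page35": 50, "/page102": 25}

for url, info in bad_link_report(bad, item_links, page_items, page_views):
    print(url, sorted(info["items"]), len(info["pages"]), info["views"])
```

In this made-up data the first bad link appears on three pages but lives in a single content item, so one edit to that item fixes all three pages.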
One possible way to implement this is to change all the urls into some logical link in your CMS. Assuming your CMS stores straight HTML rather than a more structured format, any url the user enters could be changed to a macro (if the user could put a hard link directly into the HTML without the system changing it, then even if there was an option for creating a logical link, most users would probably just skip the logical linking). For example, if the user put in this HTML:
<a href="http://hobbsontech.com">Hobbs On Tech</a>
then the system would replace it with
!link(123,"Hobbs On Tech")
and put in its link repository that link 123 was http://hobbsontech.com. When the page was generated, the correct link could be substituted back into the HTML (so of course the end user's browser should never see the "123" in the HTML). If the page linked to was in your CMS, then the macro could be different and just indicate the unique key for the content item being pointed to (this would depend on whether the context that the content appeared in was relevant). For example:
!cms_item(123,"Hobbs On Tech")
Related items that a link repository might help with:
- Reporting on content use. A link repository would allow other interesting reporting, such as the most linked-to content items in your repository.
- Easily move content. In some cases, it may be easier to move content if you had a link repository. For instance, you may sometimes need to restructure your site resulting in the links changing. With a link repository, you could automatically change all the links so that the move did not result in broken links (of course this would work best for intranet sites where there were limited links outside your control to your content).
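To make the `!link(123,"Hobbs On Tech")` idea concrete, here is a minimal sketch of a link repository with the two conversions: HTML to macro when content is saved, and macro back to HTML when a page is generated. The class and function names are hypothetical, and the regular expressions only handle the simple `<a href="...">text</a>` form shown above (a real CMS would need a proper HTML parser and would have to cope with quotes in link text).

```python
import re

ANCHOR = re.compile(r'<a href="([^"]+)">(.*?)</a>')
MACRO = re.compile(r'!link\((\d+),"([^"]*)"\)')


class LinkRepository:
    """Maps numeric link ids to urls so a url can be corrected in one place."""

    def __init__(self, first_id=1):
        self._by_id = {}
        self._by_url = {}
        self._next_id = first_id

    def register(self, url):
        """Return the id for a url, creating a new id on first sight."""
        if url not in self._by_url:
            self._by_url[url] = self._next_id
            self._by_id[self._next_id] = url
            self._next_id += 1
        return self._by_url[url]

    def url_for(self, link_id):
        return self._by_id[link_id]

    def update(self, link_id, new_url):
        """Correct a link once; every content item using the id follows."""
        self._by_url.pop(self._by_id[link_id], None)
        self._by_id[link_id] = new_url
        self._by_url.setdefault(new_url, link_id)


def to_macros(html, repo):
    """On save: replace hard links with !link(id,"text") macros."""
    return ANCHOR.sub(
        lambda m: '!link(%d,"%s")' % (repo.register(m.group(1)), m.group(2)), html)


def to_html(stored, repo):
    """On page generation: expand macros back into ordinary anchors."""
    return MACRO.sub(
        lambda m: '<a href="%s">%s</a>' % (repo.url_for(int(m.group(1))), m.group(2)),
        stored)


repo = LinkRepository(first_id=123)
stored = to_macros('<a href="http://hobbsontech.com">Hobbs On Tech</a>', repo)
print(stored)                 # !link(123,"Hobbs On Tech")
print(to_html(stored, repo))  # <a href="http://hobbsontech.com">Hobbs On Tech</a>
```

Calling `repo.update(123, new_url)` would then change the generated link everywhere the macro is used, without touching any stored content.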
Of course, this would add complexity (and possible failure points) to a CMS. Do you think it would be worth it?
Submitted by David Hobbs on 12 December 2007 - 1:34am
New sites with dynamic, interactive functionality using data from different sources and allowing the user to interact with the data are exciting to see (examples: geo.worldbank.org and carma.org). But how do we unleash this functionality so that non-programmers can create interaction like this? We have content management systems that allow more people to easily add content to sites. But I think we should be driving toward an environment where users can a) take data from a variety of sources and b) create interactive sites based on this data. Maps are the most prominent example, but interactive tables are also important. Let's review where we are now:
- We have sites already applying Google maps and other interactive functionality to various data sources (examples above).
- Programmers have resources/examples/documentation for creating these types of sites (see Programmable Web for example).
- Various APIs have been exposed for interacting and using data (examples).
- We have tools like Yahoo Pipes that allow advanced users (probably not needing full-blown programmer skills) to create mashups. That said, Yahoo Pipes is now focused on consuming/dealing with RSS feeds (the Fetch Data Module is supposed to handle more general XML, but I had problems getting it to do so -- if you look at examples using DC crime data, you see it's RSS with some customization). In addition, this is a hosted solution, so you're at the mercy of Yahoo if you host a mashup with them (I noted Yahoo Pipes having problems accessing feeds intermittently even in my brief testing).
- There are probably other similar examples of specialized tools, but I know of Swivel, which allows you to create your own graphs of data.
Here are the types of interactive functionality that I think we should be allowing non-programmers (let's call these folks "Interaction Publishers", riffing off the role of "Content Publisher") to create:
- Interactive data tables. The Interaction Publisher should be able to point at one (or multiple) data sources, and indicate which columns/attributes to display in a table. The Interaction Publisher should also indicate which attributes should be selectable (in pulldowns for example) by the end user. Of course some theming / design and annotation should be possible.
- Interactive maps. The Interaction Publisher should be able to point at a data source, the attributes containing the locations, and what data to show for each location (along with the extent of the default map and formatting). Also, please can we get rid of the points / waypoints / circles that mark an arbitrary point to stand for data about a large area (for example, a pointer to the capital for a country), and instead highlight the whole area (for example, the whole country)? Ideally the Interaction Publisher will be able to indicate further interaction with the map (for example, displaying different layers of a map -- if not full-blown layers, then at least indicating different sets of waypoints to display).
- Custom data. The Interaction Publisher should also be able to easily publish their own data/content, and pull their data into an interactive feature (for instance, this could even be a simple search on a little database / resource center the user has). An extension of this would be including some mechanism for overriding other data sources' data points (of course this should somehow be indicated on the map/table so it isn't misleading).
- Wizard-like functionality. The Interaction Publisher should not have to resort to XPATH, XSL, or programming in PHP / Perl / whatever.
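As a rough illustration of what a wizard for interactive data tables might produce behind the scenes, the sketch below treats the Interaction Publisher's choices as a small declarative spec (which columns to show, which attributes the end user may select on) and applies it to rows from a data source. Everything here is hypothetical, including `TableSpec` and the sample rows.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TableSpec:
    """What the Interaction Publisher picks in the wizard; no code required."""
    columns: List[str]                                    # attributes shown in the table
    selectable: List[str] = field(default_factory=list)   # attributes offered as pulldowns


def pulldown_options(rows: List[Dict[str, Any]], spec: TableSpec) -> Dict[str, list]:
    """Distinct values to offer the end user for each selectable attribute."""
    return {attr: sorted({row[attr] for row in rows}) for attr in spec.selectable}


def table_rows(rows: List[Dict[str, Any]], spec: TableSpec,
               selections: Dict[str, Any]) -> List[list]:
    """Apply the end user's pulldown selections and project the chosen columns."""
    unknown = set(selections) - set(spec.selectable)
    if unknown:
        raise ValueError("not selectable: %s" % sorted(unknown))
    matching = [r for r in rows if all(r.get(a) == v for a, v in selections.items())]
    return [[r.get(c) for c in spec.columns] for r in matching]


# Invented sample data standing in for an external data source.
rows = [
    {"country": "A", "topic": "agriculture", "value": 1},
    {"country": "B", "topic": "health", "value": 2},
    {"country": "A", "topic": "health", "value": 3},
]
spec = TableSpec(columns=["country", "value"], selectable=["topic"])
print(pulldown_options(rows, spec))
print(table_rows(rows, spec, {"topic": "health"}))
```

The point of the design is that the spec is plain data: a wizard can write it, and the same small engine can serve every table on the site.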
Sounds nice -- but how would this be possible? One possible step is for institutions to expose their data in a consistent manner (at least each institution exposing its own data consistently). This would involve something of a meta-API, where you are consistent about:
- Attributes that can be queried. Perhaps the list would be just topics and countries, for example. The topics lists should be something that the outside world will understand rather than an organization-centric list. If you have multiple topics lists, then it would be preferable if all systems were moved to a single topics list (even if that meant two topics lists per system).
- Simplicity and consistency in APIs. Perhaps all your XML APIs are at http://xml.example-domain.com/api/ (with an html page just listing all the APIs there) and then APIs to different systems like http://xml.example-domain.com/api/documents and http://xml.example-domain.com/api/web with example calls like http://xml.example-domain.com/api/web?api-version=1&topic=agriculture.
- Consistent exposure of non-standard attributes. The issue of consistent query parameters was covered above -- this means that all systems are queried on the same parameters. But of course some systems will need to provide other attributes (such as, say, "Population"). This could be done in a custom namespace in RSS as the DC crime data (see xml) does in its Atom feed (which Yahoo Pipes, for example, can consume). This could be documented, and the consumer of the data could handle this.
- Custom databases would also preferably comply. Perhaps there could be an http://xml.example-domain.com/api/core/ for institutionally, centrally supported repositories and http://xml.example-domain.com/api/special/ for one-off databases. This would still allow easy access of data by Interaction Publishers.
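From the consumer's side, the meta-API sketched in the list above might look something like the following. This is only an illustration under assumptions: it supposes the parameters are passed as a standard query string, that results come back as an Atom feed, and that non-standard attributes live in a custom XML namespace. The namespace URI, the function names, and the sample feed are all hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "http://xml.example-domain.com/api"
ATOM = "{http://www.w3.org/2005/Atom}"
CUSTOM = "{http://xml.example-domain.com/ns/custom}"  # hypothetical namespace


def build_query_url(system, topic=None, country=None, api_version=1):
    """Every system is queried with the same small set of parameters."""
    params = {"api-version": api_version}
    if topic:
        params["topic"] = topic
    if country:
        params["country"] = country
    return "%s/%s?%s" % (BASE, system, urlencode(params))


def parse_entries(feed_xml):
    """Read each entry's title plus any attributes in the custom namespace."""
    root = ET.fromstring(feed_xml)
    records = []
    for entry in root.findall(ATOM + "entry"):
        record = {"title": entry.findtext(ATOM + "title")}
        for child in entry:
            if child.tag.startswith(CUSTOM):
                record[child.tag[len(CUSTOM):]] = child.text
        records.append(record)
    return records


# Invented sample response with one non-standard attribute ("population").
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:x="http://xml.example-domain.com/ns/custom">
  <entry><title>Exampleland</title><x:population>1000</x:population></entry>
</feed>"""

print(build_query_url("web", topic="agriculture"))
print(parse_entries(SAMPLE))
```

Because the query parameters and the extension namespace are the same for every system, a consumer (or a wizard) written against one API should work against the others with only the system name changed.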
Some potential ways of inching toward the goal of the non-developer Interaction Publisher easily being able to publish dynamic, interactive features would be:
- Start by using javascript libraries. There are several javascript libraries out there (examples: Dojo, mootools, Prototype / script.aculo.us), but most seem to be too low-level (concentrating on opening/closing panels, transitions, and the like) to be useful for interactive data features. Possibly a library with higher-level features, including interactive tables, such as Ext JS could be used as a first step. It would require touching some code, but perhaps a CMS, for example, could include code snippets in its documentation indicating what needs to be replaced (for example, where to put in the url to the source XML).
- Create some simple wizards in CMSes. So that we aren't relying on, for example, Yahoo Pipes for hosting our interaction, we may wish to start including simple wizards in our CMSes. For example, one could be for interactive tables that just had one data source and three columns.
- Push for stronger hosted interactive feature builders. For example, Yahoo Pipes perhaps could include some of the features mentioned in this post (for example, a tool for creating interactive maps, or a tool for creating a pulldown of options to drive a Google map).
Here's a little chart displaying some of the ideas in this post (also see pdf version):

I'd really like your comments on this post. Specifically:
- Is the role of Interaction Publisher important?
- How could we enable this role?
- What ideas above do you think would work and which would not work?
- Is there a need for a generic standard XML format separate from RSS feeds, or should an institution's RSS just be extended to include custom portions?
Submitted by David Hobbs on 10 December 2007 - 1:27pm
There are already various sites comparing features of content management systems (for example the CMS matrix), so this post aims to help set a framework for selecting a Content Management System (CMS). Aside from standard things to keep in mind when selecting a technology, there are some particularly important items for setting the tone of your CMS selection:
- Standardization / Governance. Is one of your objectives to standardize the look and feel of your site, or to try to ensure there's a consistent quality across your site? If so, then before you start moving into the new system it is important to decide who will make the decision of what goes up and how the decisions will get made. Sure, an advantage of a CMS is that anyone can publish, but this can lead to inconsistent quality. I'm not just talking about workflow: for instance, who makes the call about adding a whole new site section?
- Stakeholder buy-in of objectives. This one is of course part of any technology decision, but some key factors in deciding about a CMS are: a) if you've decided to standardize aspects of your site, make sure everyone is bought in (otherwise people will try whatever they can to get out of the standard), b) if people's jobs are going to change (for instance, people that are doing hands-on HTML coding may not be doing that anymore), then is everyone clear on this?
- Envision key use cases. After you're in the middle of migrating your systems, you may lose sight of why you undertook this in the first place. Laying out key use cases in advance both keeps you from losing sight of the goals and lets you more easily claim victory. Key use cases might be something like "Will be able to allow any staff member to publish a piece of content, resulting in it automatically appearing on the home page as well as the relevant country page, and also appearing in the country's RSS feed and email alerts". Of course, you also should list key use cases that you don't want to go away like "Compare statistics across different areas of the site in a consistent manner."
- Make sure everyone understands the complexity of a move to a new system. See this post that lists some of the complexity.
These are some of the particular factors to consider when selecting a CMS:
- Tagging. For a large institution, you may have issues keeping consistent quality in your tagging (and you may wish to consider an automated concept extraction tool to help in the tagging). At any rate, you will want to think about a method of tagging that will work for everyone (and ensure that your system will support this).
- Multilingual/internationalization support. See this page that describes different levels of multilingual support. Some more advanced types of features to consider are Administrative Title and Interleaving Languages.
- Distributed or centralized content entry. This relates to the issue of standardization above.
- Community/support.
- Multiple site support. If you need to have multiple sites, what kind of functionality do you need? For instance, does content need to flow between sites? Do the different sites need to enforce a consistent look/brand?
- Integration with other systems or all-in-one. A key decision will be how you are going to integrate with other systems, and, if integration is not as important (for instance for a smaller organization), then ensure that your solution supports the different functionalities you need.
With everything on the web moving so fast now (who knows when Web 3.0 will be the next thing we're all moving to), consider moving to a CMS environment that will allow quick innovation and new functionality. Some specific approaches to this:
- Try to pick a CMS that is innovating quickly. Of course, what you really want is to pick the CMS that will be a winner in the future, but the best we can do now is pick a CMS that is quickly adding new features. Looking at lists like Joomla's extensions page for any CMS that you're interested in should help with this. Of course, it needs to be easy to add any new modules/extensions when they are released.
- Ease of upgrading to new versions of the core CMS. Obviously hosted, SaaS solutions have an advantage here.
- Ease of writing your own new functionality. Would the CMS allow your team to program (in some lightweight language like PHP for example) their own new functionality? If you don't have the skillset, is there a pool of developers outside your organization who could help? Is there useful documentation on how to write your own new functionality?
- Support to expose/share data. We have RSS as a mainstream feed now, but what about richer XML exposed for more structured data? More and more, we'll need to support people outside our organizations utilizing our data to write functionality on their own sites, combining our data with other organizations' data.
- Integration with outside systems. If a CMS already has integration with other types of systems (for instance, stats, newsletters, email alerts, membership databases, etc), then it may be easier to move to future leaders in these different spaces.