Submitted by David Hobbs on 25 April 2008 - 3:22pm
But isn't it all data? Although web content certainly is an important type of information available on a web site, Data needs to be treated differently. Here I'm talking about Data with a capital D -- I thought the Wikipedia description was good: "Data refers to a collection of organized information, usually the results of experience, observation or experiment, or a set of premises. This may consist of numbers, words, or images, particularly as measurements or observations of a set of variables." Here are some of the ways that Data is different:
- People expect Data to be available in different Formats.
- Users want to manipulate the Data.
- You don't totally control your Data, since it is available in different Channels.
There are several implications of this including:
- Formats. You may wish to standardize the formats that your data is available in. Is all your data always available in CSV (if that's what you standardize on)? This includes both the formats themselves (Excel, Stata, etc.) and the method by which the data is requested. For instance, is there one place where users can directly get all your data? "Directly" doesn't mean some thin layer with links to databases each doing its own thing. An example of a consistent method would be a web service with a published set of parameters by which the data could be requested. Ideally, all the institution's data would be available from this one web service.
- Manipulation. Sometimes people just want to see your data, but usually they will want to manipulate it. Providing your data in consistent formats makes it easier for your users to work with it. Other users will expect that *you* provide the tools for manipulating your own data.
- Channels. Ideally you will work to directly feed data to primary channels. For instance, if you feed data directly to services like Swivel then you both get more use out of your data and also can ensure your data is available in its highest quality (not watered down by other people copying and pasting your data, for example).
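To make the "one web service with a published set of parameters" idea more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `DATASETS` store, the `get_data` function, and the parameter names (`dataset`, `fmt`, and the filter keywords) are hypothetical, and a real service would sit behind HTTP rather than a plain function call.

```python
import csv
import io
import json

# Hypothetical in-memory store standing in for an institution's data.
DATASETS = {
    "population": [
        {"country": "A", "year": 2006, "value": 10},
        {"country": "A", "year": 2007, "value": 11},
        {"country": "B", "year": 2007, "value": 20},
    ],
}

# The published formats: every dataset is available in every one of them.
SUPPORTED_FORMATS = ("csv", "json")


def get_data(dataset, fmt="csv", **filters):
    """Return the rows of `dataset` that match `filters`, serialized as `fmt`."""
    if dataset not in DATASETS:
        raise KeyError(f"unknown dataset: {dataset}")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")

    # Keep only rows where every filter key equals the requested value.
    rows = [
        row for row in DATASETS[dataset]
        if all(row.get(key) == value for key, value in filters.items())
    ]

    if fmt == "json":
        return json.dumps(rows)

    # CSV: the header comes from the dataset's columns, one line per row.
    out = io.StringIO()
    fieldnames = list(DATASETS[dataset][0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The point of the sketch is that the same request parameters work for every dataset and every format, so a user who learns one request knows them all.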
Submitted by David Hobbs on 19 December 2007 - 9:25pm
Especially as a content management system grows to hold a large amount of content, it would be nice if you could do structured link checking. One of the problems with link checking in general is what to do with the reports once you get them. Of course, for a very small site you can easily scan the entire site with tools like LinkScan ($) and Xenu Linksleuth (free, but ads are put in the reports), or even monitor 404 requests and use single-page tools like the LinkChecker Firefox extension. But with large sites you can end up with reports where it is hard to know where to even start fixing links. This is especially true for CMS-driven sites: the same bad link may appear in only one piece of content that is displayed throughout the site. Or you could wind up linking from lots of content items to a url (possibly outside your control) that changes. I envision getting a report with a list of the bad links, where a user (with appropriate global rights) could indicate the correct new link, which would then be reflected in all content items (or left menus or other components surrounding the content) that used that link. This list could be prioritized by the cumulative page views of the pages that contained that bad link, or by the number of pages that contained that link. Another approach might be to provide a prioritized list of content items that have bad links (preferably directly linkable to the edit mode of each content item). At any rate, note that we're not talking about pages here but content items or links, so the user can quickly take action that will correct links on multiple pages. A long list of pages (specific urls) with bad links is confusing and, more importantly, isn't as quickly actionable. Here is how normal link checking reports look and how more useful reports might look:
Before / existing reports (where do you start with a report like this, where content items may drive multiple pages?):
- http://badlinkone.com is referenced on http://example-site.com/page1, http://example-site.com/page35, and http://example-site.com/page102
- http://badlinktwo.com is referenced on http://example-site.com/page1, http://example-site.com/page1023, http://example-site.com/page2439, http://example-site.com/page5192
- Etc.
Report indicating bad links where the user can immediately correct them (and apply the correction everywhere):
- Etc.
Report indicating which content items have the bad links (content items linkable to edit them directly):
- Etc.
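The prioritization described above (one entry per bad link, ranked by the cumulative page views of the pages showing it) can be sketched in a few lines of Python. This is only an illustration under assumptions: the `prioritize_bad_links` function, the content item ids, and the page view counts are all made up, and a real CMS would pull them from its own link checker and statistics.

```python
from collections import defaultdict

# Hypothetical link-checker output: (bad url, content item id, page showing it).
HITS = [
    ("http://badlinkone.com", "item-7", "http://example-site.com/page1"),
    ("http://badlinkone.com", "item-7", "http://example-site.com/page35"),
    ("http://badlinkone.com", "item-7", "http://example-site.com/page102"),
    ("http://badlinktwo.com", "item-9", "http://example-site.com/page1"),
    ("http://badlinktwo.com", "item-12", "http://example-site.com/page1023"),
]

# Hypothetical page view counts from the site's statistics package.
VIEWS = {
    "http://example-site.com/page1": 500,
    "http://example-site.com/page35": 40,
    "http://example-site.com/page102": 10,
    "http://example-site.com/page1023": 300,
}


def prioritize_bad_links(hits, page_views):
    """Group hits by bad url and rank by cumulative page views, then page count."""
    report = defaultdict(lambda: {"items": set(), "pages": set(), "views": 0})
    for bad_url, item_id, page_url in hits:
        entry = report[bad_url]
        entry["items"].add(item_id)
        # Count each page's views once per bad link, even if it is hit twice.
        if page_url not in entry["pages"]:
            entry["pages"].add(page_url)
            entry["views"] += page_views.get(page_url, 0)
    return sorted(
        report.items(),
        key=lambda kv: (-kv[1]["views"], -len(kv[1]["pages"]), kv[0]),
    )


for url, entry in prioritize_bad_links(HITS, VIEWS):
    print(url, entry["views"], sorted(entry["items"]))
```

Each entry carries the content items that contain the link, so the report can point at the things an editor actually fixes rather than at a long list of pages.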
One possible way to implement this is to change all the urls into logical links in your CMS. Assuming your CMS stores straight HTML rather than a more structured format, any url the user enters could be changed to a macro. (If users could put a hard link directly into the HTML without the system changing it, most would probably skip the logical linking, even if there were an option for creating a logical link.) For example, if the user put in this HTML:
<a href="http://hobbsontech.com">Hobbs On Tech</a>
then the system would replace it with
!link(123,"Hobbs On Tech")
and put in its link repository that link 123 was http://hobbsontech.com. When the page was generated, the macro would be replaced with the correct link in the HTML (so of course the end user's browser should never see the "123" in the HTML). If the page linked to was in your CMS, then the macro could be different and just indicate the unique key for the content item being pointed to (this would depend on whether the context that the content appeared in was relevant). For example:
!cms_item(123,"Hobbs On Tech")
Related items that a link repository might help with:
- Reporting on content use. A link repository would allow other interesting reporting, such as the most linked-to content items in your repository.
- Easily move content. In some cases, it may be easier to move content if you had a link repository. For instance, you may sometimes need to restructure your site resulting in the links changing. With a link repository, you could automatically change all the links so that the move did not result in broken links (of course this would work best for intranet sites where there were limited links outside your control to your content).
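As a rough sketch of the link macro idea above, the following Python shows a link repository, the rewrite from plain anchors to `!link(...)` macros on save, and the expansion back to HTML on page generation. The `LinkRepository` class and the `to_macros` and `render` functions are hypothetical names, not part of any particular CMS. The regular expressions only handle the simple anchor form used in the example (no extra attributes, no quotes in the link text), and ids here start at 1 rather than 123.

```python
import re


class LinkRepository:
    """Maps numeric link ids to urls so a url is stored in one place only."""

    def __init__(self):
        self._url_by_id = {}
        self._id_by_url = {}
        self._next_id = 1

    def register(self, url):
        """Return the id for `url`, creating one the first time it is seen."""
        if url not in self._id_by_url:
            self._id_by_url[url] = self._next_id
            self._url_by_id[self._next_id] = url
            self._next_id += 1
        return self._id_by_url[url]

    def update(self, link_id, new_url):
        """Point an existing id at a corrected url."""
        old_url = self._url_by_id[link_id]
        del self._id_by_url[old_url]
        self._url_by_id[link_id] = new_url
        self._id_by_url[new_url] = link_id

    def resolve(self, link_id):
        return self._url_by_id[link_id]


# Simplified patterns: a bare <a href="...">text</a> and the !link macro.
ANCHOR_RE = re.compile(r'<a href="([^"]+)">([^<"]*)</a>')
MACRO_RE = re.compile(r'!link\((\d+),"([^"]*)"\)')


def to_macros(source_html, repo):
    """On save: replace each hard link with a macro that names a link id."""
    def replace(match):
        link_id = repo.register(match.group(1))
        return f'!link({link_id},"{match.group(2)}")'
    return ANCHOR_RE.sub(replace, source_html)


def render(stored, repo):
    """On page generation: expand each macro into an anchor with the current url."""
    def replace(match):
        url = repo.resolve(int(match.group(1)))
        return f'<a href="{url}">{match.group(2)}</a>'
    return MACRO_RE.sub(replace, stored)
```

With this shape, correcting a bad link is a single `update` call on the repository, and every content item that uses that link id picks up the new url the next time it is rendered.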
Of course, this would add complexity (and possible failure points) to a CMS. Do you think it would be worth it?
Submitted by David Hobbs on 8 December 2007 - 4:06pm
You know how, when it's time to move into a new house or apartment, you look at the stuff you need to move and think "Why in the world do I have this bread machine? I haven't used this in years and I forgot I even had it"? Or how you dread moving your old clunker of a TV while thinking of the new fancy flat-panel TVs? Well, it's the same thing with migrating to a new system, for instance a new content management system. Only it's harder. When you're moving and you're pressed for time, you may just start tossing stuff into boxes to be moved, even when you know you don't really want all of it (one reason: you'd need to negotiate with a spouse about getting rid of something, and there's no time for that). This isn't that big a deal, since it's just moving more of the same stuff. Or, if you have a huge sectional couch that won't fit in your new place, then perhaps you can just sell it to the next homeowner. When you're moving content, you have all sorts of extra things to think about, including:
- It's not just content. Content on a site doesn't just live in some abstract ether; it is linked into a larger site context. This includes left navigation, headers, footers, and special site behaviors. Of course, the site context of a simple site like hobbsontech.com would be relatively easy to move (re-creating the menus, configuring the overall style, etc.), but the more sites you have, the more there would be to do. This is especially relevant for sites with a lot of custom dynamic functionality. For instance, if you have comments on your current site's content, then you'd have to figure out how to embed them in the new framework (or just leave them behind). Chances are you have a lot of functionality distributed throughout your site that may even be hard to inventory.
- Metadata and taxonomies. You may have to re-create taxonomies in another system, and there may be incompatibilities you have to work through.
- Internal references to other pieces of content. Your content probably refers to itself (for instance, a press release may refer to your product description page). This somehow has to be reflected in a new system.
- Structured content. You may have structured content (for instance, a document that has multiple chapters), which you'll need to figure out how to handle in the new system.
- Outside references to your content. Other sites, as well as search engines, will have links to your content. You'll need to have some strategy to deal with the links from external sites.
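For the last item, one possible strategy (an assumption here, not the only option) is to keep a map from old paths to new ones and answer requests for old paths with a permanent redirect. A minimal sketch in Python follows; the paths in `REDIRECTS` and the `resolve_legacy_url` function are hypothetical, and a real site would more likely configure this in its web server or CMS.

```python
# Hypothetical map from paths on the old site to paths in the new system.
REDIRECTS = {
    "/press/2007/release1.html": "/news/release-1",
    "/old-products": "/products",
}


def resolve_legacy_url(path):
    """Return (status, location) for a path requested on the old site.

    A known old path gets a 301 (moved permanently) to its new home;
    anything else is reported as 404 with no location.
    """
    # Treat "/old-products/" and "/old-products" as the same path.
    normalized = path.rstrip("/") or "/"
    target = REDIRECTS.get(normalized)
    if target is not None:
        return 301, target
    return 404, None
```

Building the map is the real work, and it falls out of the same inventory of content that the migration needs anyway.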
In the end, a lot of this has to do with the web of information that's involved in the content of a web site. And this isn't counting the types of technical issues that would come up with any technical migration (differences in size limits for fields, encoding differences, etc.). Of course there's the issue of why you even have all this stuff to move in the first place (and the more stuff you have the more hassle it is to move). This blog entry has focused on why it's difficult to move all this content, but of course one of the morals of the story is to have less stuff in the first place. In the case of the web this would involve better governance of what goes on the web, and clearly defining what the focus of your web site should be. Hopefully, just like when moving houses, any discussion of moving content would also include discussing what stuff you need in the first place. Unlike houses, having extra or duplicate stuff doesn't just inconvenience you but it is a disservice to your users. I'll leave the issue of the old TV and desiring the new flat panel to a future post (on survivorship bias).
Submitted by David Hobbs on 2 December 2007 - 10:52pm
Very large sites supporting a large number of units/stakeholders can easily turn into a hodge-podge of styles, user interface elements, and levels of quality. One of the toughest discussions with clients, however, is why they can't do more customization (even if one of the core requirements of the system is to help enforce standardization). Here are some of the reasons *not* to standardize:
- specific business needs of different groups (not to be confused with a group just wanting to differentiate itself somehow, for instance with a different look, that does not help the web visitor at all)
- professional development (for instance, a developer might be interested in doing a mashup)
- personal expression (liking particular colors for example)
- experimentation (don't know in advance what's going to "stick," so try a variety of things)
In my opinion, the first and last reasons are the most compelling (and the third is not a good reason at all for an enterprise-wide system), although one of the problems with experimentation is the frequent expectation that an experiment can quickly be rolled into the normal standardized platform (that's probably a post on its own!). Here are some reasons *to* standardize:
- consistent brand for the user ("am I still on the same site? Is this high quality content?")
- consistent UI for the user ("do I know how to use the site?")
- better support for new site admins or transition of support between sites
- single sign-on. It's confusing for a user to have various accounts with the same institution.
- standard statistics. Different statistics packages can have entirely different ways of counting something as basic as a page view. Standardizing on a statistics package can help ensure you're comparing apples to apples in your web analysis.
- better search. If everyone does their own thing, then there may be more fragmented information which would mean search results aren't as good.
- stability / support. As anyone who works with software/systems knows, the more functionality or special customization you put into a system, the more effort it takes to maintain it. Also, the system will probably be less stable. This one is also very tough to discuss with a client (and another probable future blog post) since they tend to only see their particular need.
Some possible methods of standardization:
- Governance. There needs to be a group with the power and influence to say "no" to requests that undermine the quality of the user experience of the site at large. This ideally is not the technology group since there would appear to be a conflict of interest.
- Clearly define exactly what is inside the standard and what is outside.
- Technology. The content management system used to manage the site can be set up such that users can only make changes that comply with the standard.
- The right level of customization. Standardization shouldn't be an excuse to totally control every aspect of everyone's sites or to not allow any innovation.
- Hooks into core shared functionality. You may decide that a single sign-on for users of your site is desirable. If so, then perhaps the system could be set up with an API such that tools developed and commissioned by other groups could work with the core functionality.
- Standardized access to data. Ideally, you could define a standard method of each system exposing its core data, that even people outside the institution could utilize for mashups, etc. By providing the data in a simple XML API, this could facilitate both internal and external usage of data.
- Another potential approach is to have separate branding for the official, blessed content and for the organization-centric content. For instance, you may have multiple units in your institution all looking at the topic of taxes. Ideally you would have one official web site that makes sense of your institution's view of taxes overall, and preferably this would pull information from all the units. The various units still may want their own sites, but these are less useful for the end user, so perhaps these units could have their own sites branded differently (and perhaps all requiring a standard link back to the official site) to clearly indicate that each is the view of a particular unit within your institution.
Of course, all of these are easier said than done when trying to get a large number of units into the same system, but perhaps some of these could be initiated even after a large suite of sites have been implemented in a central content management system.
Submitted by David Hobbs on 1 December 2007 - 11:40pm
Now that I've been using Drupal for a month (Drupal is driving this site), I thought it would be a good time to write up my impressions of this open source CMS. Obviously this is before I know the tool in depth, and also before I become jaded or a zealot. For reference, over the past seven years I've worked with a couple of custom CMSs, driving both very large and small web sites, and I have also used WordPress (not a CMS) for two other personal blogs. Also, my emphasis in using Drupal for this site has been as a blog, so I haven't fully explored all the CMS features. I really like Drupal for many reasons, including:
- Very easy to add features to a site. You download a module, set some parameters, and then you have a new feature (instructions). Examples of useful features that I've added easily to this site: feedburner redirects (Feedburner module), CAPTCHA checking for form submission to block robots (CAPTCHA module), email forwarding posts (Forward module), SEO and human-friendly urls (pathauto module), links to digg and other services (Service Links module), and full name listing as author for blog posts (Authorship module). That's not counting the really useful core modules that I've enabled for the site like the ability to pull in / aggregate other feeds (Aggregator module), comments (Comments module), automatically pinging services such as Technorati (Ping module), search (Search module), some web access statistics (Statistics module), the ability to create different taxonomies/categorizations (Taxonomy module), and the ability to automatically have features of the site turned off under high load (Throttle module).
- Built-in performance and throttling. You can set how aggressively you cache, and selectively set which features get turned off under higher load.
- Upcoming features. Since Drupal has gained some momentum, one feels the inevitability of new features being added as time goes on. Also, since it looks to be easy to add new modules, you could add a module yourself if you wanted.
- Nice modularization. I haven't read the documentation on how to develop modules, but, just seeing how modules work when installing them, the pluggability of modules seems very nice. Once installed/enabled, modules aren't just stovepipes. For example, when in admin mode the CAPTCHA module shows a message next to every form on your site asking if you want CAPTCHA there. Also, it appears that modules can easily write to the log screen, and are all controlled from the same core administration screen. When writing a post, the different options are embedded right on one page (for instance, the pathauto module automatically indicates, and lets you override, the alias it plans on giving your post).
- Flexible themes. I'm using the Garland theme, and I liked how you can set your own colors and other things are parameterized like your logo.
- Multiple taxonomies. You can easily create your own category lists.
Perhaps above everything else, it just seems that the details have been done very nicely in Drupal. Overall a site in Drupal appears to be easy to administer. These are some items that weren't as smooth: a) didn't get the Backup module to work quickly enough (was faster just to use scripts to do it rather than get the module to work), b) still don't fully understand the file upload/download environment (especially for counting the downloads), c) by default you're in raw html editing (yes, you can install a web-based HTML editor, but the ones I tried so far don't seem very useful), and d) getting the transparent logo needed for the Garland theme working quickly. A note to people currently working in a large, complex enterprise content management environment: I highly recommend playing with Drupal to get the creative juices flowing and also to work in a less constrained environment. Also, it's nice to work from a clean slate on a new site (although I've already bumped into a place where I wish I had set up the site differently in the first place). But of course working in an enterprise environment has a host of other requirements that have to be dealt with (for example standardized look across sites, security, single sign on, integration with internal repositories, existing systems, and standardized administration).