The older your site, the more layers of content you have. This is similar to the layers of rock you see when driving through mountains that have been blasted through for the highway. Some layers may be harder and others softer. On the editorial side, perhaps you had different writing styles, editorial focus, editorial standards, and general quality under different editors. On the technical side, perhaps ten years ago you were using tables for your formatting, then one division started using Flash extensively, and another group was frustrated by the controls in the CMS and so used JavaScript to rewrite the pages. This information is probably important for your content inventory.
Some of the reasons to capture this:
Cutting content, either on a one-time or an ongoing basis. The strata of content may be a key factor in your decisions.
Website transformation or migration. Any time your content needs to change to enable a transformation on your website, you need to understand what the transformation needs will be, and the different layers of content probably need to be transformed differently. To use a simple example, if a whole site section that was created five years ago is heavily Flash-based, then the transformation there may need to be entirely different than for another layer.
Generally determining the quality. You may want to analyze your quality over time, perhaps even testing the impact of different approaches to content over time.
Before continuing on to ways of figuring out the layers, I wanted to point out why layers in particular are important, rather than just doing counts of the various aspects you want to test (these 10% have Flash, these 25% use tables for layout, etc). The primary reason is that layers let you look for patterns (for example, all the pages in this section use Flash and tables). Another reason to group by layers is that they are probably something internal stakeholders can wrap their heads around and use to make solid decisions.
Also, notice in the graphic above that if you look at the side you see slightly different layers than when you look from the front. So naturally what layers you see also depends on how you slice through your site. You may miss some important layers.
Figuring out your layers will depend on the specifics of your web presence, but here are some ways of figuring out layers:
Age. This one is relatively easy to get (although getting the originally-posted date can be a challenge), and, if you're going to slice on just one metric then this is a good one to start with.
Editor. Different editors may have dictated different content quality and focus.
Systems. From a technical perspective, the system (or original system if it's already been migrated once) can have a huge impact on the underlying technical content quality.
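To make the idea of looking for patterns by layer concrete, here is a minimal sketch that groups pages by one layer (year posted) and reports what share of each layer has a given technical trait. The page records and field names (`year_posted`, `uses_flash`, `uses_layout_tables`) are hypothetical; a real inventory would pull them from a crawl or a CMS export.

```python
from collections import defaultdict

# Hypothetical page records; the field names are assumptions for illustration.
pages = [
    {"url": "/a", "year_posted": 2004, "uses_flash": False, "uses_layout_tables": True},
    {"url": "/b", "year_posted": 2004, "uses_flash": False, "uses_layout_tables": True},
    {"url": "/c", "year_posted": 2009, "uses_flash": True, "uses_layout_tables": False},
    {"url": "/d", "year_posted": 2009, "uses_flash": True, "uses_layout_tables": True},
    {"url": "/e", "year_posted": 2013, "uses_flash": False, "uses_layout_tables": False},
]


def layer_profile(pages, layer_key, traits):
    """Group pages by a layer and return the share of each layer with each trait."""
    groups = defaultdict(list)
    for page in pages:
        groups[page[layer_key]].append(page)
    profile = {}
    for layer, members in sorted(groups.items()):
        profile[layer] = {
            trait: sum(p[trait] for p in members) / len(members) for trait in traits
        }
    return profile


for layer, shares in layer_profile(
    pages, "year_posted", ["uses_flash", "uses_layout_tables"]
).items():
    print(layer, shares)
```

Swapping `layer_key` for an editor or source-system field slices the same inventory a different way, which is the point about seeing different layers from the side than from the front.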
Content inventories are often considered just long lists of content. In fact, the top Google.com search result on "content inventory" is still the 2002 Adaptive Path article calling them "a mind-numbingly detailed odyssey." But sites now are often getting way too complex (and big) to plod through every entry.
Many web presences have multiple sites or perhaps subsites or major sections. A multinational consumer product company has sites per country and / or product. Advocacy organizations may have a site per initiative, and news sites will also be broken down into primary sections like Sports.
So instead of a mind-numbing list, your content inventory could be grouped to come up with site inventories like this:
| Subsite | Pages | New Template | Popularity |
|---|---|---|---|
| Sports | 5000 | 100% | 50% |
| Celebrity | 1000 | 90% | 50% |
| Europe | 1000 | 90% | 90% |
| World | 2000 | 10% | 90% |
| Politics | 3000 | 50% | 50% |
| Weather | 6000 | 100% | 10% |

New Template: percentage of pages using the most recent corporate-approved Dreamweaver template.
Popularity: percentage of pages that have received fewer than five pageviews in the last month.
You'll notice that this type of report tells you a lot of information that probably would not be obvious when poking around a laundry list content inventory. For example, you see:
Which sites are small, on an old template, and have a high percentage of unpopular content -- potentially the entire subsite could be dropped
Which sites are completely using your newest template -- these could potentially be the first to migrate in a migration project
Which sites are large and in the new template but with a lot of unpopular content -- these may be ripe for a new publishing strategy
Part of the point of the site inventory is that it is combining information from multiple sources (the above example table lists some possibilities but there are many more). Obviously, you could look at your analytics to just see the total page views for a site, or even the % of pages on a site that have under a threshold of pageviews per month. But it gets much more interesting when you combine the information, especially when you are considering phasing changes to your site. For example, you could migrate all sites that are 100% in the new template first, then in the next phase move those that have 90%+ in the new template.
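As a rough sketch of that phasing rule, the snippet below assigns each subsite from the example table to a rollout phase based on its new-template percentage. The 100% and 90%+ cutoffs come from the paragraph above; lumping everything else into a third phase is my own assumption, as are the function names.

```python
from collections import defaultdict

# Figures from the example site inventory table:
# (subsite, pages, % of pages on new template, % of pages that are unpopular)
site_rows = [
    ("Sports", 5000, 100, 50),
    ("Celebrity", 1000, 90, 50),
    ("Europe", 1000, 90, 90),
    ("World", 2000, 10, 90),
    ("Politics", 3000, 50, 50),
    ("Weather", 6000, 100, 10),
]


def migration_phase(pct_new_template):
    """Assign a rollout phase from the share of pages on the new template."""
    if pct_new_template == 100:
        return 1
    if pct_new_template >= 90:
        return 2
    return 3  # assumption: everything else waits for a later phase


def plan_phases(rows):
    """Return {phase: [subsite, ...]} in the order the rows were given."""
    phases = defaultdict(list)
    for subsite, _pages, pct_new_template, _pct_unpopular in rows:
        phases[migration_phase(pct_new_template)].append(subsite)
    return dict(phases)


for phase, subsites in sorted(plan_phases(site_rows).items()):
    print(f"Phase {phase}: {', '.join(subsites)}")
```

In practice you would probably also weigh page counts and popularity before committing to an order, but even this crude rule shows how combined data turns into a plan.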
A site inventory view into your content inventory of course isn't the only view you need to take. You need to look specifically at high-value content, and may need to slice and dice the results in different ways (such as by content type). But for complex rollout planning or other broad analysis, especially for very large sites which an organization doesn't have a solid handle on, site inventories can help drive decisions.
Note that the site inventory can just be derived from a larger content inventory. For example, if your content inventory has fewer than a million items, then you can use an Excel pivot table to aggregate the information (and other tools could be used for larger sets).
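If a spreadsheet isn't handy, the same rollup is only a few lines of code. This is a minimal sketch, assuming a page-level inventory where each row carries a subsite, a flag for the new template, and last month's pageviews; all of the field names and sample rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical content inventory rows: one dict per page.
# Field names (subsite, on_new_template, pageviews_last_month) are assumptions.
inventory = [
    {"subsite": "Sports", "on_new_template": True, "pageviews_last_month": 120},
    {"subsite": "Sports", "on_new_template": True, "pageviews_last_month": 2},
    {"subsite": "World", "on_new_template": False, "pageviews_last_month": 0},
    {"subsite": "World", "on_new_template": False, "pageviews_last_month": 3},
    {"subsite": "World", "on_new_template": True, "pageviews_last_month": 40},
    {"subsite": "World", "on_new_template": False, "pageviews_last_month": 1},
]

UNPOPULAR_THRESHOLD = 5  # fewer than five pageviews in the last month


def site_inventory(pages, threshold=UNPOPULAR_THRESHOLD):
    """Roll page-level rows up into one summary row per subsite."""
    counts = defaultdict(lambda: {"pages": 0, "new_template": 0, "unpopular": 0})
    for page in pages:
        row = counts[page["subsite"]]
        row["pages"] += 1
        row["new_template"] += page["on_new_template"]
        row["unpopular"] += page["pageviews_last_month"] < threshold
    return {
        subsite: {
            "pages": row["pages"],
            "pct_new_template": round(100 * row["new_template"] / row["pages"]),
            "pct_unpopular": round(100 * row["unpopular"] / row["pages"]),
        }
        for subsite, row in counts.items()
    }


for subsite, summary in site_inventory(inventory).items():
    print(subsite, summary)
```

The output has the same shape as the site inventory table: one row per subsite with a page count and the two percentages.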
Many sites are largely organized around topics (as opposed to, say, the org chart or products). But there are plenty of nuances to having successful topics pages, from dealing with political issues (when should a new topic page be created) to functionality options (how should topics pages be managed) to metadata concerns. If creating automated listings on topics pages, consider this:
You can only have automatically-pulled topics listings if there is a one-to-one or many-to-one relationship from your source metadata tags to your target groupings.
This probably sounds a bit too academic, so let's look at this in concrete terms. Let's consider the case where you have a bunch of articles tagged to either broccoli, apple, or orange. If you wanted to have a vegetable topic page and a fruit topic page, then this would work fine. This is because you have a one-to-one mapping from broccoli to vegetable (broccoli is always a vegetable), and a many-to-one mapping of apple and orange to fruit (both apple and orange always map to fruit and nothing else). That said, you cannot have red and green topic pages based on the existing tagging. Broccoli is always green, so if the only tags you had were to broccoli then you would be fine. But the apple tagging is the problem: an apple can be either green or red. Obviously, you could introduce a new color tag, but if you had a large number of existing pieces of content then you would have a large amount of retagging to do.
This may be obvious when looking at three tags and a handful of possible topics pages. But when looking at larger repositories and the possibility of creating topic pages with some automated pulls, consider the mappings to ensure you end up with relevant topics pages.
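For a larger repository, the check can be scripted. Below is a minimal sketch, assuming you can write down, for each existing tag, the set of target topic pages it could belong to. The fruit and vegetable mappings follow the example above; the color listed for orange is my own filler, and the function names are hypothetical.

```python
# For each source tag, the set of target topics it could map to.
tag_to_kind = {
    "broccoli": {"vegetable"},
    "apple": {"fruit"},
    "orange": {"fruit"},
}
tag_to_color = {
    "broccoli": {"green"},
    "apple": {"green", "red"},  # an apple can be either, so the tag is ambiguous
    "orange": {"orange"},  # assumption: not stated in the example above
}


def ambiguous_tags(tag_to_topics):
    """Return the tags that map to more than one target topic, sorted by name."""
    return sorted(tag for tag, topics in tag_to_topics.items() if len(topics) > 1)


def can_automate(tag_to_topics):
    """True when every tag maps to exactly one topic (one-to-one or many-to-one)."""
    return all(len(topics) == 1 for topics in tag_to_topics.values())


print(can_automate(tag_to_kind), ambiguous_tags(tag_to_kind))
print(can_automate(tag_to_color), ambiguous_tags(tag_to_color))
```

The list of ambiguous tags doubles as a rough estimate of the retagging work: every piece of content carrying one of those tags would need a new, more specific tag before the topic page could be pulled automatically.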
--------------------------
Need help setting up or fixing the processes, functionality, metadata, or other aspects of creating topics pages on your site? Contact David Hobbs Consulting.
Does manually migrating into a CMS help train users? It can, but it isn't great training on its own, and in some ways it can be detrimental from a training perspective as well (in particular in forming a positive impression of the tool). Note that this was explicitly left out of my previous post on making a decision to migrate manually or automatically since the training aspect really is not a slam-dunk positive in my mind. Let's start with the cons of migrating as training.
Much of the migration effort, especially for a large migration, will not be done by the same people who will be using the system on an ongoing basis (so you're not training the right people anyway)
The problems of migrating are different than day-to-day, ongoing efforts
May set the expectation of very wide / distributed content entry, when on an ongoing basis a more centralized team may be better
Tool probably not ready, so users will probably get frustrated (not an ideal training situation)
What are some other disadvantages?
May not have the opportunity to train on subtle but important aspects for long term success (for instance, strong metadata tagging) -- another way of looking at this is that everyone, even those with good intentions, will have tunnel vision on the tips / tricks to ram in as much content as quickly as possible
Loss of enthusiasm for the tool based on massive repetition (even if tool is strong)
Little training on creating content from scratch, which may be a more common use case (rather than, for example, cutting and pasting)
Pros
It's not all dark and gloomy, and you certainly want to optimize the training aspects of whatever manual migration does occur. So here are some positives to training during a manual migration:
Practice cutting-and-pasting content -- depending on the environment, this may be either a common or uncommon situation on an ongoing basis
Practice creating navigation
Quick and early feedback on user interface issues
Quick and early feedback on your training program (your training documentation, processes, help screens, etc)
Ensure people have the right access to their websites earlier in the process
Will manual migration be good training for you?
If you have a small site, and you won't need to bulk up with a large number of external, temporary folks to do your manual migration, then migration will probably be a good component of your training. If your site will be able to use your CMS content contribution interface as-is, then there's more likelihood that you won't alienate people early in the process. The equation for large sites is much different, since the cons listed above will be more of a factor. That said, by all means take advantage of the training opportunity of the migration if you decide to do so manually. I just wouldn't overstate the training advantages in that case.
What are your thoughts? What pros / cons do you see? What are your experiences?
Automation is a worthy goal, and I'm always looking for ways to automate migration where possible. That said, obviously there's a tradeoff between automation and manual migration. For instance, if you have a site of ten pages then don't even waste your time talking about automation and buckle down to copy and paste into the new system (or just create the new content from scratch). At the same time, if you have a site with 500,000 pages that you want to keep, then you probably want to spend a lot of time talking about automation. So how do you know whether to pack up the pickup truck and move yourself, and when is it time to try a more sophisticated approach? The following should help in your decision.
Evaluating whether to automate
What are some of the factors in deciding whether to automate or not?
Commonality
How consistent is the content you will be moving in? This one can be easy to ask but difficult to answer. Much of the discussion of a migration is looking for patterns, so this isn't literally about whether 80% of the content on a current site is driven by the same template (although that helps). For example, if your current site is not in a CMS but still every page consistently uses H1, H2, strong, and em tags then you may be able to scrape out the information you need.
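One quick way to get a feel for commonality on a non-CMS site is to measure how many pages actually use the tags you would want to scrape. This is a minimal sketch using Python's standard `html.parser`; the sample pages and the choice of `h1` and `h2` as the expected tags are assumptions for illustration, and a real check would run over crawled pages.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect the names of all start tags seen in a page."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


EXPECTED_TAGS = {"h1", "h2"}  # assumption: the tags your scrape would rely on


def uses_expected_tags(html, expected=EXPECTED_TAGS):
    """True when the page contains every expected tag at least once."""
    collector = TagCollector()
    collector.feed(html)
    return expected <= collector.tags


def consistency(pages, expected=EXPECTED_TAGS):
    """Share of pages (0.0 to 1.0) that contain all the expected tags."""
    matches = sum(uses_expected_tags(html, expected) for html in pages)
    return matches / len(pages)


# Hypothetical sample pages: three structured, one laid out with tables and font tags.
sample_pages = [
    "<h1>Title</h1><h2>Section</h2><p>Some <strong>text</strong></p>",
    "<h1>Other</h1><h2>Part</h2><p><em>More</em> text</p>",
    "<table><tr><td><font size='5'>Title</font></td></tr></table>",
    "<h1>Third</h1><h2>Bit</h2>",
]

print(consistency(sample_pages))
```

A high share suggests scraping is worth exploring; it says nothing yet about whether those tags map usefully to the target system.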
Structure
The structure of the content / pages on the source and target systems is also crucial, since this will determine how much transformation is required (and, notably, whether people will need to edit to get there). In the example above with the content that has common usage of H1, H2, and some other tags, that commonality is useful only if it maps usefully to the target structure. For instance, if the target system is highly structured but the source system is not, then even common / consistent source content may not help you much in automating your migration.
Editing requirements
When considering the vision of your site after migration, you may necessarily need to modify a swath of content purely for editorial reasons. Obviously, this is a big argument for manual work, although it may just mean a modification of the process such that initial migration is done in an automated fashion but regular editorial work is done after initial technical migration (or other process changes may be needed).
Staffing
If you have a large and distributed publishing community, then even a large number of pages may be able to be moved fairly quickly. That said, this does necessitate a fairly polished publishing system earlier in the process.
Raw count
Obviously, the more content you have the more likely automation will make sense. But there's one very important nuance here: it's the count of *similar* content that matters. In other words, if you have 10,000 pages but 100 groups are each managing completely different sets of 100 pages, then it may make sense for each group to manually move their content!
Advantages of automation
Iteration
Almost by definition, automation means repeating runs of migration until you work out the kinks in your automation rules. This is completely different with the manual process, where people are much less likely to go through all the content again to make an improvement only realized later in the process.
Cost savings with consistent content
This is probably the key reason organizations look to automation. With the right level of consistency, source and target structure, and number of content items to be moved, automation can bring cost savings.
Consistency across large amount of content
Related to the point on iteration above, it's easier to have a consistent level of quality across a large amount of content, since you do not have to train a large number of users who might treat content differently.
More likely to see patterns that can be applied on an ongoing basis
The most interesting aspect of a migration is searching for patterns, and many of these patterns can be used on an ongoing basis.
Less dependency on the publishing process being perfected
One of the key aspects of CMS acceptance is the publishing experience for content providers and site owners. With automated migration, this publishing process does not need to be as perfected before migration starts.
Advantages of manual
DIY
You can start as soon as editing tools are in place. Also, there's no need to engage with the technical team.
Budgeting
It may be easier to get the budgeting in place for manual migration than for an automated migration project, especially if you already have access to a pool of content contributors that could work on the project.
Built in editing
Especially for a smaller set of content, you can ensure that human editing of the existing content occurs during the migration (although for larger migrations getting consistent quality may be difficult).
Built in QA
If someone is staring at the content during migration, then nominally they have a chance to QA whether the migrated content looks good. Note that this means that the output templates need to be complete for a solid QA during migration. Also, this works best when the same people who are migrating the content also own the content, which may not be true.
---------------------------
Want a worksheet to help in your decision of manual vs. automated? Subscribe to Ten Weeks to a Better Migration for weekly action-oriented information and exercises to improve your migration effort.