requirements

Automatic Pull of Content: Some Issues

[Image: atlas-district-news]

It seems so simple. You've got press releases that are clearly tagged to neighborhood (let's say the two possible neighborhoods are Capitol Hill and Atlas District). The Atlas District page should obviously only have Atlas District news, so you create a section on the Atlas District page that lists the three most recent press releases tagged to it. Your web developer whips something like this up quickly (examples from the excellent local blog Frozen Tropics):

Possible Issues

Seems easy enough, right? Sometimes the straightforward approach may be fine (especially for small sites), but you could wind up with something more like this if you're not careful:

[Image: atlas-district-news-bad]

Here are some of the potential issues with larger sites:

Drafts and embargoed material

"this should not appear anywhere, in any channel, until published"

Let's say you're about to post a press release containing the menu for a new restaurant in the Atlas District, and you've agreed to post it after 7pm tonight. You'll be working on a draft beforehand so that it's ready to go at 7:00. Obviously, the press release shouldn't appear until after the approved time. This is a more significant issue than it appears: if you start exposing APIs and other means of sharing your content, the same rules should apply there (rather than developers recreating the rules, and potentially introducing errors, every time).
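One way to keep the rule in a single place is a shared visibility check that every channel (page blocks, feeds, APIs) calls before showing anything. The sketch below is only an illustration: the `PressRelease` fields, the status values, and the function names are assumptions of mine, not part of any particular CMS.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PressRelease:
    title: str
    status: str  # hypothetical values: "draft" or "approved"
    publish_at: Optional[datetime] = None  # embargo time; None means no embargo


def is_visible(item: PressRelease, now: datetime) -> bool:
    """The one shared rule: approved, and the embargo (if any) has passed."""
    if item.status != "approved":
        return False
    return item.publish_at is None or now >= item.publish_at


def visible_items(items, now):
    """Every page block and API endpoint filters through the same rule."""
    return [item for item in items if is_visible(item, now)]


if __name__ == "__main__":
    # Illustrative date; the 7pm embargo mirrors the restaurant menu example.
    embargo = datetime(2024, 1, 15, 19, 0)
    menu = PressRelease("New restaurant menu", "approved", embargo)
    print(is_visible(menu, datetime(2024, 1, 15, 18, 59)))
    print(is_visible(menu, embargo))
```

Because the check takes `now` as a parameter, the embargo behavior can be tested without waiting for the clock.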

Editorial decisions

"yeah, but I don't want it on my page"

A press release is published that is related to both the Atlas District and Capitol Hill. Perhaps it's about a bicycle race that will result in street closings in Capitol Hill but will only affect parking in the Atlas District. The owner of the Atlas District page doesn't think it's significant enough to appear on the Atlas District page. This would be a case where the tagging to Atlas District is correct, but there is a valid editorial decision to not include it on the Atlas District page (perhaps there's another separate event there that should be in the top three). In this case, the press release should not be retagged to remove Atlas District, since for some purposes (such as enterprise search) you will want the correct tag.
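A minimal sketch of keeping the editorial decision separate from the tag: the page block carries its own exclusion list, and the press release's tags are never touched. The `PageBlock` and `PressRelease` shapes and the `pull` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PressRelease:
    id: int
    title: str
    tags: set
    published: str  # ISO date string, so plain string sorting is chronological


@dataclass
class PageBlock:
    tag: str
    limit: int = 3
    # Editorial overrides live on the block, not on the content's tags.
    excluded_ids: set = field(default_factory=set)


def pull(block: PageBlock, releases):
    """Most recent releases for the block's tag, minus editorial exclusions."""
    candidates = [
        r for r in releases
        if block.tag in r.tags and r.id not in block.excluded_ids
    ]
    candidates.sort(key=lambda r: r.published, reverse=True)
    return candidates[:block.limit]
```

With this shape, the bicycle race release can stay tagged to both neighborhoods: the Atlas District block lists its id in `excluded_ids`, while the Capitol Hill block and anything else that relies on tags (such as search) still see it.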

Bad Tagging

"this tag is just wrong"

This one is virtually impossible to avoid when dealing with a large group of people submitting content (although see a related metator discussion about ways to improve this). Let's say that a new person who does not know DC very well arrives, and mistakenly tags something to Capitol Hill instead of the Atlas District (perhaps mixing up 401 H St NE and 401 H St SE). Note that this is very different than the editorial decision issue, although at first blush they seem similar. In this case, the tagging is wrong and should be corrected (or, in the case of automated tagging, the rules should be changed).

Multilingual Issues

"don't show me partial results in another language"

A variety of issues can occur when pulling content in many languages, especially when, as is usually the case, different pieces of content are in different languages. You can end up with too little content (if a page only shows items in its own language and little has been published in that language), or with unnecessary duplicate content (see Interleaving Languages).

Broadcasted content

"I need this important information on all pages of the site"

If you have a lot of publishers and content, you may sometimes have content that should appear on all pages (broadcasts), regardless of what neighborhood the news is about (let's say a press release about Washington, DC overall and not specific to a neighborhood). What you *don't* want to do (but may indeed do in a crisis if this wasn't planned for) is tag the content to every neighborhood just to make it appear on every page, even though those tags are not accurate.
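As a rough sketch of planning for this, a broadcast can be a separate flag on the content rather than a pile of incorrect tags. The `broadcast` field, the `pull_for_page` name, and the choice to list broadcasts ahead of tagged items are all assumptions for illustration; your own ordering rule may differ.

```python
from dataclasses import dataclass


@dataclass
class PressRelease:
    title: str
    published: str  # ISO date string, so plain string sorting is chronological
    tags: frozenset = frozenset()
    broadcast: bool = False  # hypothetical flag: show on every page


def pull_for_page(tag: str, releases, limit: int = 3):
    """Broadcasts first (assumed rule), then the newest items tagged to the page."""
    newest_first = sorted(releases, key=lambda r: r.published, reverse=True)
    broadcasts = [r for r in newest_first if r.broadcast]
    tagged = [r for r in newest_first if not r.broadcast and tag in r.tags]
    return (broadcasts + tagged)[:limit]
```

A citywide release then appears on both the Atlas District and Capitol Hill pages while its tag set stays empty (or tagged only to what it is really about).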

Appearance of Timeliness

"a year old press release isn't 'current news'"

If you end up with a lot of automated pages (for instance if you cover 30 different neighborhoods), then it's easy to wind up with a block labeled "Current News" that contains very old content. In addition, if you are displaying events, those far in the future could crowd out an event happening tomorrow.
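Both problems can be handled with explicit time windows. The sketch below is one possible approach; the 90-day and 60-day defaults and the `Item`/`Event` shapes are arbitrary assumptions, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Item:
    title: str
    published: date


@dataclass
class Event:
    title: str
    starts: date


def current_news(items, today: date, max_age_days: int = 90, limit: int = 3):
    """Newest items first, dropping anything older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    fresh = [i for i in items if cutoff <= i.published <= today]
    return sorted(fresh, key=lambda i: i.published, reverse=True)[:limit]


def upcoming_events(events, today: date, horizon_days: int = 60, limit: int = 3):
    """Soonest events first, ignoring past events and those beyond the horizon."""
    horizon = today + timedelta(days=horizon_days)
    soon = [e for e in events if today <= e.starts <= horizon]
    return sorted(soon, key=lambda e: e.starts)[:limit]
```

If `current_news` comes back empty, the page can hide the block or relabel it rather than presenting a year-old press release as current.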

What to do about it?

In future blog posts, I hope to cover approaches to avoid these issues, but in closing I thought it would be helpful to list some high-level pointers:

  • Clearly articulate how you want your automatic pulls to work, as early in your process as possible.
  • Don't think of each block in isolation, but try to implement things in a consistent manner (for instance, by only having page blocks behave in a few different ways).
  • Similarly, consider whether developers should have control over all aspects of each block, or whether much of the aggregation should only be available through a consistent API.
  • Be mindful of the issues above when designing your page/block behavior and training of those that will be tagging.

As always, please provide any comments at HobbsOnTech or on Twitter at @jdavidhobbs.

"Just like current system"

We all encounter users with requirements like this: "This is easy. A college intern set it up in 10 minutes.  We just need you to put that functionality in your fancy-pants system."  (Well, maybe not that exact wording).

I encountered something like this a while ago. The basic requirement was to generate a little web report in a table. We implemented this in our system, and the user immediately called, upset that it wasn't working. It turned out they had been copying and pasting from the web page into Excel, and this no longer worked as they expected.

Now I know to ask more questions (and give this dangerous flavor of requirement its proper respect). Some questions to pose:

  • do all the users of this system/functionality use the same system/platform (if I watch how you use the system, is that sufficient)?
  • what other systems will use the output of this system?
  • how critical is this system? are you willing to live with some hiccups?
  • are you tied to the exact look and feel of the current system?
  • how do you currently manage your site, and the content and other data on the site?

In a lot of ways, this is a much scarier requirement than a brand new one. Even if there are misunderstandings in the requirements of a new system, at least everyone understands that this can happen. But a quick ctrl-a ctrl-c ctrl-v can be all it takes for a user to prove that your implementation doesn't do what the previous one did.

You should try to sit down with the user and watch them actually use the functionality. If at all possible, you should also walk through the system yourself, noting any potential complexities (and discussing these potential complexities with your client). Emphasize that there may be details of the current system that neither they nor you were aware of. Hopefully you can also concentrate on *improvements* that can be made to the system, so they aren't as disappointed with small setbacks.

As to phasing, if possible, try to first deliver a pilot or beta so they can play with it. Also, you need to be ready to "fix" things after delivery.

Interleaving languages

Many of your users may speak multiple languages. If so, you may want to implement lists that prefer one language, but show another language if the content is not available in the preferred language.

Let me say right off that you may decide not to implement this functionality in a system. That said, this is a requirement that, if not built into the system from the start, may never happen. So you'll want to think this through. If you decide to implement this, make sure not to lock yourself in to a system that will not allow it (and be careful assuming that you could rewrite/extend things later to support this).

Let's start with three press releases, one of which is in two languages:

  • "David goes to market" (in English) and "Dawud mache souk" (the same press release, also about David going to the market, in phonetic Chadian Arabic)
  • "Dawud nisid kalam Arab" (a press release only in Chadian Arabic)
  • "David looking for better examples" (only in English)

The English-only list of press releases ("David goes to market" and "David looking for better examples") and the Chadian Arabic-only list ("Dawud mache souk" and "Dawud nisid kalam Arab") are easy and obvious. But what about the two press releases that are just in one language? If I'm on a site in Chadian Arabic, then shouldn't the English-only press release be presented (or at least the *option* of listing the English)? That way, you at a minimum indicate that there are more relevant press releases, and, in the best case, the person might be able to figure out English so they get the information. Of course, this may only make sense when the proportion of English content doesn't totally swamp the Chadian Arabic (so for instance this may be relevant on a Chad-specific site). So the "prefer Chadian Arabic, but show in English if not available in Chadian Arabic" list would look like this:

  • Dawud mache souk
  • Dawud nisid kalam Arab
  • David looking for better examples (English)

If you don't decide up front that you need this type of list (or implement an interim solution and never implement the above), you'll probably end up with something like this if you later decide you want both languages (note that the first press release appears twice, once in English and once in Chadian Arabic):

  • David goes to market (English)
  • Dawud mache souk
  • Dawud nisid kalam Arab
  • David looking for better examples (English)

This wouldn't be the end of the world, but not ideal.

From a technical perspective, the ideal interleaving would probably involve some sort of parent/child relationship (for instance perhaps there's an abstract, language-less object that's about David going to market, with two children, one tagged as English with the title "David goes to market" and another tagged as Chadian Arabic with the title "Dawud mache souk"). The more obvious implementation, that would normally only support the duplicated list, would be to just have a mass of objects that indicate language and title but have no parent/child relationship. Of course, these could be related to each other (with some sort of "translation-of" property for example), but it would then be computationally complex to get the non-duplicated list as-is (without adding some sort of sophisticated index or somesuch).

Update (November 14): Another possible technical implementation would be one piece of content that has blocks within it for different translations (for instance, the first press release above is *one* piece of content but with two translation blocks in it).
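To make the parent/child idea concrete, here is a minimal sketch in which each language-less parent holds its translations, and the list picks the preferred language with a labeled fallback. The `Story` class, the language keys, and `interleaved_titles` are illustrative names I made up, not a reference to any real system.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    """Language-less parent; children are stored as language -> title."""
    key: str
    translations: dict = field(default_factory=dict)


def interleaved_titles(stories, preferred: str, fallback: str, fallback_label: str):
    """One entry per story: preferred language if present, else labeled fallback."""
    titles = []
    for story in stories:
        if preferred in story.translations:
            titles.append(story.translations[preferred])
        elif fallback in story.translations:
            titles.append(f"{story.translations[fallback]} ({fallback_label})")
    return titles


stories = [
    Story("market", {"english": "David goes to market",
                     "chadian_arabic": "Dawud mache souk"}),
    Story("speaks-arabic", {"chadian_arabic": "Dawud nisid kalam Arab"}),
    Story("examples", {"english": "David looking for better examples"}),
]

for title in interleaved_titles(stories, "chadian_arabic", "english", "English"):
    print(title)
```

Because the loop visits each parent exactly once, the press release that exists in both languages cannot show up twice, which is the duplication the flat "mass of objects" model tends to produce. The single-content-with-translation-blocks variant from the update would look much the same from the list's point of view.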