NClassifier: Auto-tag posts

A while back I read Ismail's post on NClassifier. I've always been interested in content intelligence, so I decided to look into auto-tagging blog posts and creating document summaries.

I'm not new to this game, and I still feel that humans do a much better job of summarising text than machines. The tagging using the BayesianClassifier, however, works quite well.

I've put together a very simple demo which classifies blog posts based on this XML configuration. If you click through to the detail view of any blog post, you'll see the result of the classification just before the comments.
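To give a feel for how Bayesian tagging works, here is a minimal Python sketch of a naive Bayes tagger. It is not NClassifier's API and not the demo's actual code. The class name `NaiveBayesTagger`, the tag names and the training sentences are all made up for illustration, with the training sentences standing in for whatever the configuration supplies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Hypothetical multinomial naive Bayes tagger (illustration only)."""

    def __init__(self):
        self.word_counts = {}        # tag -> Counter of word frequencies
        self.doc_counts = Counter()  # tag -> number of training texts
        self.vocabulary = set()      # every word seen during training

    def train(self, tag, text):
        words = tokenize(text)
        self.word_counts.setdefault(tag, Counter()).update(words)
        self.doc_counts[tag] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        """Return a log-probability score per tag (higher is better)."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for tag, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the prior: how common the tag was in training
            score = math.log(self.doc_counts[tag] / total_docs)
            for word in words:
                # Laplace smoothing so unseen words don't zero out the score
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            result[tag] = score
        return result

    def classify(self, text):
        """Return the single best-scoring tag for the text."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Made-up training data standing in for the real configuration
tagger = NaiveBayesTagger()
tagger.train("umbraco", "umbraco cms document types macros templates publish")
tagger.train("search", "lucene index query search results ranking analyzer")

post = "How to rebuild the Lucene index and tune search ranking"
best_tag = tagger.classify(post)
print(best_tag)
```

A real setup would train on far more text per tag, and would probably assign every tag whose score clears a threshold rather than just the single best one.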

Ultimately, I'd probably implement this as an Umbraco actionhandler which sets metadata against items as they are published, but for now classification runs on each request.

If you are wondering what the purpose of this is, think about the technology behind Amazon recommendations and the numerous sites that implement 'you may be interested in this' functionality. Once your content is classified, it isn't too hard to track which tags your users look at most and assemble custom content for them.
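As a rough sketch of that idea, the Python below counts tag views per user and suggests unseen posts that share the user's favourite tags. Everything in it (`TagInterestTracker`, `recommend`, the post titles and the user id) is hypothetical, and a real site would persist the counts rather than hold them in memory.

```python
from collections import Counter, defaultdict


class TagInterestTracker:
    """Counts how often each user views posts carrying each tag."""

    def __init__(self):
        self.views = defaultdict(Counter)  # user id -> Counter of tag views

    def record_view(self, user_id, tags):
        self.views[user_id].update(tags)

    def top_tags(self, user_id, n=3):
        counts = self.views.get(user_id, Counter())
        return [tag for tag, _ in counts.most_common(n)]


def recommend(posts, seen_titles, favourite_tags):
    """Return unseen post titles sharing a favourite tag, best overlap first."""
    favourites = set(favourite_tags)
    scored = []
    for title, tags in posts.items():
        overlap = len(favourites & set(tags))
        if title not in seen_titles and overlap > 0:
            scored.append((overlap, title))
    # Highest overlap first; ties broken alphabetically by title
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


# Made-up posts with tags as an auto-tagger might assign them
posts = {
    "Tuning Lucene analyzers": ["search"],
    "Umbraco macros in depth": ["umbraco"],
    "Indexing Umbraco content": ["search", "umbraco"],
    "Faceted search basics": ["search"],
    "Holiday photos": ["personal"],
}

tracker = TagInterestTracker()
seen = ["Tuning Lucene analyzers", "Indexing Umbraco content"]
for title in seen:
    tracker.record_view("alice", posts[title])

favourites = tracker.top_tags("alice", n=1)
suggestions = recommend(posts, set(seen), favourites)
print(favourites, suggestions)
```

Here the favourite tag comes purely from view counts. Weighting recent views more heavily would be a natural next step.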
