More on auto linking key phrases in blog posts

Following on from my previous post, I have had some more time to think about how to implement automatic linking of keywords and phrases within blog posts.

The comments on my previous post were very helpful and suggested two alternatives for when the links should be applied. My first step was to come up with some configuration to drive the linking, and I settled on the following JSON.


{
    "linkattribs": {
        "target": "blank",
        "class": "test",
        "onClick": "return clickHandler();"
    },
    "items": [
        {
            "phrases": [
                "Umbraco 4", "Umbraco", "Boost"
            ],
            "link": "http://www.umbraco.org/",
            "enabled": true
        },
        {
            "phrases": [
                "Me", "Darren", "Darren Ferguson"
            ],
            "link": "",
            "enabled": true
        },
        {
            "phrases": [
                "Liverpool", "The reds"
            ],
            "link": "http://www.liverpoolfc.tv/",
            "enabled": true,
            "linkattribs": {
                "onMouseOver": "liverpoolLinkHandler();"
            }
        },
        {
            "phrases": [
                "TinyMCE"
            ],
            "link": "http://tinymce.moxiecode.com/",
            "enabled": true
        },
        {
            "phrases": [
                "livewriter", "Live writer"
            ],
            "link": "http://windowslivewriter.spaces.live.com/",
            "enabled": true
        },
        {
            "phrases": [
                "Lucene"
            ],
            "link": "http://en.wikipedia.org/wiki/Lucene",
            "enabled": true
        }
    ]
}

My next step was to quickly create some Perl code to apply the links to some markup (proof of concept before converting to C#).


use strict;
use warnings;
use utf8;

use JSON;
use File::Slurp;

my $post   = read_file('post.txt');
my $config = read_file('config.json');

$config = from_json($config);

foreach my $item (@{ $config->{items} }) {

    if ($item->{enabled}) {

        my $tag = build_link_tag($item->{link});

        foreach my $phrase (@{ $item->{phrases} }) {
            # Escape regex metacharacters in each word, then allow any run
            # of whitespace between the words of a multi-word phrase.
            my $pattern = join '\s+', map { quotemeta } split ' ', $phrase;

            # Only link phrases that have whitespace on both sides.
            $post =~ s|\s($pattern)\s| $tag$1</a> |ig;
        }
    }
}

print $post;

# Build the opening anchor tag for a link.
sub build_link_tag {
    my $link = shift;
    return qq(<a href="$link">);
}

Finally, the (untested) C# version.


using System;
using System.Configuration;
using System.IO;
using System.Net.Json;
using System.Text.RegularExpressions;
using System.Web;

namespace FergusonMoriyama.ContentLinker
{
    public class Linker
    {
        public static String parseContent(String post)
        {
            String umbPath = ConfigurationManager.AppSettings["umbracoPath"];
            String configFile = HttpContext.Current.Server.MapPath(umbPath);

            // No leading backslash on the relative part: Path.Combine discards
            // the first argument when the second one is rooted.
            configFile = Path.Combine(configFile, @"plugins\FergusonMoriyama\ContentLinker\config.json");
            String configJson = File.ReadAllText(configFile);

            JsonTextParser parser = new JsonTextParser();
            JsonObject configObj = parser.Parse(configJson);
            JsonObjectCollection config = (JsonObjectCollection)configObj;

            JsonArrayCollection items = (JsonArrayCollection)config["items"];

            foreach (JsonObjectCollection item in items)
            {
                bool enabled = (bool)item["enabled"].GetValue();
                if (enabled)
                {
                    String link = (String)item["link"].GetValue();
                    String tag = "<a href=\"" + link + "\">";
                    JsonArrayCollection phrases = (JsonArrayCollection)item["phrases"];
                    foreach (JsonStringValue phrase in phrases)
                    {
                        // Regex.Escape escapes spaces as "\ "; swap those for \s+
                        // so any run of whitespace matches between words.
                        String p = Regex.Escape(phrase.Value).Replace(@"\ ", @"\s+");

                        // Case-insensitive, to match the /i flag in the Perl version.
                        post = Regex.Replace(post, @"\s(" + p + @")\s", " " + tag + "$1</a> ", RegexOptions.IgnoreCase);
                    }
                }
            }

            return post;
        }
    }
}

This needs to be either wrapped up as an Umbraco XSLT extension or called from some sort of TinyMCE plug-in. I'll get around to this at some point soon.

I added the ability to add attributes to links in the config but haven't implemented these in the code just yet. Any further comments and suggestions would be much appreciated. When I'm happy I'll make an Umbraco package and make it available for everyone.
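
As a rough idea of how those attributes might be applied, here is a minimal sketch. It assumes that an item's own "linkattribs" are overlaid on the global ones (the item wins on a clash) rather than replacing them, which the config above does not actually specify. LinkTagSketch, MergeAttributes and BuildLinkTag are hypothetical names, and plain dictionaries stand in for the JSON library types. WebUtility.HtmlEncode needs .NET 4.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the Linker class above.
public static class LinkTagSketch
{
    // Assumption: per-item attributes are overlaid on the global ones,
    // and the per-item value wins when both define the same name.
    public static SortedDictionary<string, string> MergeAttributes(
        IDictionary<string, string> globalAttribs,
        IDictionary<string, string> itemAttribs)
    {
        // Sorted by attribute name so the rendered tag is deterministic.
        var merged = new SortedDictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var pair in globalAttribs)
        {
            merged[pair.Key] = pair.Value;
        }
        if (itemAttribs != null)
        {
            foreach (var pair in itemAttribs)
            {
                merged[pair.Key] = pair.Value;
            }
        }
        return merged;
    }

    // Renders the opening <a> tag, HTML-encoding the href and every attribute value.
    public static string BuildLinkTag(string href, IDictionary<string, string> attributes)
    {
        var tag = new StringBuilder();
        tag.Append("<a href=\"").Append(WebUtility.HtmlEncode(href)).Append("\"");
        foreach (var pair in attributes)
        {
            tag.Append(" ").Append(pair.Key).Append("=\"")
               .Append(WebUtility.HtmlEncode(pair.Value)).Append("\"");
        }
        return tag.Append(">").ToString();
    }
}
```

For the Liverpool item in the config, merging the global attributes with its onMouseOver handler and passing the result to BuildLinkTag would give an anchor carrying class, onClick, onMouseOver and target, in that (alphabetical) order.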

Finally, one thing I noticed when testing is that links are applied in the order in which the phrases appear in the configuration. If both 'Umbraco' and 'Umbraco 4' are configured, 'Umbraco 4' needs to come first. Otherwise 'Umbraco' is wrapped in a link first, the closing </a> then sits between it and the '4', and 'Umbraco 4' is never matched.
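
One way to take the ordering burden off whoever edits the configuration might be to sort the phrases longest first before applying them. The sketch below is only a heuristic and is not what the code above does. PhraseOrderSketch and ApplyLinks are hypothetical names, and the phrase/link pairs are passed in directly instead of being read from the JSON. It uses the same whitespace-delimited pattern as the C# version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Linker class above.
public static class PhraseOrderSketch
{
    // Links every phrase in the post, trying the longest phrases first so
    // that "Umbraco 4" is wrapped before the shorter "Umbraco" can claim it.
    public static string ApplyLinks(
        string post,
        IEnumerable<KeyValuePair<string, string>> phraseLinks)
    {
        foreach (var pair in phraseLinks.OrderByDescending(p => p.Key.Length))
        {
            string tag = "<a href=\"" + pair.Value + "\">";

            // Regex.Escape escapes spaces as "\ ", so swap those for \s+.
            string pattern = @"\s(" + Regex.Escape(pair.Key).Replace(@"\ ", @"\s+") + @")\s";

            // A MatchEvaluator stops a "$" in a URL being read as a group reference.
            post = Regex.Replace(
                post,
                pattern,
                m => " " + tag + m.Groups[1].Value + "</a> ",
                RegexOptions.IgnoreCase);
        }
        return post;
    }
}
```

Given the pairs in the "wrong" order ('Umbraco' before 'Umbraco 4', both pointing at http://www.umbraco.org/), the text "I use Umbraco 4 and Umbraco daily." should come back with 'Umbraco 4' and the later 'Umbraco' each wrapped in their own link.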
