Track Changes, Compare text and highlight difference, Text Diff
A great deal of my database works revolves around tracking text, whether it be requirement specifications, comments, test procedures or hazard logs the requirement is always there for being able to track changes. Who created the piece of text, when was it changed and by whom?
I have always stored the full logs of this information, but at the record level. While developing a hazard log, the requirements were that they wanted “word like” track changes, i.e. track changes down to the individual word being replaced. This article is about the result, a function that will compare two textstrings and output the result in rich html, with markups for added and deleted text. See example below:
I spent alot of time trying to research this matter without finding anything, but the general principle regarding “Longest Common Subsequence” outlined in
Its one of those things that can be quite complex to grasp, (certainly took me a while) but at the same time is amazingly simple once you realize how its working. To better try and explain the theory behind it, I have prepared this youtube video showing the details. You don’t NEED to watch this, nor understand it, in order to be able to use the function. This is merely for those interested in understanding the concept.
Thankfully I was able to find someone who had already done some of the hard work, in writing some of the required functions. Travis Hydzik has a blog, which is certainly worth a read, and his blog helped getting me started on this project. With his functions as basis, I moved on to add the color coding of the output, as well as optional parameters such as being able to split the target string into words instead of simple characters. This seems to give the best (most readable) output in most cases, and only comparing words instead of characters also gives a significant speed increase. I have optimized the function so that before doing LCS comparisons (LCS is resource intensive) it will check for simple equality of the beginning and end of strings. This means if only a few edits are made, the LCS can run orders of magnitude faster.
This is one of the things I use it for. Providing a detailed overview of the history of a requirement, with nice details on what changed. See screenshot below:
A bit further down you will find the download. The download also contains some demo, and proof of concept, but the only thing you NEED is to import the module modTSC_Compare. You can try to learn from the examples provided in the download (recommended), or read here.
The function is called TSCfn_CompareAndDiff and the only required parameters are the old text and the new text you wish to compare. TSCfn_CompareAndDiff will return a rich text formatted string. To display this properly you need to use a rich text control, a textbox (Access 2007+). Note that if you store the output in a table, the FIELD must be set to plain text. It is the CONTROL used to display the information that must be rich text. You can use the function like any other custom function, that is you can use it directly in queries, in your own vba code, or in the controlsource of a textbox. The function also takes optional arguments allowing you to do the compare case sensitive or by character instead of by word.