Track Changes, Compare text and highlight difference, Text Diff

Intro

A great deal of my database works revolves around tracking text, whether it be requirement specifications, comments, test procedures or hazard logs the requirement is always there for being able to track changes. Who created the piece of text, when was it changed and by whom?

I have always stored the full logs of this information, but at the record level. While developing a hazard log, the requirements were that they wanted “word like” track changes, i.e. track changes down to the individual word being replaced. This article is about the result, a function that will compare two textstrings and output the result in rich html, with markups for added and deleted text. See example below:

ExampleLinkBroken

Example of output

Theory

I spent alot of time trying to research this matter without finding anything, but the general principle regarding “Longest Common Subsequence” outlined in

Wikipedia: Longest Common Subsequence

Its one of those things that can be quite complex to grasp, (certainly took me a while) but at the same time is amazingly simple once you realize how its working. To better try and explain the theory behind it, I have prepared this youtube video showing the details. You don’t NEED to watch this, nor understand it, in order to be able to use the function. This is merely for those interested in understanding the concept.

Thankfully I was able to find someone who had already done some of the hard work, in writing some of the required functions. Travis Hydzik has a blog, which is certainly worth a read, and his blog helped getting me started on this project. With his functions as basis, I moved on to add the color coding of the output, as well as optional parameters such as being able to split the target string into words instead of simple characters. This seems to give the best (most readable) output in most cases, and only comparing words instead of characters also gives a significant speed increase. I have optimized the function so that before doing LCS comparisons (LCS is resource intensive) it will check for simple equality of the beginning and end of strings. This means if only a few edits are made, the LCS can run orders of magnitude faster.

Demo

This is one of the things I use it for. Providing a detailed overview of the history of a requirement, with nice details on what changed. See screenshot below:

HistoryReport

Example of report showing requirement history.

Instructions

A bit further down you will find the download. The download also contains some demo, and proof of concept, but the only thing you NEED is to import the module modTSC_Compare. You can try to learn from the examples provided in the download (recommended), or read here.

The function is called TSCfn_CompareAndDiff and the only required parameters are the old text and the new text you wish to compare. TSCfn_CompareAndDiff will return a rich text formatted string. To display this properly you need to use a rich text control, a textbox (Access 2007+). Note that if you store the output in a table, the FIELD must be set to plain text. It is the CONTROL used to display the information that must be rich text. You can use the function like any other custom function, that is you can use it directly in queries, in your own vba code, or in the controlsource of a textbox. The function also takes optional arguments allowing you to do the compare case sensitive or by character instead of by word.

Download

Download “TheSmileyCoder-Compare-and-highlight.zip” TheSmileyCoder-Compare-and-highlight.zip – Downloaded 951 times – 244 KB

 

 

 

 

 

 

Posted in Misc., Random Tidbits
8 comments on “Track Changes, Compare text and highlight difference, Text Diff
  1. Crystal says:

    Wow, very impressive! I would be interested in seeing another video showing how the array comparisons work when the number of words are different. Thanks

  2. Pratik says:

    Thanks a ton for sharing the code – you solved one critical task for me Mate – i really appreciate your help on this.

  3. my name us ammu says:

    thanks

  4. jonas says:

    Is it possible to just show the text that is added? (don’t shiow deleted text)

    • TheSmileyCoder says:

      Yes, that should be possible. I don’t have a switch or parameter in the code however to just specify it.

  5. David Keeble says:

    CAn this formula be used in excel to compare two text columns?

    • TheSmileyCoder says:

      The output from the function is HTML in the syntax that Access uses for rich text fields. So while the function might work in Excel, Excel would not be able to display the result, as excel uses a different formatting for rich text. The core concept behind it (LCS) would of course be possible to apply, but I don’t have anything made for formatting the output in excel.

Leave a Reply

Your email address will not be published. Required fields are marked *

*