Fuzzy Search with PowerShell

Doug Finke recently published a module for fuzzy searching with PowerShell. I forked his project, added code to calculate a score for each result, and submitted a pull request to his project. He suggested that I publish my changes on my own, so I did. I had already planned a module for my approximate string matching algorithms, so I combined the two into one module: Communary.PASM.

The module contains several approximate string matching algorithms, as well as my modified version of the fuzzy search algorithm Doug was working on. I also included Soundex, as I think it fits nicely with the concept of fuzzy/approximate searching.

The main functions in the module are:

  • Select-FuzzyString
  • Select-SoundexString
  • Select-ApproximateString

All functions have built-in help with usage examples. Note that the approximate string matching algorithms tend to be slower than the other two methods.

Fuzzy Search

By default, Select-FuzzyString iterates through an array of strings and only passes through the strings that match the fuzzy filter. The function also calculates a score for each match, but for performance reasons the results are not ordered by this score.

PASM001

If you want the results ordered by score, you need to sort them explicitly with Sort-Object, like this:

PASM002
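The filter-then-sort behavior described above can be sketched in Python. This illustrates the general technique only and is not the module's PowerShell implementation: it assumes a simple subsequence matcher (every character of the query must appear in the candidate, in order) and a made-up scoring rule. The functions `fuzzy_score` and `select_fuzzy` are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def fuzzy_score(query: str, candidate: str) -> Optional[int]:
    """Return a score if every character of `query` appears in `candidate`
    in order (case-insensitive), otherwise None.

    The scoring rule is an illustrative assumption, not the module's formula:
    +1 per matched character, +5 when a match directly follows the previous
    match, minus the index of the first match (earlier is better).
    """
    q, c = query.lower(), candidate.lower()
    score = 0
    pos = 0        # where to continue searching in the candidate
    prev = -2      # index of the previously matched character
    first = None   # index of the first matched character
    for ch in q:
        idx = c.find(ch, pos)
        if idx == -1:
            return None  # query is not a subsequence: filter it out
        if first is None:
            first = idx
        score += 1
        if idx == prev + 1:
            score += 5   # bonus for consecutive matches
        prev, pos = idx, idx + 1
    return score - (first or 0)


def select_fuzzy(query: str, items: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (item, score) for each matching item, in input order (unsorted)."""
    for item in items:
        s = fuzzy_score(query, item)
        if s is not None:
            yield item, s


if __name__ == "__main__":
    words = ["Get-Content", "Convert-Path", "Get-Process", "Set-Location", "Get-Command"]
    matches = list(select_fuzzy("con", words))
    # Input order: [('Get-Content', 9), ('Convert-Path', 13), ('Set-Location', 2), ('Get-Command', 4)]
    print(matches)
    # Ordering by score is an explicit extra step, like piping to Sort-Object:
    print(sorted(matches, key=lambda m: m[1], reverse=True))
```

Because `select_fuzzy` is a generator, it can hand over each match as soon as it is found, much like objects flowing down a PowerShell pipeline; `sorted` has to consume every match before returning anything, which is one reason to leave ordering to the caller.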

Soundex Search

Performing a Soundex search is similar to doing a fuzzy search, but the method is quite different. Keep in mind that Soundex really only works with English (or English-sounding) words and names.

In this example, we use it to search a list of names for all names that sound similar to John.

PASM003
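The Soundex matching used above can be sketched in Python with the standard American Soundex rules: keep the first letter, encode the remaining consonants as digits, collapse adjacent letters with the same code (also across h and w), and pad or truncate to four characters. Whether the module follows exactly these rules is an assumption; `soundex` and `select_soundex` are hypothetical names, and the list of names is made up.

```python
import string
from typing import Iterable, List

# Standard American Soundex digit for each coded consonant.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code for `name`, e.g. 'J500'."""
    letters = [ch for ch in name.lower() if ch in string.ascii_lowercase]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = CODES.get(letters[0], "")  # the first letter's code counts for adjacency
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:          # skip repeats of the same code
                result.append(code)
            prev = code
        elif ch not in "hw":
            prev = ""                 # a vowel (or y) separates repeated codes
        # h and w are ignored and do not reset `prev`
    return "".join(result)[:4].ljust(4, "0")


def select_soundex(target: str, names: Iterable[str]) -> List[str]:
    """Keep the names whose Soundex code equals the target's code."""
    key = soundex(target)
    return [n for n in names if soundex(n) == key]


if __name__ == "__main__":
    names = ["John", "Jon", "Joan", "Jonathan", "Jack", "Mary", "Johnny"]
    print(select_soundex("John", names))  # ['John', 'Jon', 'Joan', 'Johnny']
```

Note that "Jonathan" (J535) is not returned: Soundex compares how a name is encoded, not whether one name contains another.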

Approximate String Matching

Approximate string matching algorithms are slower than fuzzy search, but they often return fewer results, and those results tend to be more accurate. If we run the same search query as in the fuzzy search example, we get the following results:

PASM004

You can choose the algorithm with the Algorithm parameter. All the algorithms can also be used on their own if you need them.

You can also set the tolerance with the Tolerance parameter, where Strong is the strictest setting and Weak is the loosest.
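The idea behind the Algorithm and Tolerance parameters can be sketched in Python: each algorithm turns a pair of strings into a similarity between 0 and 1, and the tolerance sets the minimum similarity a string needs to be kept. This is not the module's implementation. The algorithm names, the normalization by the longer string's length, and the threshold values are all assumptions, and only the Strong and Weak levels named above are modeled; `select_approximate` is a hypothetical function.

```python
from typing import Callable, Dict, Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of `a` and `b`."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def lcs_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest


# Hypothetical algorithm names, standing in for the Algorithm parameter.
ALGORITHMS: Dict[str, Callable[[str, str], float]] = {
    "Levenshtein": levenshtein_similarity,
    "LCS": lcs_similarity,
}

# Made-up minimum similarity per tolerance level: stricter means a higher bar.
TOLERANCE: Dict[str, float] = {"Strong": 0.8, "Weak": 0.6}


def select_approximate(query: str, items: Iterable[str],
                       algorithm: str = "Levenshtein",
                       tolerance: str = "Strong") -> List[str]:
    """Keep the items whose similarity to `query` meets the tolerance."""
    similarity = ALGORITHMS[algorithm]
    threshold = TOLERANCE[tolerance]
    q = query.lower()
    return [item for item in items if similarity(q, item.lower()) >= threshold]


if __name__ == "__main__":
    items = ["PowerShell", "Powershel", "PowerShot", "Shell", "PowerPoint"]
    print(select_approximate("powershell", items))                    # ['PowerShell', 'Powershel']
    print(select_approximate("powershell", items, tolerance="Weak"))  # ['PowerShell', 'Powershel', 'PowerShot']
```

Lowering the threshold is what makes Weak looser: "PowerShot" (three edits away from "powershell") only passes at the Weak level, while "Shell" and "PowerPoint" (five edits away) are rejected at both levels.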

The module is published to the PowerShell Gallery, so you can install it with Install-Module (Install-Module -Name Communary.PASM). If you need to install it manually, you can download it from my GitHub repository.
