Doug Finke recently published a module for doing fuzzy searching with PowerShell. I forked his project, added code that assigns a score to each result, and opened a PR against his project. He suggested I publish my changes on my own, so I did. I already had a module planned for my approximate string matching algorithms, so I combined the two into one module: Communary.PASM.
The module contains several different approximate string matching algorithms, as well as my modified version of the fuzzy search algorithm Doug was working on. I also include Soundex, as I think it fits nicely with the concept of fuzzy/approximate searching.
The main functions in the module are:
- Select-FuzzyString
- Select-SoundexString
- Select-ApproximateString
All functions have built-in help with examples of use. Note that the approximate string matching algorithms tend to be slower than the other two methods.
Fuzzy Search
The default behavior of Select-FuzzyString is to iterate through an array of strings and only let through the strings that match the fuzzy filter. This function adds a score to each result, but for performance reasons the results are not ordered by it.
If you want to order by score, you need to do so explicitly with Sort-Object.
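Here is a minimal sketch of the sorting step. The objects are hand-made stand-ins for search results, and the property names Score and Result, along with the score values, are assumptions for illustration only. Check the real output with Get-Member before relying on them.

```powershell
# Stand-in result objects; the real cmdlet's property names may differ.
$results = @(
    [pscustomobject]@{ Score = 12; Result = 'PowerShell' }
    [pscustomobject]@{ Score = 31; Result = 'Posh' }
    [pscustomobject]@{ Score = 7;  Result = 'Proof of shell' }
)

# Sort explicitly, highest score first.
$sorted = $results | Sort-Object -Property Score -Descending
$sorted | Format-Table -AutoSize
```

If lower scores turn out to mean better matches, drop -Descending.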
Soundex Search
Performing a Soundex search is similar to doing a fuzzy search, but the method is quite different. Soundex really only works with English (sounding) words and names, which is something to keep in mind.
In this example we are using it to search through a list of names for all names that are similar to the name John.
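To show what happens under the hood, here is a rough sketch of the classic American Soundex encoding, used to filter a list of names. It is not the module's own code. The function names Get-SoundexDigit and Get-SoundexCode are made up for this illustration, and the module's implementation may handle edge cases differently.

```powershell
# Hypothetical helper: map a letter to its Soundex digit ('' for vowels and others).
function Get-SoundexDigit {
    param([char]$Letter)
    switch -Regex ([string]$Letter) {
        '[BFPV]'     { return '1' }
        '[CGJKQSXZ]' { return '2' }
        '[DT]'       { return '3' }
        'L'          { return '4' }
        '[MN]'       { return '5' }
        'R'          { return '6' }
        default      { return '' }
    }
}

# Hypothetical helper: build the four-character Soundex code for a word.
function Get-SoundexCode {
    param([string]$Word)

    # Keep ASCII letters only and normalise case.
    $letters = $Word.ToUpperInvariant() -replace '[^A-Z]', ''
    if ($letters.Length -eq 0) { return '' }

    $code = [string]$letters[0]
    $previous = Get-SoundexDigit $letters[0]

    for ($i = 1; $i -lt $letters.Length -and $code.Length -lt 4; $i++) {
        $current = $letters[$i]
        # H and W are skipped and do not separate letters with the same digit.
        if ($current -eq 'H' -or $current -eq 'W') { continue }
        $digit = Get-SoundexDigit $current
        if ($digit -and $digit -ne $previous) { $code += $digit }
        # Vowels reset $previous, so equal digits separated by a vowel are both kept.
        $previous = $digit
    }
    return $code.PadRight(4, '0')
}

$names   = 'John', 'Jon', 'Joan', 'James', 'Jack', 'Johan', 'Peter'
$target  = Get-SoundexCode 'John'
$similar = $names | Where-Object { (Get-SoundexCode $_) -eq $target }
$similar
```

Names that share a code with John pass the filter; names such as James and Jack encode differently and are dropped.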
Approximate String Matching
Approximate string matching algorithms, while slower than fuzzy search, will often give fewer results, and the results tend to be more accurate. Running the same search query as in the fuzzy search example shows the difference.
You can specify the algorithm you want to use with the Algorithm parameter. All the different algorithms are also available for use independently if you need them.
You can also specify the tolerance with the Tolerance parameter, where Strong is the strictest setting and Weak is the loosest.
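To make the idea of a tolerance concrete, here is a sketch built on Levenshtein edit distance, one well-known approximate string matching measure. It is not taken from the module. The function name Get-EditDistance and the numeric thresholds are hypothetical, and the Strong and Weak levels may well map to something else entirely.

```powershell
# Hypothetical helper: case-insensitive Levenshtein distance using two rows.
function Get-EditDistance {
    param([string]$Source, [string]$Target)

    $a = $Source.ToLowerInvariant()
    $b = $Target.ToLowerInvariant()

    # Row for the empty prefix of $a: distance to the first $j characters of $b is $j.
    $previous = @(0..$b.Length)
    for ($i = 1; $i -le $a.Length; $i++) {
        $current = New-Object 'int[]' ($b.Length + 1)
        $current[0] = $i
        for ($j = 1; $j -le $b.Length; $j++) {
            $cost = if ($a[$i - 1] -eq $b[$j - 1]) { 0 } else { 1 }
            $deletion     = $previous[$j] + 1
            $insertion    = $current[$j - 1] + 1
            $substitution = $previous[$j - 1] + $cost
            $current[$j]  = [Math]::Min([Math]::Min($deletion, $insertion), $substitution)
        }
        $previous = $current
    }
    return $previous[$b.Length]
}

$words = 'PowerShell', 'PowerShel', 'Powershll', 'PowerPoint', 'Bash'
$query = 'PowerShell'

# A strict threshold lets through only near-identical strings.
$strict = $words | Where-Object { (Get-EditDistance $_ $query) -le 1 }

# A loose threshold also accepts strings that differ by several edits.
$loose = $words | Where-Object { (Get-EditDistance $_ $query) -le 5 }
```

A tighter threshold returns fewer, closer matches; a looser one trades precision for recall.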
The module is published to the PowerShell Gallery, so you can easily install it using Install-Module. If you need to manually install it, you can download it from my GitHub repository.
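A typical install from the PowerShell Gallery might look like the following. The -Scope CurrentUser switch is optional and only there to avoid needing an elevated prompt.

```powershell
# Install from the PowerShell Gallery for the current user only.
Install-Module -Name Communary.PASM -Scope CurrentUser

# Load the module and list the commands it exports.
Import-Module Communary.PASM
Get-Command -Module Communary.PASM

# Show the built-in examples for one of the main functions.
Get-Help Select-FuzzyString -Examples
```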