The quest for speed – Finding files

A while back I wrote a post about handling files in PowerShell, in which I described the different ways you can read and write files. You can read it here if you like: Working with files in PowerShell. This time I will discuss how to find files. You have undoubtedly been in a situation where you needed to locate a file on your disk, but didn’t know exactly where to look. Or perhaps you needed to get all the log files in a share? Or to document how many mp3 files there are on the server hosting all your users’ profiles?

When you are searching through several thousand files using a specific pattern, one thing you will notice right away is that speed matters. But before we look into that, let’s look at the command you would use to search for files.

Get-ChildItem c:\*.txt -Recurse -Force

The command we use is of course Get-ChildItem, which you might know better by one (or both) of its better-known aliases: dir and ls. When called, it gets information about one or more files. It takes a path as one of its parameters, and this parameter supports wildcards, as you are used to from similar tools. If you want to search recursively, you use the Recurse parameter. The Force parameter lets you get files and folders that are not shown by default, such as hidden or system files.

So far, so good. But if you were to run the above command, depending on how many files you have on your c-drive, you would probably be staring at the screen for a little while before it finishes. Not a big deal perhaps, but consider using it to search for files on a file server with tens, or perhaps even hundreds, of thousands of files. The minutes quickly add up.

On my personal computer, running the above command took about 118 seconds (just under 2 minutes) to find roughly 4200 txt-files scattered around my c-drive.
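
If you want to take this kind of measurement on your own machine, one option is Measure-Command. This is only a minimal sketch, and the variable name $elapsed is made up for the example. Expect your numbers to vary with disk, cache state and the number of files.

```powershell
# Measure-Command runs the script block and returns a TimeSpan.
# The output of the block itself is discarded, so only the time is reported.
# Access-denied errors on protected folders are silenced to keep the run quiet.
$elapsed = Measure-Command {
    Get-ChildItem c:\*.txt -Recurse -Force -ErrorAction SilentlyContinue
}

# Show the elapsed time in seconds with one decimal.
'{0:N1} seconds' -f $elapsed.TotalSeconds
```

Running the same command a second time is often faster because the file system has cached the directory information, so it is worth repeating each measurement a few times before comparing.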

I’m not happy about that. Let’s investigate and see if we can do this any faster. When looking at the help information for Get-ChildItem, we can’t help but notice that it supports several other parameters that look interesting. For instance, it has a Filter parameter which, according to the help text, is more efficient because the provider applies the filter while retrieving the items, rather than having PowerShell filter them afterwards. Let’s try:

Get-ChildItem c:\* -Filter *.txt -Recurse -Force

I have modified the command to take advantage of the Filter parameter for the pattern matching, but otherwise everything else stays the same. Running the command again, I get my 4200 txt-files back after about 29 seconds! Wow, that’s quite a difference.

Let’s explore different ways of running the same query to see if we can do any better (or worse). You might not find it in the general help information, since it is a dynamic parameter added by the FileSystem provider, but Get-ChildItem also supports a File parameter (as well as a Directory parameter). With this you can specify that it’s files you want returned. Sounds like something that might speed up our command a little more, right? We don’t care about folders; we know we are looking for files only. Let’s give it a try:

Get-ChildItem c:\* -File -Filter *.txt -Recurse -Force

Surprisingly, running this took just as long. No speed improvement at all. My guess is it’s getting all the same information, just filtering the output after retrieving it.

Another thing I want to try is to replace the Filter parameter with Include. This parameter lets you define what you want included with your results. According to the help, the command will only get the specified items, so it should give us pretty much the same performance as using the Filter parameter, right?

Get-ChildItem c:\* -Include *.txt -Recurse -Force

Wrong. Running the command now takes about 114 seconds, so just barely better than our original command. So the lesson here is that if you don’t absolutely need it, skip the Include parameter and use Filter. Unless I am using it wrong, in which case, please let me know!

But let’s explore further. There are other ways of running this query. Let’s try this one:

Get-ChildItem c:\* -Recurse -Force | Where-Object {$_.Name -like '*.txt'}

This time it took 104 seconds. So this method is not as good as using the Filter parameter, but better than using Include, or using wildcards in the Path parameter as we did the first time.
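
To compare the variants side by side, you could wrap each one in a script block and time them in a loop. The sketch below is not how the numbers in this post were produced. It builds a small throwaway folder in the temp directory (the folder and file names are invented for the example) and checks that the three approaches return the same number of files.

```powershell
# Build a small throwaway tree so the comparison is repeatable.
# $root and the file names below are made up for this sketch.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("gci-test-" + [guid]::NewGuid())
$sub  = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
'a.txt', 'b.txt', 'c.log' | ForEach-Object { Set-Content -Path (Join-Path $root $_) -Value 'x' }
'd.txt', 'e.mp3'          | ForEach-Object { Set-Content -Path (Join-Path $sub  $_) -Value 'x' }

# The three variants discussed above, as script blocks.
$variants = [ordered]@{
    Filter      = { Get-ChildItem -Path $root -Filter *.txt -Recurse -Force }
    Include     = { Get-ChildItem -Path $root -Include *.txt -Recurse -Force }
    WhereObject = { Get-ChildItem -Path $root -Recurse -Force | Where-Object { $_.Name -like '*.txt' } }
}

# Run each variant once, recording how many items it found and how long it took.
$results = foreach ($name in $variants.Keys) {
    $found = $null
    $time  = Measure-Command { $found = @(& $variants[$name]) }
    [pscustomobject]@{ Variant = $name; Count = $found.Count; Seconds = $time.TotalSeconds }
}
$results | Format-Table -AutoSize

# Clean up the throwaway tree.
Remove-Item -Path $root -Recurse -Force
```

A tree this small will not show meaningful timing differences. To see the kind of gap described here, point $root at a folder with many thousands of files (and remove the setup and cleanup lines).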

But wait, let’s try something totally different. What if we tried using the old DOS version of dir instead? This is of course not the “proper” PowerShell way, but after all, it’s speed we are after here.

& 'cmd.exe' '/C dir /s /a /B c:\*.txt'

Here I’m calling cmd.exe and sending it the dir command. If you wonder about the parameters, /s means recursive search, /a means return all files (also hidden and system), and /B means we just want the name of the file returned. This command took 9 seconds to return our files! That’s 20 seconds faster than our fastest PowerShell version! Imagine the difference that would make on a share with one hundred thousand files.

I think I hear your thoughts already… that this only gives us the name (and path) of the files found, while PowerShell gives us proper objects back – objects that we can reuse for other stuff. No wonder there is some overhead, right?

I created a wrapper function for good old DOS dir that we can use to perform the same kind of search that we did above, but with one added benefit. The resulting filenames are piped through Get-Item, which means that we DO get our pretty objects in return. Let’s give it a go, shall we?

Invoke-DosDir c:\*.txt -Recurse -Hidden -File

This time it took 10 seconds for the command to finish. So we are talking of a small overhead of about a second, but in return we get proper FileSystemInfo objects, that we can further work with.
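
To illustrate the idea of such a wrapper, here is a minimal sketch. It is not the Invoke-DosDir function used above: the name Find-FileWithDir and its parameters are hypothetical, it always searches recursively, and it only works on Windows, since it depends on cmd.exe.

```powershell
# Hypothetical sketch; NOT the Invoke-DosDir function used in this post.
# Windows only: it relies on cmd.exe and its built-in dir command.
function Find-FileWithDir {
    param(
        # Path with an optional wildcard, for example c:\*.txt
        [Parameter(Mandatory = $true)]
        [string] $Path,

        # Return the full names as plain strings instead of objects.
        [switch] $FullNameOnly
    )

    # /s   = search subfolders
    # /b   = bare format (one full path per line when combined with /s)
    # /a-d = include hidden and system files, but leave out directories
    # Messages such as "File Not Found" go to stderr and are discarded here.
    cmd.exe /c dir /s /b /a-d $Path 2>$null | ForEach-Object {
        if ($FullNameOnly) {
            $_
        }
        else {
            # -Force is needed so Get-Item can return hidden and system files.
            Get-Item -LiteralPath $_ -Force
        }
    }
}

# Example call (adjust the path to something that exists on your machine):
# Find-FileWithDir -Path 'c:\*.txt' | Select-Object FullName, Length
```

Note that file names containing characters outside the console code page may come back garbled from cmd.exe, in which case Get-Item will fail for those files. A real wrapper would need to handle that.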

So there is certainly something to be said for exploring your options if speed is important. I’m very curious whether anyone has any other optimization tips related to finding files using PowerShell. If you do, please let me know in the comments below!

5 comments

  1. Hi Øyvind, very nice story and interesting findings. I sure will use your Invoke-DosDir function!
    One very small detail: I think that switch parameters should be $False by default, you could use for instance $FullNameOnly instead of $GetInfo.
    Please go on writing your blog!


    1. Hi, thanks! Yes, I debated a bit whether to make it false or true by default. I initially made it default to Full Name only, and then changed it around. But your idea is probably better, I agree 🙂

