Performance PowerShell: Part 1


About Performance

This past November, I had the privilege of speaking with Nathan Ziehnert at MMS Jazz Edition in New Orleans, LA about PowerShell Performance. If you haven’t heard of his blog (Z-Nerd), you should definitely check it out. I’ve been meaning to write more blog posts, and I thought that our topic would make a great post.

PowerShell script performance is often an afterthought, if it’s even a thought at all. Many times, we look at a script and say “It works! My job here is done.” And in some instances, that’s good enough. Perhaps I’m just writing a single-run script to do a one-time task and then it’ll get tossed. But what about the rest of our PowerShell scripts? What about all the scripts that run our automation, task sequences, and daily tasks?

In my opinion, performance can be broken down into three categories: Cost, Maintainability, and Speed. Cost in this regard is not how much money it costs to run a script in, say, Azure Automation, but rather the resource cost on the machine that’s running it. Think CPU, memory, and disk utilization. Maintainability is the time it takes to maintain or modify the script. As our environment evolves, solutions get upgraded, or modules and snap-ins get code updates, we need to give our script the occasional tune-up. This is where formatting and good coding practices come into play. Lastly, we have speed. This first blog post will focus on speed, or how fast our scripts run.

Disclaimer

What I’m covering here is general. While it may apply to 99% of use cases, be sure to test for your individual script. There are a lot of factors that can affect the speed of a script including, but not limited to, OS, PowerShell version, hardware, and run behavior. What works for one script may not work for the next. What works on one machine may not work on the rest. What works when you tested it will never work in production (Murphy’s Law).

What Affects Performance?

There are quite a few things that can affect performance. Some of them we can control, others we can’t. For the things we can’t control, there are often ways to mitigate or work around the constraints. I’ll take a look at some of both in this post such as:

  • Test Order
  • Loop Execution
  • .NET Methods vs. Native Cmdlets
  • Strong Typed/Typecasting
  • Syntax
  • Output

Testing Method

For testing/measuring, I wrote a couple of functions. The first is a function named Test-Performance. It returns a PSObject with all of our test data that we can use to generate visualizations. The second is a simple function, Get-Winner, that takes the mean or median results from two sets, compares them, and returns a winner along with how much faster it was. The functions:

function Test-Performance {
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory=$true,Position=1)]
        [ValidateRange(5,50000)]
        [int]$Count,
        [Parameter(Mandatory=$true,Position=2)]
        [ScriptBlock]$ScriptBlock
    )
    $Private:Occurrence = [System.Collections.Generic.List[Double]]::new()
    $Private:Sorted = [System.Collections.Generic.List[Double]]::new()
    $Private:ScriptBlockOutput = [System.Collections.Generic.List[string]]::new()
    [Double]$Private:Sum = 0
    [Double]$Private:Mean = 0
    [Double]$Private:Median = 0
    [Double]$Private:Minimum = 0
    [Double]$Private:Maximum = 0
    [Double]$Private:Range = 0
    [Double]$Private:Variance = 0
    [Double]$Private:StdDeviation = 0
    $Private:ReturnObject = '' | Select-Object Occurrence,Sorted,Sum,Mean,Median,Minimum,Maximum,Range,Variance,StdDeviation,Output

    #Gather Results
    for ($i = 0; $i -lt $Count; $i++) {
        $Timer = [System.Diagnostics.Stopwatch]::StartNew()
        #$Private:Output = Invoke-Command -ScriptBlock $ScriptBlock
        $Private:Output = $ScriptBlock.Invoke()
        $Timer.Stop()
        $Private:Result = $Timer.Elapsed
        $Private:Sum += $Private:Result.TotalMilliseconds
        [void]$Private:ScriptBlockOutput.Add($Private:Output)
        [void]$Private:Occurrence.Add($Private:Result.TotalMilliseconds)
        [void]$Private:Sorted.Add($Private:Result.TotalMilliseconds)
    }
    $Private:ReturnObject.Sum = $Private:Sum
    $Private:ReturnObject.Occurrence = $Private:Occurrence
    if (($Private:ScriptBlockOutput -notcontains "true") -and ($Private:ScriptBlockOutput -notcontains "false") -and ($Private:ScriptBlockOutput -notcontains $null)) {
        $Private:ReturnObject.Output = $Private:ScriptBlockOutput
    } else {
        $Private:ReturnObject.Output = $null
    }
    #Sort
    $Private:Sorted.Sort()
    $Private:ReturnObject.Sorted = $Private:Sorted

    #Statistical Calculations
    #Mean (Average)
    $Private:Mean = $Private:Sum / $Count
    $Private:ReturnObject.Mean = $Private:Mean

    #Median (indexes into the sorted list are zero-based)
    $Private:Middle = [int][Math]::Floor($Count / 2.0)
    if (($Count % 2) -eq 1) {
        $Private:Median = $Private:Sorted[$Private:Middle]
    } else {
        $Private:Median = (($Private:Sorted[$Private:Middle - 1]) + ($Private:Sorted[$Private:Middle])) / 2
    }
    $Private:ReturnObject.Median = $Private:Median

    #Minimum
    $Private:Minimum = $Private:Sorted[0]
    $Private:ReturnObject.Minimum = $Private:Minimum

    #Maximum
    $Private:Maximum = $Private:Sorted[$Count - 1]
    $Private:ReturnObject.Maximum = $Private:Maximum

    #Range
    $Private:Range = $Private:Maximum - $Private:Minimum
    $Private:ReturnObject.Range = $Private:Range

    #Variance
    for ($i = 0; $i -lt $Count; $i++) {
        $x = ($Private:Sorted[$i] - $Private:Mean)
        $Private:Variance += ($x * $x)
    }
    $Private:Variance = $Private:Variance / $Count
    $Private:ReturnObject.Variance = $Private:Variance

    #Standard Deviation
    $Private:StdDeviation = [Math]::Sqrt($Private:Variance)
    $Private:ReturnObject.StdDeviation = $Private:StdDeviation
    
    return $Private:ReturnObject
}

Function Get-Winner {
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory=$true,Position=1)]
        [ValidateNotNullOrEmpty()]
        [string]$AName,
        [Parameter(Mandatory=$true,Position=2)]
        [ValidateNotNullOrEmpty()]
        [Double]$AValue,
        [Parameter(Mandatory=$true,Position=3)]
        [ValidateNotNullOrEmpty()]
        [string]$BName,
        [Parameter(Mandatory=$true,Position=4)]
        [ValidateNotNullOrEmpty()]
        [Double]$BValue
    )
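    #$ClearBetweenTests, $OutToFile, $OutFileName and $PauseBetweenTests are read from the calling scope; set them before calling Get-Winner if you want that behavior
    if ($ClearBetweenTests) {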
        Clear-Host
    }

    $blen = $AName.Length + $BName.Length + 12
    $Border = ''
    for ($i = 0; $i -lt $blen; $i++) {
        $Border += '*'
    }

    if ($OutToFile) {
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject $Border
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject ([string]::Format('**  {0} vs {1}  **', $AName, $BName))
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject $Border
    }
    Write-Host $Border -ForegroundColor White
    Write-Host ([string]::Format('**  {0} vs {1}  **', $AName, $BName)) -ForegroundColor White
    Write-Host $Border -ForegroundColor White

    if ($AValue -lt $BValue) {
        $Faster = $BValue / $AValue
        if ($Faster -lt 1.05) {
            $Winner = 'Tie'
            $AColor = [ConsoleColor]::White
            $BColor = [ConsoleColor]::White
        } else {
            $Winner = $AName
            $AColor = [ConsoleColor]::Green
            $BColor = [ConsoleColor]::Red
        }
    } elseif ($AValue -gt $BValue) {
        $Faster = $AValue / $BValue
        if ($Faster -lt 1.05) {
            $Winner = 'Tie'
            $AColor = [ConsoleColor]::White
            $BColor = [ConsoleColor]::White
        } else {
            $Winner = $BName
            $AColor = [ConsoleColor]::Red
            $BColor = [ConsoleColor]::Green
        }
    } else {
        $Winner = 'Tie'
        $AColor = [ConsoleColor]::White
        $BColor = [ConsoleColor]::White
        $Faster = 0
    }
    
    $APad = ''
    $BPad = ''
    if ($AName.Length -gt $BName.Length) {
        $LenDiff = $AName.Length - $BName.Length
        for ($i = 0; $i -lt $LenDiff; $i++) {
            $BPad += ' '
        }
    } else {
        $LenDiff = $BName.Length - $AName.Length
        for ($i = 0; $i -lt $LenDiff; $i++) {
            $APad += ' '
        }
    }

    $AValue = [Math]::Round($AValue, 2)
    $BValue = [Math]::Round($BValue, 2)
    $Faster = [Math]::Round($Faster, 2)
    
    if ($OutToFile) {
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject ([string]::Format('{0}:  {1}{2}ms', $AName, $APad, $AValue))
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject ([string]::Format('{0}:  {1}{2}ms', $BName, $BPad, $BValue))
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject ([string]::Format("WINNER: {0} {1}x Faster`r`n", $Winner, $Faster))
    }
    Write-Host ([string]::Format('{0}:  {1}{2}ms', $AName, $APad, $AValue)) -ForegroundColor $AColor
    Write-Host ([string]::Format('{0}:  {1}{2}ms', $BName, $BPad, $BValue)) -ForegroundColor $BColor
    Write-Host ([string]::Format('WINNER: {0} {1}x Faster', $Winner, $Faster)) -ForegroundColor Yellow
    if ($PauseBetweenTests -eq $true) {
        Pause
    }
}
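
The median is the statistic that is easiest to get off by one, because the sorted list is indexed from zero. Here is a minimal standalone sketch of that step. Get-Median is a hypothetical helper written for illustration, not part of the functions above:

```powershell
# Hypothetical helper (not part of Test-Performance): median of a set of timings.
function Get-Median {
    param(
        [Parameter(Mandatory = $true)]
        [double[]]$Values
    )
    # @() keeps a single value as a one-element array after sorting.
    $Sorted = @($Values | Sort-Object)
    $Count = $Sorted.Count
    # Zero-based index of the middle element (the upper middle when the count is even).
    $Middle = [int][Math]::Floor($Count / 2.0)
    if (($Count % 2) -eq 1) {
        return $Sorted[$Middle]
    }
    # Even count: average the two elements on either side of the midpoint.
    return ($Sorted[$Middle - 1] + $Sorted[$Middle]) / 2
}

$OddMedian = Get-Median -Values 3, 1, 2
$EvenMedian = Get-Median -Values 4, 1, 3, 2
```

With three values the middle index is 1, and with four values the two middle indexes are 1 and 2.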

Now, you may be wondering why I went through all of that trouble when there is a perfectly good Measure-Command cmdlet available. The reason is twofold. One, I wanted the statistics to be calculated without having to call a separate function. Two, Measure-Command does not pass output through, and I wanted to be able to test and capture output if needed. If you haven’t tried to use Write-Output within a Measure-Command script block before, let me show you what I’m talking about:

PS C:\> Measure-Command {Write-Host "Write-Host"}
Write-Host


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 12
Ticks             : 128366
TotalDays         : 1.48571759259259E-07
TotalHours        : 3.56572222222222E-06
TotalMinutes      : 0.000213943333333333
TotalSeconds      : 0.0128366
TotalMilliseconds : 12.8366



PS C:\> Measure-Command {Write-Output "Write-Output"}

Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 0
Ticks             : 4005
TotalDays         : 4.63541666666667E-09
TotalHours        : 1.1125E-07
TotalMinutes      : 6.675E-06
TotalSeconds      : 0.0004005
TotalMilliseconds : 0.4005

Notice that when we call Write-Output, we don’t see the output? With my Test-Performance function, we can still capture that output. The output from the two functions looks like this:

 Occurrence   : {4.0353, 1.0091, 0, 0…}
 Sorted       : {0, 0, 0, 1.0091…}
 Sum          : 5.0444
 Mean         : 1.00888
 Median       : 1.0091
 Minimum      : 0
 Maximum      : 4.0353
 Range        : 4.0353
 Variance     : 2.4425469256
 StdDeviation : 1.56286497356618
 Output       : {Write-Output, Write-Output, Write-Output, Write-Output…}

 ******************************
 **  Filter vs Where-Object  **
 ******************************
 Filter:        22ms
 Where-Object:  1007.69393ms
 WINNER: Filter 45.80x Faster

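One other thing to mention is that I did not simply use $Start = Get-Date followed by (Get-Date) - $Start. Why? Because some things happen so fast that I need to measure the speed in ticks or microseconds. Get-Date is only reliable down to roughly the millisecond (the exact resolution depends on the OS and .NET version), so anything faster than that tends to be rounded to either 0 or 1 millisecond. The Stopwatch class is designed for measuring elapsed time and uses a high-resolution timer when the hardware provides one.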
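
Stripped of the statistics, the timing technique is small. The following is a minimal sketch, assuming a placeholder workload in $Work, that times one run with Stopwatch and keeps the output at the same time:

```powershell
# Placeholder workload for illustration only.
$Work = { 1..1000 | ForEach-Object { $_ * 2 } }

$Timer = [System.Diagnostics.Stopwatch]::StartNew()
$Output = & $Work                 # assigning the result captures the output
$Timer.Stop()

# Elapsed time as fractional milliseconds and as raw timer ticks.
$ElapsedMs = $Timer.Elapsed.TotalMilliseconds
$ElapsedTicks = $Timer.ElapsedTicks

# How fine-grained the underlying timer is on this machine.
$TicksPerSecond = [System.Diagnostics.Stopwatch]::Frequency
$HighResolution = [System.Diagnostics.Stopwatch]::IsHighResolution
```

$TicksPerSecond and $HighResolution are worth checking once on any machine you benchmark on, since they tell you how much to trust sub-millisecond numbers.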

Test Order

With that out of the way, let’s look at test order first. Test order is the order in which conditions are evaluated within a flow control block such as if/else or do/while. The engine will evaluate the conditions from left to right while respecting the order of operations. Why is this important? Let’s say you have the following if statement:

if (($haystack -contains "needle") -and ($x -eq 5)) {
    Do-Stuff
}

We have two conditions: does $haystack contain the string “needle”, and does $x equal 5? With the -and operator we tell the engine that both must be true to meet the conditions of the if statement. The engine will evaluate the first condition and, if it is true, will continue through the remaining conditions until it either reaches a false one or has evaluated them all. As soon as one condition is false, the rest are skipped. Let’s take a quick look at how long it takes to evaluate a few different types of conditions.
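
To see that skipping behavior directly, here is a small sketch. Test-Expensive is a hypothetical stand-in for a slow condition that counts how many times it actually runs:

```powershell
# Shared counter; the function below mutates this hashtable from its child scope.
$State = @{ Calls = 0 }

# Hypothetical stand-in for a slow condition.
function Test-Expensive {
    $State.Calls++
    return $true
}

# -and stops at the first false condition, so Test-Expensive never runs.
if ($false -and (Test-Expensive)) { $null = 'never reached' }
$CallsAfterAnd = $State.Calls

# -or stops at the first true condition, so Test-Expensive still has not run.
if ($true -or (Test-Expensive)) { $null = 'reached without the expensive call' }
$CallsAfterOr = $State.Calls

# Here the first condition cannot decide the result, so the second one is evaluated.
if ($true -and (Test-Expensive)) { $null = 'reached after the expensive call' }
$CallsAfterFullEvaluation = $State.Calls
```

The counter stays at zero for the first two statements and only reaches one in the third, which is why the position of an expensive condition matters.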

$v = 5000
$a = 1..10000
$s = 'reg.exe'
$as = (Get-ChildItem -Path C:\Windows\System32 -Filter '*.exe').Name

Test-Performance -Count 100 -ScriptBlock {$v -eq 5000}
Test-Performance -Count 100 -ScriptBlock {$a -contains 5000}
Test-Performance -Count 100 -ScriptBlock {$s -eq 'reg.exe'}
Test-Performance -Count 100 -ScriptBlock {$as -contains 'reg.exe'}

That gives me the following output:

 Occurrence   : {0.9741, 0, 0, 5.9834…}
 Sorted       : {0, 0, 0, 0…}
 Sum          : 40.8437
 Mean         : 0.408437
 Median       : 0
 Minimum      : 0
 Maximum      : 5.9834
 Range        : 5.9834
 Variance     : 0.538503541531
 StdDeviation : 0.733828005414757
 Output       :

 Occurrence   : {0.9977, 0.9969, 0, 0.9971…}
 Sorted       : {0, 0, 0, 0…}
 Sum          : 67.7895
 Mean         : 0.677895
 Median       : 0.9934
 Minimum      : 0
 Maximum      : 3.9467
 Range        : 3.9467
 Variance     : 0.450743557675
 StdDeviation : 0.671374379668304
 Output       :

 Occurrence   : {0.989, 0.9973, 0, 0…}
 Sorted       : {0, 0, 0, 0…}
 Sum          : 52.9174
 Mean         : 0.529174
 Median       : 0
 Minimum      : 0
 Maximum      : 8.9804
 Range        : 8.9804
 Variance     : 1.222762781524
 StdDeviation : 1.10578604690238
 Output       :

 Occurrence   : {0.997, 0, 0, 1.0292…}
 Sorted       : {0, 0, 0, 0…}
 Sum          : 74.7425
 Mean         : 0.747425
 Median       : 0.957
 Minimum      : 0
 Maximum      : 6.9484
 Range        : 6.9484
 Variance     : 1.391867727275
 StdDeviation : 1.1797744391514
 Output       :

What we see is that checking whether something is equal to something else is faster than checking whether an array contains an object. Now, I know, you’re thinking it’s just a fraction of a millisecond, but checking if $v is equal to 5000 is almost twice as fast as checking if $as contains “reg.exe” (a mean of about 0.41 ms versus 0.75 ms). Keep in mind that, depending on where in the array our match is and how big our array is, that number can go up or down quite a bit. I’m just doing some simple synthetic tests to illustrate that there is a difference. When writing conditional statements like this, try to have your quicker conditions evaluated first, and try to order them so that evaluation can stop as early as possible (for -and, that means the conditions most likely to fail go first). Example:

Test-Performance -Count 100 -ScriptBlock {
    if (($x -eq 100) -or ($a -contains -5090) -or ($s -eq 'test.fake') -or ($as -contains 'reg.exe')) {
        $t = Get-Random
    }
}

Test-Performance -Count 100 -ScriptBlock {
    if (($as -contains 'reg.exe') -or ($x -eq 100) -or ($a -contains -5090) -or ($s -eq 'test.fake')) {
        $t = Get-Random
    }
}

Gives me the following:

 Occurrence   : {0.9858, 0, 0.9969, 0…}
 Sorted       : {0, 0, 0, 0…}
 Sum          : 36.8537
 Mean         : 0.368537
 Median       : 0
 Minimum      : 0
 Maximum      : 3.9959
 Range        : 3.9959
 Variance     : 0.390509577731
 StdDeviation : 0.624907655362774
 Output       :

 Occurrence   : {0.9974, 0, 0.9971, 0…}
 Sorted       : {0, 0, 0, 0…}
 Sum          : 54.8193
 Mean         : 0.548193
 Median       : 0.4869
 Minimum      : 0
 Maximum      : 3.9911
 Range        : 3.9911
 Variance     : 0.425326705251
 StdDeviation : 0.652170763873236
 Output       :

We can see that by changing the order of the evaluated conditions, our code runs in about 2/3 the time. Again, these are generic tests to illustrate the effect that test order has on execution time, but the basic guidelines should apply to most situations. Be sure to test your code.

Loop Execution

Now that I’ve gotten test order out of the way, let’s move on to loop execution. Sometimes when we are working with a loop, like say stepping through an array, we don’t need to do something for every element. Sometimes, we are looking for a specific element and don’t care about anything after that. In these cases, break is our friend. For our first example, I’ll create an array of all the years in the 1900s. I’ll then loop through each one and write some output when I find 1950.

$Decade = 1900..1999
$TargetYear = 1950

$NoBreakResult = Test-Performance -Count 100 -ScriptBlock {
    for ($i = 0; $i -lt $Decade.Count; $i++) {
        if ($Decade[$i] -eq 1950) {
            Write-Output "Found 1950"
        }
    }
}

$BreakResult = Test-Performance -Count 100 -ScriptBlock {
    for ($i = 0; $i -lt $Decade.Count; $i++) {
        if ($Decade[$i] -eq 1950) {
            Write-Output "Found 1950"
            break
        }
    }
}

Get-Winner "No Break" $NoBreakResult.Median "Break" $BreakResult.Median

Our output looks as follows:

 ************************* 
 **  No Break vs Break  **
 *************************
 No Break:  0.38ms
 Break:     0.28ms
 WINNER: Break 1.35x Faster

$NoBreakResult
 Occurrence   : {0.8392, 0.4704, 0.4566, 0.444…}
 Sorted       : {0.3425, 0.3438, 0.3442, 0.3445…}
 Sum          : 47.8028
 Mean         : 0.478028
 Median       : 0.38175
 Minimum      : 0.3425
 Maximum      : 2.6032
 Range        : 2.2607
 Variance     : 0.127489637016
 StdDeviation : 0.357056910052165
 Output       : {Found 1950, Found 1950, Found 1950, Found 1950…}

$BreakResult
 Occurrence   : {3.2739, 0.3445, 0.32, 0.3167…}
 Sorted       : {0.2657, 0.266, 0.266, 0.2662…}
 Sum          : 40.0342
 Mean         : 0.400342
 Median       : 0.2871
 Minimum      : 0.2657
 Maximum      : 3.2739
 Range        : 3.0082
 Variance     : 0.182262889036
 StdDeviation : 0.426922579674582
 Output       : {Found 1950, Found 1950, Found 1950, Found 1950…}

As expected, the version with the break statement was faster, finishing in about 25% less time. Next I’ll take a look at a different approach that I don’t see in too many people’s code, the Do-While loop.

$DoWhileResult = Test-Performance -Count 100 -ScriptBlock {
    $i = 0
    $Year = 0
    do {
        $Year = $Decade[$i]
        if ($Year -eq 1950) {
            Write-Output "Found 1950"
        }
        $i++
    } While ($Year -ne 1950)
}

Which nets me the following:

 ****************************
 **  Do-While vs No Break  **
 ****************************
 Do-While:  0.24ms
 No Break:  0.38ms
 WINNER: Do-While 1.57x Faster

$DoWhileResult
 Occurrence   : {0.9196, 0.313, 0.2975, 0.2933…}
 Sorted       : {0.2239, 0.224, 0.2242, 0.2243…}
 Sum          : 33.8217
 Mean         : 0.338217
 Median       : 0.2436
 Minimum      : 0.2239
 Maximum      : 5.0187
 Range        : 4.7948
 Variance     : 0.262452974211
 StdDeviation : 0.512301643771519
 Output       : {Found 1950, Found 1950, Found 1950, Found 1950…}

As we can see, Do-While is also faster than running through the entire array. My example above does not have any safety mechanism for running beyond the end of the array or not finding the element I’m searching for. In practice, be sure to include such a condition/catch in your loop. Next, I’m going to compare the performance between a few different types of loops. Each loop will run through an array of numbers from 1 to 10,000 and calculate the square root of each. I’ll use the basic for loop as the baseline to compare against the other methods.
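
Before moving on to that comparison, here is a minimal sketch of the kind of guard mentioned above. Find-FirstIndex is a hypothetical helper, not something used in the tests, that stops at the first match and also stops at the end of the collection when there is no match:

```powershell
# Hypothetical helper: index of the first match, or -1 when nothing matches.
function Find-FirstIndex {
    param(
        [int[]]$Collection,
        [int]$Target
    )
    # A do-while body always runs once, so handle the empty case up front.
    if ($Collection.Count -eq 0) { return -1 }

    $i = 0
    do {
        if ($Collection[$i] -eq $Target) { return $i }   # found: leave the loop early
        $i++
    } while ($i -lt $Collection.Count)                   # guard: never run past the end

    return -1
}

$FoundAt = Find-FirstIndex -Collection (1900..1999) -Target 1950
$Missing = Find-FirstIndex -Collection (1900..1999) -Target 2050
```

The loop condition here is the bounds check, and the match itself exits through return, so a missing element ends the loop instead of indexing past the array.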

$ForLoop = Test-Performance -Count 100 -ScriptBlock {
    $ForArray = 1..10000
    for ($i = 0; $i -lt 10000; $i++) {
        $sqrt = [Math]::Sqrt($ForArray[$i])
    }
}

$ForEachLoop = Test-Performance -Count 100 -ScriptBlock {
    $ForEachArray = 1..10000
    foreach ($item in $ForEachArray) {
        $sqrt = [Math]::Sqrt($item)
    }
}

$DotForEachLoop = Test-Performance -Count 100 -ScriptBlock {
    $DotForEachArray = 1..10000
    $DotForEachArray.ForEach{
        $sqrt = [Math]::Sqrt($_)
    }
}

$ForEachObjectLoop = Test-Performance -Count 100 -ScriptBlock {
    $ForEachObjectArray = 1..10000
    $ForEachObjectArray | ForEach-Object {
        $sqrt = [Math]::Sqrt($_)
    }
}

So how do they fare?

 ***********************
 **  For vs For-Each  **
 ***********************
 For:       3ms
 For-Each:  1.99355ms
 WINNER: For-Each 1.50x Faster
 
 ***********************
 **  For vs .ForEach  **
 ***********************
 For:       3ms
 .ForEach:  1150.95495ms
 WINNER: For 383.65x Faster
 
 *****************************
 **  For vs ForEach-Object  **
 *****************************
 For:             3ms
 ForEach-Object:  1210.7644ms
 WINNER: For 403.59x Faster

Quite a bit of difference. Let’s take a look at the statistics for each of them.

$ForLoop
 Occurrence   : {38.8952, 3.9984, 2.9752, 2.966…}
 Sorted       : {0, 0, 1.069, 1.9598…}
 Sum          : 330.7618
 Mean         : 3.307618
 Median       : 2.9731
 Minimum      : 0
 Maximum      : 38.8952
 Range        : 38.8952
 Variance     : 26.100085284676
 StdDeviation : 5.10882425658546
 Output       :

$ForEachLoop
 Occurrence   : {7.0133, 1.9972, 1.9897, 0.9678…}
 Sorted       : {0.9637, 0.9678, 0.9927, 0.9941…}
 Sum          : 187.5277
 Mean         : 1.875277
 Median       : 1.99355
 Minimum      : 0.9637
 Maximum      : 7.0133
 Range        : 6.0496
 Variance     : 0.665303603371
 StdDeviation : 0.815661451443551
 Output       :

$DotForEachLoop
 Occurrence   : {1225.7258, 1169.9073, 1147.9007, 1146.9384…}
 Sorted       : {1110.0618, 1110.0688, 1113.9906, 1114.0656…}
 Sum          : 114948.9291
 Mean         : 1149.489291
 Median       : 1150.95495
 Minimum      : 1110.0618
 Maximum      : 1225.7258
 Range        : 115.664
 Variance     : 534.931646184819
 StdDeviation : 23.1285893686757
 Output       :

$ForEachObjectLoop
 Occurrence   : {1217.7802, 1241.7037, 1220.686, 1249.688…}
 Sorted       : {1181.8081, 1188.8231, 1188.8291, 1191.7818…}
 Sum          : 121345.8078
 Mean         : 1213.458078
 Median       : 1210.7644
 Minimum      : 1181.8081
 Maximum      : 1274.6289
 Range        : 92.8208000000002
 Variance     : 318.356594303116
 StdDeviation : 17.8425501065043
 Output       :

If you look at the for and foreach loops, the first run is significantly slower than the other entries (in fact, it’s the slowest run in each batch), whereas the method and cmdlet versions are much more consistent. That first-run penalty is most likely a one-time warm-up cost, since the loop has to be prepared by the engine before it runs for the first time, and even so it finishes far ahead of the method or cmdlet. The bigger gap comes from how each construct works. The for and foreach statements are language constructs that walk a collection that is already fully in memory, while the .ForEach() method and the ForEach-Object cmdlet invoke a script block once per element, and that per-item overhead adds up over 10,000 iterations. The for loop is slightly slower than foreach because it has to evaluate the condition and index into the array on every iteration.
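
A speed comparison like this only means something if all four constructs do the same work. As a sanity check, here is a small sketch (using a shorter range than the tests above, chosen only to keep it quick) that collects the results from each construct so they can be compared:

```powershell
$Numbers = 1..1000

# for statement: index into the array on every pass.
$ForResult = for ($i = 0; $i -lt $Numbers.Count; $i++) { [Math]::Sqrt($Numbers[$i]) }

# foreach statement: the engine hands us each element directly.
$ForEachResult = foreach ($n in $Numbers) { [Math]::Sqrt($n) }

# .ForEach() method: a script block is invoked once per element.
$MethodResult = $Numbers.ForEach({ [Math]::Sqrt($_) })

# ForEach-Object cmdlet: each element travels through the pipeline.
$CmdletResult = $Numbers | ForEach-Object { [Math]::Sqrt($_) }

# Every construct should agree on the total.
$Sums = @($ForResult, $ForEachResult, $MethodResult, $CmdletResult) |
    ForEach-Object { ($_ | Measure-Object -Sum).Sum }
```

If the entries in $Sums match, the only thing left to compare is how long each construct took to get there.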

.NET Methods vs. Cmdlets

Next, I’ll take a look at some of the differences between some of the common “native” PowerShell cmdlets and their .NET counterparts. I’m going to start with what will likely be the most common thing that you’ll encounter in your scripts or scripts that you use: the array. We’ve probably all used them, and maybe even continue to use them. But should you? First, let’s look at adding items to an array. We frequently start with blank arrays and add items to them as we go along.

$ArrayResult = Test-Performance -Count 100 -ScriptBlock {
    $Array = @()
    for ($i = 0; $i -lt 10000; $i++) {
        $Array += $i
    }
}

$ListResult = Test-Performance -Count 100 -ScriptBlock {
    $List = [System.Collections.Generic.List[PSObject]]::new()
    for ($i = 0; $i -lt 10000; $i++) {
        [void]$List.Add($i)
    }
}

Get-Winner "Array" $ArrayResult.Median "List" $ListResult.Median

 *********************
 **  Array vs List  **
 *********************
 Array:  2274ms
 List:   2.97945ms
 WINNER: List 763.23x Faster

$ArrayResult
 Occurrence   : {2407.5676, 2311.8239, 2420.5336, 2268.9383…}
 Sorted       : {2190.1917, 2200.1205, 2219.1807, 2223.0887…}
 Sum          : 228595.7729
 Mean         : 2285.957729
 Median       : 2274.42135
 Minimum      : 2190.1917
 Maximum      : 2482.3996
 Range        : 292.2079
 Variance     : 2527.01010551066
 StdDeviation : 50.2693754239165
 Output       :

$ListResult
 Occurrence   : {24.9343, 19.9729, 3.9623, 5.9836…}
 Sorted       : {0.9974, 1.9776, 1.9925, 1.994…}
 Sum          : 373.999
 Mean         : 3.73999
 Median       : 2.97945
 Minimum      : 0.9974
 Maximum      : 51.8617
 Range        : 50.8643
 Variance     : 37.0781465771
 StdDeviation : 6.0891827511662
 Output       :

Modifying an existing array can be a VERY expensive operation. Arrays are fixed length and cannot be expanded or contracted. When we add or remove an element, the engine first has to create a new array of size n + 1 or n – 1 and then copy each of the elements from the old array into the new one. This is slow, and can consume a lot of memory while the new array is being created and the contents copied over. Lists, on the other hand, are not statically sized. A generic List is backed by an internal array that is allocated with spare capacity; when that capacity runs out, the list allocates a larger internal array and copies the elements across, so most calls to Add() copy nothing at all. The advantage of a plain array is its smaller memory footprint. Since an array holds exactly the elements it needs, its size can roughly be calculated as SizeOf(TypeOf(Element)) * NumElements, whereas the size of a list is roughly SizeOf(TypeOf(Element)) * Capacity, where Capacity may be considerably larger than the number of elements actually stored. You might be thinking: if an array is stored in consecutive memory blocks with no extra bookkeeping, then once the array is established it should be faster to work with, right? I’ll test.
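
Both behaviors can be observed without a stopwatch. This is a minimal sketch: the first half shows that += hands back a different array object, and the second half records each distinct value of the list's Capacity property while 100 items are added:

```powershell
# Array: += builds a brand new array and leaves the original untouched.
$Array = @(1, 2, 3)
$Original = $Array
$Array += 4
$SameObject = [object]::ReferenceEquals($Original, $Array)

# List: watch how rarely the internal storage has to be reallocated.
$List = [System.Collections.Generic.List[int]]::new()
$Capacities = [System.Collections.Generic.List[int]]::new()
for ($i = 0; $i -lt 100; $i++) {
    $List.Add($i)
    $IsFirst = ($Capacities.Count -eq 0)
    # Record the capacity only when it differs from the last one recorded.
    if ($IsFirst -or ($Capacities[$Capacities.Count - 1] -ne $List.Capacity)) {
        $Capacities.Add($List.Capacity)
    }
}
```

$SameObject comes back false and $Original still holds three items, while $Capacities ends up with only a handful of entries for 100 calls to Add(). That difference in copying is what the timings above are measuring.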

[int[]]$Array = 1..10000
$List = [System.Collections.Generic.List[int]]::new()
for ($i = 1; $i -lt 10001; $i++) {
    [void]$List.Add($i)
}

$ArrayForEachResult = Test-Performance -Count 100 -ScriptBlock {
    foreach ($int in $Array) {
        $int = 5
    }
}

$ListForEachResult = Test-Performance -Count 100 -ScriptBlock {
    foreach ($int in $List) {
        $int = 5
    }
}

First, we create an array of 10,000 elements with the numbers 1 through 10,000 inclusive. We declare the array as an integer array so that we are comparing collections of the same element type. We then create a List<int> and fill it with the same values. So how do they fare?

 ***************************************
 **  Array For-Each vs List For-Each  **
 ***************************************
 Array For-Each:  1.47ms
 List For-Each:   0.83ms
 WINNER: List For-Each 1.77x Faster

$ArrayForEachResult
 Occurrence   : {4.643, 1.4673, 1.4029, 1.3336…}
 Sorted       : {1.3194, 1.3197, 1.3255, 1.3272…}
 Sum          : 156.4036
 Mean         : 1.564036
 Median       : 1.47125
 Minimum      : 1.3194
 Maximum      : 4.643
 Range        : 3.3236
 Variance     : 0.136858367504
 StdDeviation : 0.369943735592319
 Output       : {, , , …}

$ListForEachResult
 Occurrence   : {7.233, 1.723, 0.8305, 0.8632…}
 Sorted       : {0.6174, 0.6199, 0.6203, 0.6214…}
 Sum          : 164.8467
 Mean         : 1.648467
 Median       : 0.83335
 Minimum      : 0.6174
 Maximum      : 71.705
 Range        : 71.0876
 Variance     : 50.017547074011
 StdDeviation : 7.07230846852787
 Output       : {, , , …}

As we can see, the list still out-performs the array, although by a much smaller margin than it did during the manipulation test. I don’t have a definitive explanation for this; my suspicion is that it comes down to how the engine enumerates each collection type rather than how the data is laid out in memory. Now let’s compare .NET Regex with the PowerShell -replace operator. For this, I’m going to be replacing text instead of just checking for a match. Let’s look at the code.

$Haystack = "The Quick Brown Fox Jumped Over the Lazy Dog 5 Times"
$Needle = "\ ([\d]{1})\ "

$NetRegexResult = Test-Performance -Count 1000 -ScriptBlock {
    [regex]::Replace($Haystack, $Needle, " $(Get-Random -Minimum 2 -Maximum 9) ")
    Write-Output $Haystack
}

$PoshRegexResult = Test-Performance -Count 1000 -ScriptBlock {
    $Haystack -replace $Needle, " $(Get-Random -Minimum 2 -Maximum 9) "
    Write-Output $Haystack
}

Get-Winner ".NET RegEx" $NetRegexResult.Median "PoSh RegEx" $PoshRegexResult.Median

Nothing too fancy here. We take our haystack (the sentence), look for the needle (the number of times the fox jumped over the dog) and replace it with a new single digit random number.

 ********************************
 **  .NET RegEx vs PoSh RegEx  **
 ********************************
 .NET RegEx:  0.23ms
 PoSh RegEx:  0.23ms
 WINNER: Tie 1x Faster

$NetRegexResult
 Occurrence   : {0.7531, 0.2886, 0.3572, 0.3181…}
 Sorted       : {0.2096, 0.2106, 0.211, 0.2189…}
 Sum          : 282.3331
 Mean         : 0.2823331
 Median       : 0.23035
 Minimum      : 0.2096
 Maximum      : 2.2704
 Range        : 2.0608
 Variance     : 0.03407226235439
 StdDeviation : 0.184586733961003
 Output       : {The Quick Brown Fox Jumped Over the Lazy Dog 5 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times, The Quick Brown Fox Jumped Over the Lazy Dog 8 Times   
                The Quick Brown Fox Jumped Over the Lazy Dog 5 Times, The Quick Brown Fox Jumped Over the Lazy Dog 4 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times,   
                The Quick Brown Fox Jumped Over the Lazy Dog 4 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times…}

$PoshRegexResult
 Occurrence   : {0.7259, 0.2546, 0.2513, 0.2486…}
 Sorted       : {0.2208, 0.2209, 0.2209, 0.2211…}
 Sum          : 279.0913
 Mean         : 0.2790913
 Median       : 0.231
 Minimum      : 0.2208
 Maximum      : 2.1124
 Range        : 1.8916
 Variance     : 0.03001781767431
 StdDeviation : 0.173256508317321
 Output       : {The Quick Brown Fox Jumped Over the Lazy Dog 8 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times, The Quick Brown Fox Jumped Over the Lazy Dog 6 Times   
                The Quick Brown Fox Jumped Over the Lazy Dog 5 Times, The Quick Brown Fox Jumped Over the Lazy Dog 7 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times,   
                The Quick Brown Fox Jumped Over the Lazy Dog 2 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times…}
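
One behavioral difference is worth knowing before swapping one for the other: the -replace operator ignores case by default, while [regex]::Replace() is case-sensitive unless told otherwise. A short sketch, using a sample sentence made up for illustration:

```powershell
$Text = 'The Quick Brown Fox'

# PowerShell operator: case-insensitive by default, -creplace is the case-sensitive form.
$OperatorDefault = $Text -replace 'quick', 'slow'
$OperatorCaseSensitive = $Text -creplace 'quick', 'slow'

# .NET method: case-sensitive by default, IgnoreCase has to be requested.
$NetDefault = [regex]::Replace($Text, 'quick', 'slow')
$NetIgnoreCase = [regex]::Replace(
    $Text, 'quick', 'slow',
    [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)
```

$OperatorDefault and $NetIgnoreCase both become "The slow Brown Fox", while the other two leave the sentence untouched because the lowercase pattern does not match "Quick".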

Surprisingly, or not surprisingly, the two methods are pretty much dead even. This is one of the cases where I suspect the PowerShell operator is pretty much just a wrapper for the corresponding .NET equivalent. You might ask yourself why you would use the .NET methods if the PowerShell versions net the same performance. The answer (and this applies in a lot of cases) is that the PowerShell operators and cmdlets don’t always offer the same options as their .NET counterparts. I’m going to use String.Split as an example. Take a look at the documentation pages for the -split operator and for String.Split. You may have noticed that they aren’t entirely the same. Far from it actually (for one thing, -split treats its delimiter as a regular expression by default, while .Split() takes literal characters). In most cases, they will return the same results, but they don’t both support the same options. For example, if you want to remove blank entries, you’ll need to use the .Split() method. But what about performance?
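
Before the timings, here is a quick sketch of that difference in options. The sample strings are made up for illustration, and the -split workaround shown is just one way to filter blanks, not the only one:

```powershell
$Csv = 'one,two,,three,'

# -split keeps the empty entries, including the one after the trailing comma.
$OperatorParts = $Csv -split ','

# .Split() can drop them through StringSplitOptions.
$MethodParts = $Csv.Split([char[]]',', [System.StringSplitOptions]::RemoveEmptyEntries)

# With -split, the blanks have to be filtered out afterwards.
$FilteredParts = @(($Csv -split ',') -ne '')

# -split treats the delimiter as a regular expression, .Split() as a literal character.
$RegexDot = @('a.b' -split '.')       # '.' matches every character
$EscapedDot = @('a.b' -split '\.')    # escaped, so it matches a literal dot
$LiteralDot = @('a.b'.Split('.'))
```

$OperatorParts has five entries (two of them empty) while $MethodParts and $FilteredParts have three. The unescaped dot is the classic surprise: $RegexDot contains only empty strings, while $EscapedDot and $LiteralDot both contain "a" and "b".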

$SplitString = "one,two,three,four,five,six,seven,eight,nine,ten,"
$DashSplitResult = Test-Performance -Count 10000 -ScriptBlock {
    $SplitArray = $SplitString -Split ','
}

$DotSplitResult = Test-Performance -Count 10000 -ScriptBlock {
    $SplitArray = $SplitString.Split(',')
}

Get-Winner "-Split" $DashSplitResult.Median ".Split()" $DotSplitResult.Median

 **************************
 **  -Split vs .Split()  **
 **************************
 -Split:    0.13ms
 .Split():  0.12ms
 WINNER: .Split() 1.1x Faster

$DashSplitResult
 Occurrence   : {0.4855, 0.1837, 0.2387, 0.1916…}
 Sorted       : {0.1128, 0.113, 0.1131, 0.1131…}
 Sum          : 1613.99049999999
 Mean         : 0.161399049999999
 Median       : 0.125
 Minimum      : 0.1128
 Maximum      : 2.1112
 Range        : 1.9984
 Variance     : 0.0234987082460975
 StdDeviation : 0.153292883872988
 Output       : {, , , …}

$DotSplitResult
 Occurrence   : {0.552, 0.1339, 0.1245, 0.1227…}
 Sorted       : {0.1052, 0.1056, 0.1056, 0.1057…}
 Sum          : 1485.38330000001
 Mean         : 0.148538330000001
 Median       : 0.1162
 Minimum      : 0.1052
 Maximum      : 1.9226
 Range        : 1.8174
 Variance     : 0.0186857188798111
 StdDeviation : 0.136695716391594
 Output       : {, , , …}

Pretty close, but .Split does edge out -Split by just a hair. Is it worth re-writing all of your code? Doubtful. But if you use string splitting methods frequently, it may be worth doing some testing with your common use cases to see if there could be an impact. And with that, I’m going to wrap up the first part of this post.
