About Performance
This past November, I had the privilege of speaking with Nathan Ziehnert at MMS Jazz Edition in New Orleans, LA, about PowerShell performance. If you haven’t heard of his blog (Z-Nerd), you should definitely check it out. I’ve been meaning to write more blog posts, and I thought that our topic would make a great post.
PowerShell script performance is often an afterthought, if it’s even a thought at all. Many times, we look at a script and say, “It works! My job here is done.” And in some instances, that’s good enough. Perhaps I’m just writing a single-run script for a one-time task, and then it’ll get tossed. But what about the rest of our PowerShell scripts? What about all the scripts that run our automation, task sequences, and daily tasks?
In my opinion, performance can be broken down into three categories: Cost, Maintainability, and Speed. Cost in this regard is not how much money it costs to run a script in, say, Azure Automation, but rather the resource cost on the machine that’s running it. Think CPU, memory, and disk utilization. Maintainability is the time it takes to maintain or modify the script. As our environment evolves, solutions get upgraded, or modules and snap-ins get code updates, we need to give our scripts the occasional tune-up. This is where formatting and good coding practices come into play. Lastly, we have speed. This first blog post will focus on speed, or how fast our scripts run.
Disclaimer
What I’m covering here is general. While it may apply to 99% of use cases, be sure to test for your individual script. There are a lot of factors that can affect the speed of a script, including, but not limited to, OS, PowerShell version, hardware, and run behavior. What works for one script may not work for the next. What works on one machine may not work on the rest. What works when you tested it will never work in production (Murphy’s Law).
What Affects Performance?
There are quite a few things that can affect performance. Some of them we can control, others we can’t. For the things we can’t control, there are often ways to mitigate or work around the constraints. I’ll take a look at some of both in this post, including:
- Test Order
- Loop Execution
- .NET Methods vs. Native Cmdlets
- Strong Typed/Typecasting
- Syntax
- Output
Testing Method
For testing and measuring, I wrote a couple of functions. The first is a function named Test-Performance. It returns a PSObject with all of our test data, which we can use to generate visualizations. The second is a simple function that takes the mean or median results from two sets, compares them, and returns a winner along with how much faster it was. The functions:
function Test-Performance {
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory=$true,Position=1)]
        [ValidateRange(5,50000)]
        [int]$Count,
        [Parameter(Mandatory=$true,Position=2)]
        [ScriptBlock]$ScriptBlock
    )
    $Private:Occurrence = [System.Collections.Generic.List[Double]]::new()
    $Private:Sorted = [System.Collections.Generic.List[Double]]::new()
    $Private:ScriptBlockOutput = [System.Collections.Generic.List[string]]::new()
    [Double]$Private:Sum = 0
    [Double]$Private:Mean = 0
    [Double]$Private:Median = 0
    [Double]$Private:Minimum = 0
    [Double]$Private:Maximum = 0
    [Double]$Private:Range = 0
    [Double]$Private:Variance = 0
    [Double]$Private:StdDeviation = 0
    $Private:ReturnObject = '' | Select-Object Occurrence,Sorted,Sum,Mean,Median,Minimum,Maximum,Range,Variance,StdDeviation,Output
    #Gather Results
    for ($i = 0; $i -lt $Count; $i++) {
        $Timer = [System.Diagnostics.Stopwatch]::StartNew()
        #$Private:Output = Invoke-Command -ScriptBlock $ScriptBlock
        $Private:Output = $ScriptBlock.Invoke()
        $Timer.Stop()
        $Private:Result = $Timer.Elapsed
        $Private:Sum += $Private:Result.TotalMilliseconds
        [void]$Private:ScriptBlockOutput.Add($Private:Output)
        [void]$Private:Occurrence.Add($Private:Result.TotalMilliseconds)
        [void]$Private:Sorted.Add($Private:Result.TotalMilliseconds)
    }
    $Private:ReturnObject.Sum = $Private:Sum
    $Private:ReturnObject.Occurrence = $Private:Occurrence
    if (($Private:ScriptBlockOutput -notcontains "true") -and ($Private:ScriptBlockOutput -notcontains "false") -and ($Private:ScriptBlockOutput -notcontains $null)) {
        $Private:ReturnObject.Output = $Private:ScriptBlockOutput
    } else {
        $Private:ReturnObject.Output = $null
    }
    #Sort
    $Private:Sorted.Sort()
    $Private:ReturnObject.Sorted = $Private:Sorted
    #Statistical Calculations
    #Mean (Average)
    $Private:Mean = $Private:Sum / $Count
    $Private:ReturnObject.Mean = $Private:Mean
    #Median (the list is zero-indexed, so the middle of an odd-length set is index Floor($Count / 2))
    if (($Count % 2) -eq 1) {
        $Private:Median = $Private:Sorted[[Math]::Floor($Count / 2)]
    } else {
        $Private:Middle = $Count / 2
        $Private:Median = (($Private:Sorted[$Private:Middle - 1]) + ($Private:Sorted[$Private:Middle])) / 2
    }
    $Private:ReturnObject.Median = $Private:Median
    #Minimum
    $Private:Minimum = $Private:Sorted[0]
    $Private:ReturnObject.Minimum = $Private:Minimum
    #Maximum
    $Private:Maximum = $Private:Sorted[$Count - 1]
    $Private:ReturnObject.Maximum = $Private:Maximum
    #Range
    $Private:Range = $Private:Maximum - $Private:Minimum
    $Private:ReturnObject.Range = $Private:Range
    #Variance
    for ($i = 0; $i -lt $Count; $i++) {
        $x = ($Private:Sorted[$i] - $Private:Mean)
        $Private:Variance += ($x * $x)
    }
    $Private:Variance = $Private:Variance / $Count
    $Private:ReturnObject.Variance = $Private:Variance
    #Standard Deviation
    $Private:StdDeviation = [Math]::Sqrt($Private:Variance)
    $Private:ReturnObject.StdDeviation = $Private:StdDeviation
    return $Private:ReturnObject
}
Function Get-Winner {
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory=$true,Position=1)]
        [ValidateNotNullOrEmpty()]
        [string]$AName,
        [Parameter(Mandatory=$true,Position=2)]
        [ValidateNotNullOrEmpty()]
        [Double]$AValue,
        [Parameter(Mandatory=$true,Position=3)]
        [ValidateNotNullOrEmpty()]
        [string]$BName,
        [Parameter(Mandatory=$true,Position=4)]
        [ValidateNotNullOrEmpty()]
        [Double]$BValue
    )
    if ($ClearBetweenTests) {
        Clear-Host
    }
    $blen = $AName.Length + $BName.Length + 12
    $Border = ''
    for ($i = 0; $i -lt $blen; $i++) {
        $Border += '*'
    }
    if ($OutToFile) {
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject $Border
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject ([string]::Format('** {0} vs {1} **', $AName, $BName))
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject $Border
    }
    Write-Host $Border -ForegroundColor White
    Write-Host ([string]::Format('** {0} vs {1} **', $AName, $BName)) -ForegroundColor White
    Write-Host $Border -ForegroundColor White
    if ($AValue -lt $BValue) {
        $Faster = $BValue / $AValue
        if ($Faster -lt 1.05) {
            $Winner = 'Tie'
            $AColor = [ConsoleColor]::White
            $BColor = [ConsoleColor]::White
        } else {
            $Winner = $AName
            $AColor = [ConsoleColor]::Green
            $BColor = [ConsoleColor]::Red
        }
    } elseif ($AValue -gt $BValue) {
        $Faster = $AValue / $BValue
        if ($Faster -lt 1.05) {
            $Winner = 'Tie'
            $AColor = [ConsoleColor]::White
            $BColor = [ConsoleColor]::White
        } else {
            $Winner = $BName
            $AColor = [ConsoleColor]::Red
            $BColor = [ConsoleColor]::Green
        }
    } else {
        $Winner = 'Tie'
        $AColor = [ConsoleColor]::White
        $BColor = [ConsoleColor]::White
        $Faster = 0
    }
    $APad = ''
    $BPad = ''
    if ($AName.Length -gt $BName.Length) {
        $LenDiff = $AName.Length - $BName.Length
        for ($i = 0; $i -lt $LenDiff; $i++) {
            $BPad += ' '
        }
    } else {
        $LenDiff = $BName.Length - $AName.Length
        for ($i = 0; $i -lt $LenDiff; $i++) {
            $APad += ' '
        }
    }
    $AValue = [Math]::Round($AValue, 2)
    $BValue = [Math]::Round($BValue, 2)
    $Faster = [Math]::Round($Faster, 2)
    if ($OutToFile) {
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject ([string]::Format('{0}: {1}{2}ms', $AName, $APad, $AValue))
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject ([string]::Format('{0}: {1}{2}ms', $BName, $BPad, $BValue))
        #Double quotes here so the `r`n escapes render as a newline instead of literal text
        Out-File -FilePath $OutFileName -Append -Encoding utf8 -InputObject ([string]::Format("WINNER: {0} {1}x Faster`r`n", $Winner, $Faster))
    }
    Write-Host ([string]::Format('{0}: {1}{2}ms', $AName, $APad, $AValue)) -ForegroundColor $AColor
    Write-Host ([string]::Format('{0}: {1}{2}ms', $BName, $BPad, $BValue)) -ForegroundColor $BColor
    Write-Host ([string]::Format('WINNER: {0} {1}x Faster', $Winner, $Faster)) -ForegroundColor Yellow
    if ($PauseBetweenTests -eq $true) {
        Pause
    }
}
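Using the two together looks something like this (the script blocks here are just hypothetical placeholders; any two approaches you want to compare will do):
$RandomAll = Test-Performance -Count 100 -ScriptBlock { Get-Random }
$RandomBounded = Test-Performance -Count 100 -ScriptBlock { Get-Random -Minimum 1 -Maximum 100 }
Get-Winner 'Get-Random' $RandomAll.Median 'Bounded Get-Random' $RandomBounded.Median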
Now, you may be wondering why I went through all of that trouble when there is a perfectly good Measure-Command cmdlet available. The reason is twofold. One, I wanted the statistics to be calculated without having to call a separate function. Two, Measure-Command does not handle output, and I wanted to be able to test and capture output if needed. If you haven’t tried to use Write-Output within a Measure-Command script block before, let me show you what I’m talking about:
PS C:\> Measure-Command {Write-Host "Write-Host"}
Write-Host
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 12
Ticks : 128366
TotalDays : 1.48571759259259E-07
TotalHours : 3.56572222222222E-06
TotalMinutes : 0.000213943333333333
TotalSeconds : 0.0128366
TotalMilliseconds : 12.8366
PS C:\> Measure-Command {Write-Output "Write-Output"}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 0
Ticks : 4005
TotalDays : 4.63541666666667E-09
TotalHours : 1.1125E-07
TotalMinutes : 6.675E-06
TotalSeconds : 0.0004005
TotalMilliseconds : 0.4005
Notice that when we call Write-Output, we don’t see the output? With my Test-Performance function, we can still capture that output. The output from the two functions looks like this:
Occurrence : {4.0353, 1.0091, 0, 0…}
Sorted : {0, 0, 0, 1.0091…}
Sum : 5.0444
Mean : 1.00888
Median : 1.0091
Minimum : 0
Maximum : 4.0353
Range : 4.0353
Variance : 2.4425469256
StdDeviation : 1.56286497356618
Output : {Write-Output, Write-Output, Write-Output, Write-Output…}
******************************
** Filter vs Where-Object **
******************************
Filter: 22ms
Where-Object: 1007.69393ms
WINNER: Filter 45.80x Faster
One other thing to mention is that I did not simply use $Start = Get-Date followed by (Get-Date) - $Start. Why? Because some things happen so fast that I need to measure the speed in ticks or microseconds. The system clock behind Get-Date doesn’t reliably resolve below a millisecond, so anything faster than that gets rounded to either 0 or 1 millisecond. The Stopwatch class uses the high-resolution performance counter instead.
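To see the difference for yourself, here’s a minimal sketch comparing the two approaches on a sub-millisecond operation:
# DateTime math is limited by the system clock's update granularity,
# so a fast operation usually reports 0 or 1 milliseconds.
$Start = Get-Date
$null = [Math]::Sqrt(2)
((Get-Date) - $Start).TotalMilliseconds

# Stopwatch uses the high-resolution performance counter, so it returns
# meaningful ticks and fractional milliseconds for the same operation.
$Timer = [System.Diagnostics.Stopwatch]::StartNew()
$null = [Math]::Sqrt(2)
$Timer.Stop()
$Timer.Elapsed.TotalMilliseconds
$Timer.ElapsedTicks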
Test Order
With that out of the way, let’s look at test order first. Test order is the order in which conditions are evaluated within a flow-control block such as an if statement or a do/while loop. The engine evaluates the conditions from left to right while respecting the order of operations. Why is this important? Let’s say you have the following if statement:
if (($haystack -contains "needle") -and ($x -eq 5)) {
    Do-Stuff
}
We have two conditions: does $haystack contain the string “needle”, and does $x equal 5? With the -and operator, we tell the engine that both must be true to meet the condition of the if statement. The engine evaluates the first condition and, only if it’s true, continues through the remaining conditions until it reaches one that is false or has evaluated them all. Let’s take a quick look at how long it takes to evaluate a few different types of conditions.
$v = 5000
$a = 1..10000
$s = 'reg.exe'
$as = (Get-ChildItem -Path C:\Windows\System32 -Filter '*.exe').Name
Test-Performance -Count 100 -ScriptBlock {$v -eq 5000}
Test-Performance -Count 100 -ScriptBlock {$a -contains 5000}
Test-Performance -Count 100 -ScriptBlock {$s -eq 'reg.exe'}
Test-Performance -Count 100 -ScriptBlock {$as -contains 'reg.exe'}
That gives me the following output:
Occurrence : {0.9741, 0, 0, 5.9834…}
Sorted : {0, 0, 0, 0…}
Sum : 40.8437
Mean : 0.408437
Median : 0
Minimum : 0
Maximum : 5.9834
Range : 5.9834
Variance : 0.538503541531
StdDeviation : 0.733828005414757
Output :
Occurrence : {0.9977, 0.9969, 0, 0.9971…}
Sorted : {0, 0, 0, 0…}
Sum : 67.7895
Mean : 0.677895
Median : 0.9934
Minimum : 0
Maximum : 3.9467
Range : 3.9467
Variance : 0.450743557675
StdDeviation : 0.671374379668304
Output :
Occurrence : {0.989, 0.9973, 0, 0…}
Sorted : {0, 0, 0, 0…}
Sum : 52.9174
Mean : 0.529174
Median : 0
Minimum : 0
Maximum : 8.9804
Range : 8.9804
Variance : 1.222762781524
StdDeviation : 1.10578604690238
Output :
Occurrence : {0.997, 0, 0, 1.0292…}
Sorted : {0, 0, 0, 0…}
Sum : 74.7425
Mean : 0.747425
Median : 0.957
Minimum : 0
Maximum : 6.9484
Range : 6.9484
Variance : 1.391867727275
StdDeviation : 1.1797744391514
Output :
What we see is that checking whether something is equal to something else is a lot faster than checking whether an array contains an object. Now, I know, you’re thinking it’s just a couple of milliseconds, but checking if $v is equal to 5000 is almost twice as fast as checking if $as contains “reg.exe”. Keep in mind that depending on how big the array is and where in it our match sits, that number can go up or down quite a bit. I’m just doing some simple synthetic tests to illustrate that there is a difference. When writing conditional statements like this, put your cheaper conditions first, and order them so the chain can short-circuit early: in an -and chain, lead with the condition most likely to be false; in an -or chain, lead with the condition most likely to be true. Example:
Test-Performance -Count 100 -ScriptBlock {
    if (($x -eq 100) -or ($a -contains -5090) -or ($s -eq 'test.fake') -or ($as -contains 'reg.exe')) {
        $t = Get-Random
    }
}
Test-Performance -Count 100 -ScriptBlock {
    if (($as -contains 'reg.exe') -or ($x -eq 100) -or ($a -contains -5090) -or ($s -eq 'test.fake')) {
        $t = Get-Random
    }
}
Gives me the following:
Occurrence : {0.9858, 0, 0.9969, 0…}
Sorted : {0, 0, 0, 0…}
Sum : 36.8537
Mean : 0.368537
Median : 0
Minimum : 0
Maximum : 3.9959
Range : 3.9959
Variance : 0.390509577731
StdDeviation : 0.624907655362774
Output :
Occurrence : {0.9974, 0, 0.9971, 0…}
Sorted : {0, 0, 0, 0…}
Sum : 54.8193
Mean : 0.548193
Median : 0.4869
Minimum : 0
Maximum : 3.9911
Range : 3.9911
Variance : 0.425326705251
StdDeviation : 0.652170763873236
Output :
We can see that by changing the order of the evaluated conditions, our code runs in about two-thirds of the time. Again, these are generic tests meant to show the effect that test order has on execution time, but they suggest basic guidelines that apply to most situations. Be sure to test your code.
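The same principle applies to -and chains. In this hypothetical sketch, the cheap comparison fails first, so short-circuiting means the expensive directory scan on the right never runs:
# $x isn't 100, so -and short-circuits and Get-ChildItem never executes.
$x = 5
if (($x -eq 100) -and ((Get-ChildItem -Path C:\Windows\System32 -Filter '*.exe').Count -gt 0)) {
    'both conditions are true'
}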
Loop Execution
Now that I’ve gotten test order out of the way, let’s move on to loop execution. Sometimes when we’re working with a loop, like stepping through an array, we don’t need to do something for every element. Sometimes we’re looking for a specific element and don’t care about anything after it. In these cases, break is our friend. For the first example, I’ll create an array of all the years in the 1900s, then loop through each one and write some output when I find 1950.
$Decade = 1900..1999
$TargetYear = 1950
$NoBreakResult = Test-Performance -Count 10 -ScriptBlock {
    for ($i = 0; $i -lt $Decade.Count; $i++) {
        if ($Decade[$i] -eq 1950) {
            Write-Output "Found 1950"
        }
    }
}
$BreakResult = Test-Performance -Count 10 -ScriptBlock {
    for ($i = 0; $i -lt $Decade.Count; $i++) {
        if ($Decade[$i] -eq 1950) {
            Write-Output "Found 1950"
            break
        }
    }
}
Get-Winner "No Break" $NoBreakResult.Median "Break" $BreakResult.Median
Our output looks as follows:
*************************
** No Break vs Break **
*************************
No Break: 0.38ms
Break: 0.28ms
WINNER: Break 1.35x Faster
$NoBreakResult
Occurrence : {0.8392, 0.4704, 0.4566, 0.444…}
Sorted : {0.3425, 0.3438, 0.3442, 0.3445…}
Sum : 47.8028
Mean : 0.478028
Median : 0.38175
Minimum : 0.3425
Maximum : 2.6032
Range : 2.2607
Variance : 0.127489637016
StdDeviation : 0.357056910052165
Output : {Found 1950, Found 1950, Found 1950, Found 1950…}
$BreakResult
Occurrence : {3.2739, 0.3445, 0.32, 0.3167…}
Sorted : {0.2657, 0.266, 0.266, 0.2662…}
Sum : 40.0342
Mean : 0.400342
Median : 0.2871
Minimum : 0.2657
Maximum : 3.2739
Range : 3.0082
Variance : 0.182262889036
StdDeviation : 0.426922579674582
Output : {Found 1950, Found 1950, Found 1950, Found 1950…}
As expected, the instance with the break command was about 25% faster. Next, I’ll take a look at a construct that I don’t see in too many people’s code: the while/do-while loop.
$DoWhileResult = Test-Performance -Count 100 -ScriptBlock {
    $i = 0
    $Year = 0
    do {
        $Year = $Decade[$i]
        if ($Year -eq 1950) {
            Write-Output "Found 1950"
        }
        $i++
    } while ($Year -ne 1950)
}
Which nets me the following:
****************************
** Do-While vs No Break **
****************************
Do-While: 0.24ms
No Break: 0.38ms
WINNER: Do-While 1.57x Faster
$DoWhileResult
Occurrence : {0.9196, 0.313, 0.2975, 0.2933…}
Sorted : {0.2239, 0.224, 0.2242, 0.2243…}
Sum : 33.8217
Mean : 0.338217
Median : 0.2436
Minimum : 0.2239
Maximum : 5.0187
Range : 4.7948
Variance : 0.262452974211
StdDeviation : 0.512301643771519
Output : {Found 1950, Found 1950, Found 1950, Found 1950…}
As we can see, do-while is also faster than running through the entire array. Note that my example above has no safety mechanism: if 1950 were never in the array, the loop would run right off the end of it. In practice, be sure to include a bounds check as an extra exit condition, as in the sketch below.
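Here’s a minimal sketch of the same search with a guard, using the $Decade array from above:
$i = 0
do {
    if ($Decade[$i] -eq 1950) {
        Write-Output "Found 1950"
        break
    }
    $i++
} while ($i -lt $Decade.Count) # stop at the end of the array even if there's no match
Next, I’m going to compare the performance between a few different types of loops. Each loop will run through an array of numbers from 1 to 10,000 and calculate the square root of each. I’ll use the basic for loop as the baseline to compare against the other methods.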
$ForLoop = Test-Performance -Count 100 -ScriptBlock {
    $ForArray = 1..10000
    for ($i = 0; $i -lt 10000; $i++) {
        $sqrt = [Math]::Sqrt($ForArray[$i])
    }
}
$ForEachLoop = Test-Performance -Count 100 -ScriptBlock {
    $ForEachArray = 1..10000
    foreach ($item in $ForEachArray) {
        $sqrt = [Math]::Sqrt($item)
    }
}
$DotForEachLoop = Test-Performance -Count 100 -ScriptBlock {
    $DotForEachArray = 1..10000
    $DotForEachArray.ForEach{
        $sqrt = [Math]::Sqrt($_)
    }
}
$ForEachObjectLoop = Test-Performance -Count 100 -ScriptBlock {
    $ForEachObjectArray = 1..10000
    $ForEachObjectArray | ForEach-Object {
        $sqrt = [Math]::Sqrt($_)
    }
}
So how do they fare?
***********************
** For vs For-Each **
***********************
For: 3ms
For-Each: 1.99355ms
WINNER: For-Each 1.50x Faster
***********************
** For vs .ForEach **
***********************
For: 3ms
.ForEach: 1150.95495ms
WINNER: For 383.65x Faster
*****************************
** For vs ForEach-Object **
*****************************
For: 3ms
ForEach-Object: 1210.7644ms
WINNER: For 403.59x Faster
Quite a bit of difference. Let’s take a look at the statistics for each of them.
$ForLoop
Occurrence : {38.8952, 3.9984, 2.9752, 2.966…}
Sorted : {0, 0, 1.069, 1.9598…}
Sum : 330.7618
Mean : 3.307618
Median : 2.9731
Minimum : 0
Maximum : 38.8952
Range : 38.8952
Variance : 26.100085284676
StdDeviation : 5.10882425658546
Output :
$ForEachLoop
Occurrence : {7.0133, 1.9972, 1.9897, 0.9678…}
Sorted : {0.9637, 0.9678, 0.9927, 0.9941…}
Sum : 187.5277
Mean : 1.875277
Median : 1.99355
Minimum : 0.9637
Maximum : 7.0133
Range : 6.0496
Variance : 0.665303603371
StdDeviation : 0.815661451443551
Output :
$DotForEachLoop
Occurrence : {1225.7258, 1169.9073, 1147.9007, 1146.9384…}
Sorted : {1110.0618, 1110.0688, 1113.9906, 1114.0656…}
Sum : 114948.9291
Mean : 1149.489291
Median : 1150.95495
Minimum : 1110.0618
Maximum : 1225.7258
Range : 115.664
Variance : 534.931646184819
StdDeviation : 23.1285893686757
Output :
$ForEachObjectLoop
Occurrence : {1217.7802, 1241.7037, 1220.686, 1249.688…}
Sorted : {1181.8081, 1188.8231, 1188.8291, 1191.7818…}
Sum : 121345.8078
Mean : 1213.458078
Median : 1210.7644
Minimum : 1181.8081
Maximum : 1274.6289
Range : 92.8208000000002
Variance : 318.356594303116
StdDeviation : 17.8425501065043
Output :
If you notice, on the for and foreach loops the first run is significantly higher than the other entries (in fact, it’s the slowest run in the batch for each), whereas the .ForEach() method and ForEach-Object cmdlet are much more consistent. That first-run spike is likely one-time overhead, such as the engine compiling and caching the script block on its first execution. The method and cmdlet versions are dramatically slower overall because they invoke a script block once per element, and ForEach-Object additionally passes each element through the pipeline one at a time; that per-item overhead dwarfs the actual work in the loop body. The for loop comes in slightly behind foreach because it also has to evaluate its condition and index into the array on every iteration.
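You can see that per-item invocation cost in isolation with a rough sketch like this (Measure-Command is fine here because the difference is so large):
# The loop body is trivial, so nearly all the time in the second command
# is pipeline plus per-element script block invocation overhead.
Measure-Command { foreach ($i in 1..100000) { $null = $i } }
Measure-Command { 1..100000 | ForEach-Object { $null = $_ } }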
.NET Methods vs. Cmdlets
Next, I’ll take a look at some of the differences between common “native” PowerShell cmdlets and their .NET counterparts. I’m going to start with what is likely the most common structure you’ll encounter in your scripts or scripts that you use: the array. We’ve probably all used them, and maybe even continue to use them. But should you? First, let’s look at adding items to an array. We frequently start with a blank array and add items to it as we go along.
$ArrayResult = Test-Performance -Count 100 -ScriptBlock {
    $Array = @()
    for ($i = 0; $i -lt 10000; $i++) {
        $Array += $i
    }
}
$ListResult = Test-Performance -Count 100 -ScriptBlock {
    $List = [System.Collections.Generic.List[PSObject]]::new()
    for ($i = 0; $i -lt 10000; $i++) {
        [void]$List.Add($i)
    }
}
Get-Winner "Array" $ArrayResult.Median "List" $ListResult.Median
*********************
** Array vs List **
*********************
Array: 2274ms
List: 2.97945ms
WINNER: List 763.23x Faster
$ArrayResult
Occurrence : {2407.5676, 2311.8239, 2420.5336, 2268.9383…}
Sorted : {2190.1917, 2200.1205, 2219.1807, 2223.0887…}
Sum : 228595.7729
Mean : 2285.957729
Median : 2274.42135
Minimum : 2190.1917
Maximum : 2482.3996
Range : 292.2079
Variance : 2527.01010551066
StdDeviation : 50.2693754239165
Output :
$ListResult
Occurrence : {24.9343, 19.9729, 3.9623, 5.9836…}
Sorted : {0.9974, 1.9776, 1.9925, 1.994…}
Sum : 373.999
Mean : 3.73999
Median : 2.97945
Minimum : 0.9974
Maximum : 51.8617
Range : 50.8643
Variance : 37.0781465771
StdDeviation : 6.0891827511662
Output :
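Why is += so expensive? A quick sketch makes the copy visible; [object]::ReferenceEquals compares object identity, not contents:
$a = 1..5
$b = $a                              # $b points at the same array object as $a
$a += 6                              # += builds a brand-new array and copies the old elements in
[object]::ReferenceEquals($a, $b)    # False: $a no longer refers to the original array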
Modifying an existing array can be a VERY expensive operation. Arrays are fixed length and cannot be expanded or contracted. When we add or remove an element, the engine first has to create a new array of size n + 1 or n – 1 and then copy each of the elements from the old array into the new one. This is slow, and it can consume a lot of memory while the new array is being created and the contents are copied over. A List, on the other hand, is not statically sized. Despite the name, System.Collections.Generic.List is not a linked list; it’s backed by an internal array that doubles its capacity whenever it fills up, so most Add calls are a single write and the occasional re-copy is amortized away. The trade-off is a slightly larger memory footprint, since the internal array usually reserves more capacity than there are elements. Both structures ultimately sit in consecutive blocks of memory, so you might be thinking that once the collection is established, they should be about equally fast to work with, right? I’ll test.
[int[]]$Array = 1..10000
$List = [System.Collections.Generic.List[int]]::new()
for ($i = 1; $i -lt 10001; $i++) {
    [void]$List.Add($i)
}
$ArrayForEachResult = Test-Performance -Count 100 -ScriptBlock {
    foreach ($int in $Array) {
        $int = 5
    }
}
$ListForEachResult = Test-Performance -Count 100 -ScriptBlock {
    foreach ($int in $List) {
        $int = 5
    }
}
First, we create an array of 10,000 elements containing the numbers 1 through 10,000 inclusive. We declare it as an integer array to make sure we’re comparing objects of the same type. We then create a List<int> and fill it with the same values. So how do they fare?
***************************************
** Array For-Each vs List For-Each **
***************************************
Array For-Each: 1.47ms
List For-Each: 0.83ms
WINNER: List For-Each 1.77x Faster
$ArrayForEachResult
Occurrence : {4.643, 1.4673, 1.4029, 1.3336…}
Sorted : {1.3194, 1.3197, 1.3255, 1.3272…}
Sum : 156.4036
Mean : 1.564036
Median : 1.47125
Minimum : 1.3194
Maximum : 4.643
Range : 3.3236
Variance : 0.136858367504
StdDeviation : 0.369943735592319
Output : {, , , …}
$ListForEachResult
Occurrence : {7.233, 1.723, 0.8305, 0.8632…}
Sorted : {0.6174, 0.6199, 0.6203, 0.6214…}
Sum : 164.8467
Mean : 1.648467
Median : 0.83335
Minimum : 0.6174
Maximum : 71.705
Range : 71.0876
Variance : 50.017547074011
StdDeviation : 7.07230846852787
Output : {, , , …}
As we can see, the list still out-performs the array, although by less of a margin than it did during the manipulation test. I suspect the difference comes down to how the engine enumerates the two collection types rather than to their memory layout. Now let’s compare .NET regex against PowerShell’s -replace operator. For this, I’m going to replace text instead of just checking for a match. Let’s look at the code.
$Haystack = "The Quick Brown Fox Jumped Over the Lazy Dog 5 Times"
$Needle = "\ ([\d]{1})\ "
$NetRegexResult = Test-Performance -Count 1000 -ScriptBlock {
    [regex]::Replace($Haystack, $Needle, " $(Get-Random -Minimum 2 -Maximum 9) ")
    Write-Output $Haystack
}
$PoshRegexResult = Test-Performance -Count 1000 -ScriptBlock {
    $Haystack -replace $Needle, " $(Get-Random -Minimum 2 -Maximum 9) "
    Write-Output $Haystack
}
Get-Winner ".NET RegEx" $NetRegexResult.Median "PoSh RegEx" $PoshRegexResult.Median
Nothing too fancy here. We take our haystack (the sentence), look for the needle (the number of times the fox jumped over the dog), and replace it with a new single-digit random number.
********************************
** .NET RegEx vs PoSh RegEx **
********************************
.NET RegEx: 0.23ms
PoSh RegEx: 0.23ms
WINNER: Tie 1x Faster
$NetRegexResult
Occurrence : {0.7531, 0.2886, 0.3572, 0.3181…}
Sorted : {0.2096, 0.2106, 0.211, 0.2189…}
Sum : 282.3331
Mean : 0.2823331
Median : 0.23035
Minimum : 0.2096
Maximum : 2.2704
Range : 2.0608
Variance : 0.03407226235439
StdDeviation : 0.184586733961003
Output : {The Quick Brown Fox Jumped Over the Lazy Dog 5 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times, The Quick Brown Fox Jumped Over the Lazy Dog 8 Times
The Quick Brown Fox Jumped Over the Lazy Dog 5 Times, The Quick Brown Fox Jumped Over the Lazy Dog 4 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times,
The Quick Brown Fox Jumped Over the Lazy Dog 4 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times…}
$PoshRegexResult
Occurrence : {0.7259, 0.2546, 0.2513, 0.2486…}
Sorted : {0.2208, 0.2209, 0.2209, 0.2211…}
Sum : 279.0913
Mean : 0.2790913
Median : 0.231
Minimum : 0.2208
Maximum : 2.1124
Range : 1.8916
Variance : 0.03001781767431
StdDeviation : 0.173256508317321
Output : {The Quick Brown Fox Jumped Over the Lazy Dog 8 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times, The Quick Brown Fox Jumped Over the Lazy Dog 6 Times
The Quick Brown Fox Jumped Over the Lazy Dog 5 Times, The Quick Brown Fox Jumped Over the Lazy Dog 7 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times,
The Quick Brown Fox Jumped Over the Lazy Dog 2 Times The Quick Brown Fox Jumped Over the Lazy Dog 5 Times…}
Surprisingly, or not surprisingly, the two methods are pretty much dead even. This is one of the cases where I suspect the PowerShell operator is little more than a wrapper around the corresponding .NET method. You might ask yourself why you would bother with the .NET methods if the PowerShell equivalents net the same performance. The answer (and this applies in a lot of cases) is that the PowerShell versions don’t always offer the same options as their .NET counterparts. I’m going to use String.Split as an example. Take a look at the two documentation pages for “String” -Split and String.Split. You may notice that they aren’t entirely the same. Far from it, actually. In most cases they’ll return the same results, but they don’t support the same options. For example, if you want to remove blank entries, you’ll need the .Split() method, as in the sketch below.
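A minimal sketch, reusing the same test string as the benchmark below; the trailing comma produces an empty element:
$SplitString = "one,two,three,four,five,six,seven,eight,nine,ten,"
($SplitString -split ',').Count                                          # 11 - the empty trailing entry is kept
$SplitString.Split(',', [StringSplitOptions]::RemoveEmptyEntries).Count  # 10 - empty entries are removed
But what about performance?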
$SplitString = "one,two,three,four,five,six,seven,eight,nine,ten,"
$DashSplitResult = Test-Performance -Count 10000 -ScriptBlock {
    $SplitArray = $SplitString -Split ','
}
$DotSplitResult = Test-Performance -Count 10000 -ScriptBlock {
    $SplitArray = $SplitString.Split(',')
}
Get-Winner "-Split" $DashSplitResult.Median ".Split()" $DotSplitResult.Median
**************************
** -Split vs .Split() **
**************************
-Split: 0.13ms
.Split(): 0.12ms
WINNER: .Split() 1.1x Faster
$DashSplitResult
Occurrence : {0.4855, 0.1837, 0.2387, 0.1916…}
Sorted : {0.1128, 0.113, 0.1131, 0.1131…}
Sum : 1613.99049999999
Mean : 0.161399049999999
Median : 0.125
Minimum : 0.1128
Maximum : 2.1112
Range : 1.9984
Variance : 0.0234987082460975
StdDeviation : 0.153292883872988
Output : {, , , …}
$DotSplitResult
Occurrence : {0.552, 0.1339, 0.1245, 0.1227…}
Sorted : {0.1052, 0.1056, 0.1056, 0.1057…}
Sum : 1485.38330000001
Mean : 0.148538330000001
Median : 0.1162
Minimum : 0.1052
Maximum : 1.9226
Range : 1.8174
Variance : 0.0186857188798111
StdDeviation : 0.136695716391594
Output : {, , , …}
Pretty close, but .Split does edge out -Split by just a hair. Is it worth re-writing all of your code? Doubtful. But if you use string splitting methods frequently, it may be worth doing some testing with your common use cases to see if there could be an impact. And with that, I’m going to wrap up the first part of this post.