I recently was having a discussion with a fellow developer about Tcl and regular expressions.
I was going over some code and emphasizing the use of lsearch and regular expressions in list searching, stating enthusiastically that “using an lsearch with a regular expression is always faster than using a for loop“. This person was skeptical about my statement and asked me to open up a text editor and prove it.
Of course, the cocky (or confident) punk that I am, I opened up my Notepad++ and wrote these programs:
# ##############################################
# WithLoop.tcl
# ##############################################
proc indexOf {thisList seekValue} {
for {set i 0} {$i < [llength $thisList]} {incr i} {
if {[lindex $thisList $i] == $seekValue} {
return $i
}
}
return -1
}
# ##############################################
set myList [list]
set numItems 1000000
for {set i 0} {$i <= $numItems} {incr i} {
lappend myList $i
}
set startTime [clock milliseconds]
indexOf $myList $numItems
set endTime [clock milliseconds]
puts [expr $endTime - $startTime]
# ##############################################
# WithOutLoop.tcl
# ##############################################
set myList [list]
set numItems 1000000
for {set i 0} {$i <= $numItems} {incr i} {
lappend myList $i
}
set startTime [clock milliseconds]
lsearch -regexp $myList $numItems
set endTime [clock milliseconds]
puts [expr $endTime - $startTime]
As you can see, one of the programs has a proc with a for loop that searches each list element at a time, and the other just calls lsearch and seeks a (really stupid simple) regular expression pattern. They are both working with a list that is 1,000,000 elements long, and seeking the very last element (forcing both methods to search the entire list to get a hit). I thought I would run each program 10 times, take the average time, and this would prove which was faster. To my surprise, I didn’t get the results I expected! The for loop version averaged out to be 504 milliseconds, and the lsearch version averaged 992 milliseconds. What? How could a for loop be almost twice as fast as an lsearch using a regular expression? This goes against everything I have ever known about development!
As my counterpart was basking in his triumph and bragging about out-smarting the smart-ass, I modified the program and re-ran it. I remembered from a training course I took years ago that TCL’s regular expression engine has to "start”. This means, that running it once takes more time, but once it is started it blazes. Sort of like a race where I was in a car and my opponent was on foot. When the race starts my engine is turned off and my opponent could win (or get a head start) in the short term because they can start running while I am still starting my car. Now, once my car starts and I can start driving, I blaze by my opponent and he will EAT MY DUST.
Based on this, I thought I would run the same exact search 10 times and average the time it takes to run. NOW, the numbers look more like it. Here are the modified programs:
# ##############################################
# WithLoop.tcl
# ##############################################
proc indexOf {thisList seekValue} {
for {set i 0} {$i < [llength $thisList]} {incr i} {
if {[lindex $thisList $i] == $seekValue} {
return $i
}
}
return -1
}
# ##############################################
set myList [list]
set numItems 1000000
for {set i 0} {$i <= $numItems} {incr i} {
lappend myList $i
}
set startTime [clock milliseconds]
for {set i 0} {$i < 10} {incr i} {
indexOf $myList $numItems
}
set endTime [clock milliseconds]
puts [expr $endTime - $startTime]
# ##############################################
# WithOutLoop.tcl
# ##############################################
set myList [list]
set numItems 1000000
for {set i 0} {$i <= $numItems} {incr i} {
lappend myList $i
}
set startTime [clock milliseconds]
for {set i 0} {$i < 10} {incr i} {
lsearch -regexp $myList $numItems
}
set endTime [clock milliseconds]
puts [expr $endTime - $startTime]
This time the for loop example averaged 5,018 milliseconds and the lsearch averaged 2,308. I increased the number of items from 1,000,000 to 5,000,000 and got 24,687 and 12,175 respectively. These two list lenghs show the added performance benefit of using lsearch over the length of a running programming.
We both tried to declare victory, but technically I was correct… Sorry dude…
Lesson? Be more specific when challenging other developers on performance issues and when using TCL you should ALWAYS use lsearch to search a list instead of a loop.