Why is DateTime.strptime so slow?
require 'date'

for i in (1..10000)
  DateTime.strptime("27/Nov/2007:15:01:43 -0800", "%d/%b/%Y:%H:%M:%S %z")
end
Timing this on a P4 2.66GHz CPU produces:
garry@ubuntu:~/p/script$ ruby -v
ruby 1.8.5 (2006-12-25 patchlevel 12) [i686-linux]
garry@ubuntu:~/p/script$ time ruby test.rb

real 0m9.719s
user 0m7.520s
sys 0m0.020s
garry@ubuntu:~/p/script$ time ruby test.rb

real 0m9.130s
user 0m7.520s
sys 0m0.000s
garry@ubuntu:~/p/script$ time ruby test.rb

real 0m8.980s
user 0m7.480s
sys 0m0.010s
garry@ubuntu:~/p/script$
9.276s on average
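The `time` figures include interpreter startup and loading the date library, not just the parsing loop. A minimal sketch, assuming Ruby's standard `Benchmark` module is available, that times only the loop from inside the script (`time_strptime`, `LOG_STAMP` and `LOG_FORMAT` are names made up for this example):

```ruby
require 'date'
require 'benchmark'

# Same sample timestamp and format string as in the test script.
LOG_STAMP  = "27/Nov/2007:15:01:43 -0800"
LOG_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Hypothetical helper: returns the wall-clock seconds spent in the loop only,
# so interpreter startup and require time are excluded.
def time_strptime(iterations)
  Benchmark.realtime do
    iterations.times { DateTime.strptime(LOG_STAMP, LOG_FORMAT) }
  end
end

elapsed = time_strptime(10_000)
puts "10000 calls took #{'%.3f' % elapsed}s"
```

The number will differ by machine and Ruby version, so it is only useful for comparing approaches on the same setup.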
Same thing in Python:
import time

for i in range(10000):
    time.strptime("27/Nov/2007:15:01:43", "%d/%b/%Y:%H:%M:%S")
Note: %z and “-0800” are left off the end because Python throws an error, something about “z” not being supported on all platforms. I’m not sure whether this affects the performance much.
Anyway, that code produces:
garry@ubuntu:~/p/script$ python
Python 2.4.3 (#2, Oct 6 2006, 07:52:30)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
garry@ubuntu:~/p/script$ time python test.py

real 0m0.751s
user 0m0.610s
sys 0m0.020s
garry@ubuntu:~/p/script$ time python test.py

real 0m0.737s
user 0m0.610s
sys 0m0.000s
garry@ubuntu:~/p/script$ time python test.py

real 0m0.727s
user 0m0.620s
sys 0m0.000s
garry@ubuntu:~/p/script$
0.738s on average
The Python version is about 12.5x faster. Is there a different method I should be using to parse dates in Ruby? Time.parse() will not parse the string given (“27/Nov/2007:15:01:43 -0800”).
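One possible alternative, assuming a Ruby newer than the 1.8.5 used here: later versions of the standard `time` library add `Time.strptime`, which accepts the same format string and returns a `Time` rather than a `DateTime`. This is only a sketch of the call, and I haven't measured whether it is any faster:

```ruby
require 'time'

# Assumes a Ruby version whose 'time' library provides Time.strptime
# (it is not available on the 1.8.5 install used for the timings here).
stamp  = "27/Nov/2007:15:01:43 -0800"
format = "%d/%b/%Y:%H:%M:%S %z"

parsed = Time.strptime(stamp, format)

# Convert to UTC so the printed value does not depend on the local zone.
puts parsed.utc.strftime("%Y-%m-%d %H:%M:%S UTC")
```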
Charlie Savage does some analysis on Time and DateTime here.
Looking at date.rb and date/format.rb a bit, it seems strptime is implemented all in Ruby. Perhaps the Python version hands it off to libc?
I asked about this in #ruby, but no one seemed to be around. So I open the question to you guys… :)

Some new findings:
Since I already know the format of the string I’m going to parse, it would make more sense to parse it myself with a regex, then pass that off to Time (Time methods are in C). This gives me awesome performance:
require 'date'

"27/Nov/2007:15:01:43 -0800" =~ %r{(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) -(\d{4})}
day, month, year, hour, minute, second, tz = $1, $2, $3, $4, $5, $6, $7

for i in (1..10000)
  Time.mktime(year, month, day, hour, minute, second)
end

Now this is more like it:
:)
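The snippet above matches the regex once, outside the loop, and never uses the captured timezone. A rough sketch of how the same idea could be wrapped up so that each call does the match and applies the offset; `parse_log_time` and `STAMP_RE` are hypothetical names for this example, and it assumes the offset is always in the numeric `+HHMM` / `-HHMM` form:

```ruby
# Pattern for timestamps like "27/Nov/2007:15:01:43 -0800".
# Captures: day, month name, year, hour, minute, second, sign, offset hours, offset minutes.
STAMP_RE = %r{\A(\d{2})/(\w{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\z}

# Hypothetical helper: returns a UTC Time for the stamp, or nil if it does not match.
def parse_log_time(stamp)
  m = STAMP_RE.match(stamp)
  return nil unless m

  # Offset from UTC in seconds, negative for zones west of Greenwich.
  offset = m[8].to_i * 3600 + m[9].to_i * 60
  offset = -offset if m[7] == '-'

  # Build the wall-clock fields as if they were UTC, then subtract the
  # offset to get the actual instant. Time.utc accepts "Nov" as the month.
  Time.utc(m[3].to_i, m[2], m[1].to_i, m[4].to_i, m[5].to_i, m[6].to_i) - offset
end

t = parse_log_time("27/Nov/2007:15:01:43 -0800")
puts t.strftime("%Y-%m-%d %H:%M:%S UTC")
```

Doing the match inside the function means a benchmark loop around `parse_log_time` would measure the regex as well as the `Time` construction, which is closer to what the `strptime` loop was measuring.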