README

PYBENCH - A Python Benchmark Suite


 Extendable suite of low-level benchmarks for measuring
      the performance of the Python implementation 
             (interpreter, compiler or VM).

pybench is a collection of tests that provides a standardized way to measure the performance of Python implementations. It takes a very close look at different aspects of Python programs and lets you decide which factors are more important to you than others, rather than wrapping everything up in one number as other performance tests do (e.g. pystone, which is included in the Python Standard Library).

pybench has been used in the past by several Python developers to track down performance bottlenecks or to demonstrate the impact of optimizations and new features in Python.

The command line interface for pybench is the file pybench.py. Run this script with option '--help' to get a listing of the possible options. Without options, pybench will simply execute the benchmark and then print out a report to stdout.

Micro-Manual

Run 'pybench.py -h' to see the help screen. Run 'pybench.py' to run the benchmark suite using default settings, and 'pybench.py -f' followed by a file name to have it store the results in that file too.

It is usually a good idea to run pybench.py multiple times to see whether the environment, timers and benchmark run-times are suitable for doing benchmark tests.

You can use the comparison feature of pybench.py ('pybench.py -c' followed by the file holding a reference run) to check how well the system behaves in comparison to that reference run.

If the differences are well below 10% for each test, then you have a system that is good for doing benchmark testing. If you get random differences of more than 10% or significant differences between the values for minimum and average time, then you likely have some background processes running which cause the readings to become inconsistent. Examples include: web-browsers, email clients, RSS readers, music players, backup programs, etc.
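
The 10% rule of thumb can also be checked mechanically. The following is a minimal sketch and not part of pybench: the helper name noisy_tests, the dictionary layout and the sample timings are all assumptions made for illustration.

```python
# Hypothetical helper, not part of pybench: flag tests whose minimum
# run-time differs by more than a threshold between two runs.

def noisy_tests(reference, current, threshold=0.10):
    """Return {test name: relative difference} for tests over threshold.

    Both arguments map test names to minimum run-times in seconds.
    Tests missing from either run are skipped.
    """
    flagged = {}
    for name, ref_time in reference.items():
        if name not in current or ref_time <= 0:
            continue
        diff = abs(current[name] - ref_time) / ref_time
        if diff > threshold:
            flagged[name] = diff
    return flagged


# Made-up sample timings (seconds) for two runs on the same machine.
run_a = {"ForLoops": 0.110, "Recursion": 0.156, "TryExcept": 0.125}
run_b = {"ForLoops": 0.112, "Recursion": 0.190, "TryExcept": 0.126}

for name, diff in sorted(noisy_tests(run_a, run_b).items()):
    print("%s differs by %.0f%%" % (name, diff * 100))
```

With these sample numbers only Recursion is flagged, which would suggest repeating the run after closing background programs.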

If you are only interested in a few tests of the whole suite, you can use the filtering option, e.g. 'pybench.py -t string' will only run/show the tests that have 'string' in their name.

This is the current output of pybench.py --help:

"""

PYBENCH - a benchmark test suite for Python interpreters/compilers.

Synopsis: pybench.py [option] files…

Options and default settings:
  -n arg           number of rounds (10)
  -f arg           save benchmark to file arg ()
  -c arg           compare benchmark with the one in file arg ()
  -s arg           show benchmark in file arg, then exit ()
  -w arg           set warp factor to arg (10)
  -t arg           run only tests with names matching arg ()
  -C arg           set the number of calibration runs to arg (20)
  -d               hide noise in comparisons (0)
  -v               verbose output (not recommended) (0)
  --with-gc        enable garbage collection (0)
  --with-syscheck  use default sys check interval (0)
  --timer arg      use given timer (time.time)
  -h               show this help text
  --help           show this help text
  --debug          enable debugging
  --copyright      show copyright
  --examples       show examples of usage

Version: 2.0

The normal operation is to run the suite and display the results. Use -f to save them for later reuse or comparisons.

Available timers:

   time.time
   time.clock
   systimes.processtime

Examples:

python2.1 pybench.py -f p21.pybench
python2.5 pybench.py -f p25.pybench
python pybench.py -s p25.pybench -c p21.pybench
"""

License

See LICENSE file.

Sample output

"""

PYBENCH 2.0

  • using Python 2.4.2
  • disabled garbage collection
  • system check interval set to maximum: 2147483647
  • using timer: time.time

Calibrating tests. Please wait…

Running 10 round(s) of the suite at warp factor 10:

  • Round 1 done in 6.388 seconds.
  • Round 2 done in 6.485 seconds.
  • Round 3 done in 6.786 seconds.
  …
  • Round 10 done in 6.546 seconds.

Benchmark: 2006-06-12 12:09:25

Rounds: 10
Warp:   10
Timer:  time.time

Machine Details:
   Platform ID:  Linux-2.6.8-24.19-default-x86_64-with-SuSE-9.2-x86-64
   Processor:    x86_64

Python:
   Executable:   /usr/local/bin/python
   Version:      2.4.2
   Compiler:     GCC 3.3.4 (pre 3.3.5 20040809)
   Bits:         64bit
   Build:        Oct  1 2005 15:24:35 (#1)
   Unicode:      UCS2

Test                         minimum  average operation   overhead

      BuiltinFunctionCalls:    126ms    145ms    0.28us    0.274ms
       BuiltinMethodLookup:    124ms    130ms    0.12us    0.316ms
             CompareFloats:    109ms    110ms    0.09us    0.361ms
     CompareFloatsIntegers:    100ms    104ms    0.12us    0.271ms
           CompareIntegers:    137ms    138ms    0.08us    0.542ms
    CompareInternedStrings:    124ms    127ms    0.08us    1.367ms
              CompareLongs:    100ms    104ms    0.10us    0.316ms
            CompareStrings:    111ms    115ms    0.12us    0.929ms
            CompareUnicode:    108ms    128ms    0.17us    0.693ms
             ConcatStrings:    142ms    155ms    0.31us    0.562ms
             ConcatUnicode:    119ms    127ms    0.42us    0.384ms
           CreateInstances:    123ms    128ms    1.14us    0.367ms
        CreateNewInstances:    121ms    126ms    1.49us    0.335ms
   CreateStringsWithConcat:    130ms    135ms    0.14us    0.916ms
   CreateUnicodeWithConcat:    130ms    135ms    0.34us    0.361ms
              DictCreation:    108ms    109ms    0.27us    0.361ms
         DictWithFloatKeys:    149ms    153ms    0.17us    0.678ms
       DictWithIntegerKeys:    124ms    126ms    0.11us    0.915ms
        DictWithStringKeys:    114ms    117ms    0.10us    0.905ms
                  ForLoops:    110ms    111ms    4.46us    0.063ms
                IfThenElse:    118ms    119ms    0.09us    0.685ms
               ListSlicing:    116ms    120ms    8.59us    0.103ms
            NestedForLoops:    125ms    137ms    0.09us    0.019ms
      NormalClassAttribute:    124ms    136ms    0.11us    0.457ms
   NormalInstanceAttribute:    110ms    117ms    0.10us    0.454ms
       PythonFunctionCalls:    107ms    113ms    0.34us    0.271ms
         PythonMethodCalls:    140ms    149ms    0.66us    0.141ms
                 Recursion:    156ms    166ms    3.32us    0.452ms
              SecondImport:    112ms    118ms    1.18us    0.180ms
       SecondPackageImport:    118ms    127ms    1.27us    0.180ms
     SecondSubmoduleImport:    140ms    151ms    1.51us    0.180ms
   SimpleComplexArithmetic:    128ms    139ms    0.16us    0.361ms
    SimpleDictManipulation:    134ms    136ms    0.11us    0.452ms
     SimpleFloatArithmetic:    110ms    113ms    0.09us    0.571ms
  SimpleIntFloatArithmetic:    106ms    111ms    0.08us    0.548ms
   SimpleIntegerArithmetic:    106ms    109ms    0.08us    0.544ms
    SimpleListManipulation:    103ms    113ms    0.10us    0.587ms
      SimpleLongArithmetic:    112ms    118ms    0.18us    0.271ms
                SmallLists:    105ms    116ms    0.17us    0.366ms
               SmallTuples:    108ms    128ms    0.24us    0.406ms
     SpecialClassAttribute:    119ms    136ms    0.11us    0.453ms
  SpecialInstanceAttribute:    143ms    155ms    0.13us    0.454ms
            StringMappings:    115ms    121ms    0.48us    0.405ms
          StringPredicates:    120ms    129ms    0.18us    2.064ms
             StringSlicing:    111ms    127ms    0.23us    0.781ms
                 TryExcept:    125ms    126ms    0.06us    0.681ms
            TryRaiseExcept:    133ms    137ms    2.14us    0.361ms
              TupleSlicing:    117ms    120ms    0.46us    0.066ms
           UnicodeMappings:    156ms    160ms    4.44us    0.429ms
         UnicodePredicates:    117ms    121ms    0.22us    2.487ms
         UnicodeProperties:    115ms    153ms    0.38us    2.070ms
            UnicodeSlicing:    126ms    129ms    0.26us    0.689ms

                    Totals:   6283ms   6673ms
"""


Writing New Tests


pybench tests are simple modules defining one or more pybench.Test subclasses.

Writing a test essentially boils down to providing two methods: .test(), which runs .rounds rounds of .operations test operations each, and .calibrate(), which does the same except that it doesn't actually execute the operations.

Here's an example:

from pybench import Test

class IntegerCounting(Test):

# Version number of the test as float (x.yy); this is important
# for comparisons of benchmark runs - tests with unequal version
# number will not get compared.
version = 1.0

# The number of abstract operations done in each round of the
# test. An operation is the basic unit of what you want to
# measure. The benchmark will output the amount of run-time per
# operation. Note that in order to raise the measured timings
# significantly above noise level, it is often required to repeat
# sets of operations more than once per test round. The measured
# overhead per test round should be less than 1 second.
operations = 20

# Number of rounds to execute per test run. This should be
# adjusted to a figure that results in a test run-time of between
# 1-2 seconds (at warp 1).
rounds = 100000

def test(self):

""" Run the test.

    The test needs to run self.rounds executing
    self.operations number of operations each.

    """
    # Init the test
    a = 1

    # Run test rounds
#
    # NOTE: Use xrange() for all test loops unless you want to face
# a 20MB process !
#
    for i in xrange(self.rounds):

        # Repeat the operations per round to raise the run-time
        # per operation significantly above the noise level of the
        # for-loop overhead. 

    # Execute 20 operations (a += 1):
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1
        a += 1

def calibrate(self):

""" Calibrate the test.

    This method should execute everything that is needed to
    setup and run the test - except for the actual operations
    that you intend to measure. pybench uses this method to
        measure the test implementation overhead.

    """
    # Init the test
    a = 1

    # Run test rounds (without actually doing any operation)
    for i in xrange(self.rounds):

    # Skip the actual execution of the operations, since we
    # only want to measure the test's administration overhead.
        pass
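
To illustrate why .test() and .calibrate() come as a pair, here is a rough, self-contained sketch of how a run-time per operation could be derived from them. It does not reproduce pybench's actual calibration strategy: the class CountingTest, the helpers time_call and time_per_operation, the use of time.perf_counter and the choice of five runs are all assumptions made for this example. It is written for a current Python 3 interpreter, hence range() instead of xrange().

```python
import time


class CountingTest:
    """Simplified stand-in for a pybench.Test subclass (hypothetical)."""

    operations = 5
    rounds = 10000

    def test(self):
        a = 1
        for i in range(self.rounds):
            # Execute 5 operations (a += 1):
            a += 1
            a += 1
            a += 1
            a += 1
            a += 1

    def calibrate(self):
        a = 1
        for i in range(self.rounds):
            # Same loop, but without the measured operations.
            pass


def time_call(func, timer=time.perf_counter):
    """Return the wall-clock time in seconds of one call to func."""
    start = timer()
    func()
    return timer() - start


def time_per_operation(test, runs=5):
    """Estimate seconds per operation, net of the loop overhead."""
    # Use the minimum over several runs as the timing estimator.
    overhead = min(time_call(test.calibrate) for _ in range(runs))
    total = min(time_call(test.test) for _ in range(runs))
    # Clamp at zero in case noise makes the overhead look larger.
    effective = max(total - overhead, 0.0)
    return effective / (test.rounds * test.operations)


print("%.3fus per operation" % (time_per_operation(CountingTest()) * 1e6))
```

The point of the sketch is only the subtraction: whatever .calibrate() costs is treated as administration overhead and removed before dividing by the number of operations.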

Registering a new test module

To register a test module with pybench, the classes need to be imported into the pybench.Setup module. pybench will then scan all the symbols defined in that module for subclasses of pybench.Test and automatically add them to the benchmark suite.
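
The scan described above can be pictured roughly as follows. This is an illustrative sketch rather than pybench's own code: the stand-in Test base class, the helper find_tests and the throwaway module object are assumptions introduced for the example.

```python
import types


class Test:
    """Stand-in for pybench.Test (hypothetical)."""


def find_tests(module, base=Test):
    """Return {name: class} for every subclass of base found in module."""
    found = {}
    for name, obj in vars(module).items():
        # Keep classes derived from base, but not the base class itself.
        if isinstance(obj, type) and issubclass(obj, base) and obj is not base:
            found[name] = obj
    return found


class IntegerCounting(Test):
    pass


class StringConcat(Test):
    pass


# Build a throwaway module that plays the role of pybench.Setup.
setup = types.ModuleType("Setup")
setup.Test = Test
setup.IntegerCounting = IntegerCounting
setup.StringConcat = StringConcat
setup.rounds = 10  # symbols that are not Test subclasses are ignored

print(sorted(find_tests(setup)))  # ['IntegerCounting', 'StringConcat']
```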

Breaking Comparability

If a change is made to any individual test that means it is no longer strictly comparable with previous runs, the '.version' class variable should be updated. Thereafter, comparisons with previous versions of the test will list as "n/a" to reflect the change.
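
The effect of the version check can be sketched as below. The function compare_times and its output format are hypothetical and only mirror the behaviour described in this section; they are not taken from pybench.

```python
def compare_times(this_version, this_min, other_version, other_min):
    """Return the relative difference as text, or "n/a" if versions differ."""
    if this_version != other_version:
        # Tests with unequal version numbers are not compared.
        return "n/a"
    return "%+.1f%%" % ((this_min - other_min) / other_min * 100.0)


print(compare_times(2.0, 0.110, 2.0, 0.100))  # +10.0%
print(compare_times(2.1, 0.110, 2.0, 0.100))  # n/a
```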

Version History

  2.0: rewrote parts of pybench which resulted in more repeatable
       timings:
        - made timer a parameter
        - changed the platform default timer to use high-resolution
          timers rather than process timers (which have a much lower
          resolution)
        - added option to select timer
        - added process time timer (using systimes.py)
        - changed to use min() as timing estimator (average
          is still taken as well to provide an idea of the difference)
        - garbage collection is turned off per default
        - sys check interval is set to the highest possible value
        - calibration is now a separate step and done using
          a different strategy that allows measuring the test
          overhead more accurately
        - modified the tests to each give a run-time of between
          100-200ms using warp 10
        - changed default warp factor to 10 (from 20)
        - compared results with timeit.py and confirmed measurements
        - bumped all test versions to 2.0
        - updated platform.py to the latest version
        - changed the output format a bit to make it look
          nicer
        - refactored the APIs somewhat
  1.3+: Steve Holden added the NewInstances test and the filtering
       option during the NeedForSpeed sprint; this also triggered a long
       discussion on how to improve benchmark timing and finally
       resulted in the release of 2.0
  1.3: initial checkin into the Python SVN repository

Have fun,

Marc-Andre Lemburg mal@lemburg.com