Monday, January 18, 2010

Performance engineering with OProfile

I have been working on a Linux kernel replication product and have been wanting to profile the kernel module for evaluating performance. I used OProfile as the tool for extracting the profiling information.

1. Install kernel-debuginfo packages or compile a kernel since we need vmlinux

2. Since I was working with VMWare, I needed to load oprofile module and use timer interrupts
               modprobe oprofile timer=1
This took a little bit of time to figure out. If this step is not done, then you would not see any logs even though you would not see any errors.

3. Provide oprofile with the path to your vmlinux file
               opcontrol --vmlinux=/usr/lib/debug/lib//vmlinux

4. Reset the data from oprofile is needed with
               opcontrol --reset

5. Run whatever operations you want to profile
               opcontrol --start; ; opcontrol --stop

6. Dump the logs that were generated
               opcontrol --dump

7. View the logs
              opreport -c --demangle=smart --image-path= --merge tgid | less

The logs can be viewed in various formats and many types of information can be extracted. You can see the command for that here.

Here is a snippet of the output which tells us that performance bottleneck is at lock_conflict() so we can look at the logic in and around this function to improve performance.

samples  %        samples  %        image name       app name      symbol name
-------------------------------------------------------------------------------
40503    41.7815  19369    19.9559  vmlinux        vmlinux        prepare_to_copy
  40503    100.000  19369    100.000  vmlinux       vmlinux       prepare_to_copy [self]
-------------------------------------------------------------------------------
19951    20.5808  6399      6.5929  foo.ko            foo             lock_conflict
  19951    100.000  6399     100.000  foo.ko         foo             lock_conflict [self]
-------------------------------------------------------------------------------
7056      7.2787  25345    26.1130  pcnet32          pcnet32     (no symbols)
  7056     100.000  25345    100.000  pcnet32       pcnet32     (no symbols) [self]
-------------------------------------------------------------------------------
6358      6.5587  1824      1.8793  foo.ko              foo             locks_overlapped
  6358     100.000  1824     100.000  foo.ko          foo             locks_overlapped [self]
-------------------------------------------------------------------------------
6072      6.2637  12217    12.5872  foo.ko            foo             fsnlock_release_1
  6072     100.000  12217    100.000  foo.ko         foo             fsnlock_release_1 [self]
-------------------------------------------------------------------------------
2844      2.9338  4098      4.2222  foo.ko              foo             init_lockmgr
  2844     100.000  4098     100.000  foo.ko          foo             init_lockmgr [self]
-------------------------------------------------------------------------------
792       0.8170  831       0.8562  foo.ko                foo             block_write_umap
  792      100.000  831      100.000  foo.ko            foo             block_write_umap [self]
-------------------------------------------------------------------------------
733       0.7561  626       0.6450  foo.ko                foo            fsnlock_release
  733      100.000  626      100.000  foo.ko            foo            fsnlock_release [self]
-------------------------------------------------------------------------------
671       0.6922  129       0.1329  dt                      dt               fill_buffer
  671      100.000  129      100.000  dt                  dt               fill_buffer [self]
-------------------------------------------------------------------------------
428       0.4415  173       0.1782  vmlinux              vmlinux      __make_request
  428      100.000  173      100.000  vmlinux          vmlinux      __make_request [self]
-------------------------------------------------------------------------------
401       0.4137  412       0.4245  foo.ko               foo            ReadConfigurationFromCfg
  401      100.000  412      100.000  foo.ko           foo            ReadConfigurationFromCfg [self]

No comments: