[sldev] OpenJPEG optimization (Patch 20070331)

Callum Lerwick seg at haxxed.com
Sun Apr 1 17:17:46 PDT 2007


Alright, so I'm still obsessively plugging away at OpenJPEG. My latest
t1 patch is here:

http://www.haxxed.com/code/openjpeg-1.1.1-t1-optimize.20070331.patch

The main change: I noticed the lookup tables are dynamically allocated
and constantly recalculated, even though their contents never seem to
change, which is pretty pointless. So I moved the table calculation into
a separate program that generates a static header file to use instead.
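
For anyone who hasn't seen the generator approach before, here is a
minimal sketch of the idea. It is not the code from the patch: the table
(a bit-count table called demo_lut) and the function names are made up
for illustration, and the real t1 tables are computed differently.

```c
#include <stdio.h>

/* Hypothetical table entry: number of set bits in an 8-bit value.
   It stands in for any table whose contents never change at run time. */
static unsigned demo_entry(unsigned i)
{
    unsigned bits = 0;
    while (i) {
        bits += i & 1u;
        i >>= 1;
    }
    return bits;
}

/* Write a C header that defines the table as a static const array,
   16 values per line. Returns 0 on success, -1 on a write error. */
int write_demo_lut_header(FILE *out)
{
    unsigned i;

    fprintf(out, "/* Generated file, do not edit. */\n");
    fprintf(out, "static const unsigned char demo_lut[256] = {\n");
    for (i = 0; i < 256; i++) {
        fprintf(out, "%s%u%s",
                (i % 16 == 0) ? "    " : "",
                demo_entry(i),
                (i == 255) ? "\n" : (i % 16 == 15) ? ",\n" : ", ");
    }
    fprintf(out, "};\n");
    return ferror(out) ? -1 : 0;
}
```

A tiny main() that calls write_demo_lut_header(stdout) and is run once
at build time, with its output redirected to a .h file, gives the
library a table that costs nothing to set up at run time.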

I've also begun taking a look at the DWT. Using oprofile to track
DATA_CACHE_MISSES, I determined that accessing arrays with large strides
is an issue. I added FIXME comments with the results at the hotspots,
but I haven't wrapped my brain around the code enough to actually fix it
yet.
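
To show what the stride problem looks like in general, here is a toy
example. It is not OpenJPEG's DWT code: the "vertical difference" filter
and both function names are invented, and a real lifting step has more
dependencies between rows, so whether the same loop interchange applies
there is still an open question.

```c
#include <stddef.h>

/* Column-at-a-time vertical pass over a row-major w x h image.
   The inner loop steps down one column, so consecutive accesses are
   w elements apart and, for wide images, rarely share a cache line. */
void vdiff_strided(const int *in, int *out, size_t w, size_t h)
{
    size_t x, y;
    for (x = 0; x < w; x++)
        for (y = 0; y + 1 < h; y++)
            out[y * w + x] = in[(y + 1) * w + x] - in[y * w + x];
}

/* Same arithmetic with the loops interchanged. The inner loop now
   walks consecutive addresses in rows y and y + 1. */
void vdiff_rowwise(const int *in, int *out, size_t w, size_t h)
{
    size_t x, y;
    for (y = 0; y + 1 < h; y++)
        for (x = 0; x < w; x++)
            out[y * w + x] = in[(y + 1) * w + x] - in[y * w + x];
}
```

Both functions write the same values (the last output row is left
untouched by both); only the order of the memory accesses differs.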

Another issue I noticed is that OpenJPEG dynamically allocates and frees
RAM for all sorts of things, all over the place (except the t1?), often
inside loops. And it does this with an opj_malloc wrapper function that
looks like this:

void* opj_malloc( size_t size ) {
   void *memblock = malloc(size);
   if(memblock) {
      memset(memblock, 0, size);
   }
   return memblock;
}

It's zeroing every piece of RAM it allocates! That explains where all
those memset()s are coming from in oprofile. This is likely unnecessary
in most places, but unfortunately all the code has been written to
assume all RAM has been cleared before use. A lot of structures and
arrays of pointers are dynamically allocated, and it is then assumed
that the pointers are already cleared to 0, which means bad things
happen if you just take the memset out of opj_malloc. With the help of a
specially instrumented opj_malloc and gdb, I've been going through and
painstakingly determining what allocations don't need to be cleared and
which do, starting with the largest allocations and working my way down.
Ones that don't are changed to a plain malloc, ones that do are changed
to calloc.
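
A rough sketch of the distinction, under stated assumptions: demo_node
is a made-up structure, not an OpenJPEG type, and demo_poison_malloc is
just one possible way to instrument an allocator (filling new memory
with a nonzero pattern so code that silently relies on zeroed memory
breaks quickly). It is not necessarily what the instrumented opj_malloc
mentioned above does.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure used only for this illustration. */
typedef struct {
    int   *coeffs;     /* scratch buffer, fully overwritten before use */
    void **children;   /* pointer array, entries are tested against NULL */
    size_t ncoeffs;
    size_t nchildren;
} demo_node;

/* Debug allocator: fill fresh memory with 0xA5 instead of 0. */
void *demo_poison_malloc(size_t size)
{
    void *p = malloc(size);
    if (p)
        memset(p, 0xA5, size);
    return p;
}

/* Create a node; both counts are assumed to be nonzero. */
demo_node *demo_node_create(size_t ncoeffs, size_t nchildren)
{
    size_t i;
    demo_node *n = malloc(sizeof *n);
    if (!n)
        return NULL;
    n->ncoeffs = ncoeffs;
    n->nchildren = nchildren;
    /* Every element is written below, so clearing would be wasted work. */
    n->coeffs = malloc(ncoeffs * sizeof *n->coeffs);
    /* Later code checks these pointers for NULL, so they must start
       cleared (all-bits-zero is a null pointer on common platforms). */
    n->children = calloc(nchildren, sizeof *n->children);
    if (!n->coeffs || !n->children) {
        free(n->coeffs);
        free(n->children);
        free(n);
        return NULL;
    }
    for (i = 0; i < ncoeffs; i++)
        n->coeffs[i] = (int)i;
    return n;
}

void demo_node_destroy(demo_node *n)
{
    if (!n)
        return;
    free(n->coeffs);
    free(n->children);
    free(n);
}
```

Swapping the poison allocator in for a suspect allocation and running
the test images is a cheap way to find out whether that allocation
really needed to be cleared.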

My latest dwt (and other things) patch:

http://www.haxxed.com/code/openjpeg-1.1.1-dwt-optimize.20070331.patch

I also came up with something of a test suite to better quantify how
much improvement I'm getting, and make sure I'm not breaking anything.
Here's my torturej2k script:

#!/bin/bash

for FILE in ~/Pictures/Jpeg2000/*.{j2k,jp2}; do
  echo "Decoding $FILE"
  j2k_to_image -i "$FILE" -o "$FILE.bmp" >/dev/null
done

My test suite consists of a bunch of images I grabbed from various web
sites:

1.jp2            file4.jp2      oaklandbest.jp2       p0_05.j2k  p0_16.j2k
Bretagne1.j2k    file5.2.jp2    oaklandlossless.jp2   p0_09.j2k  p1_01.j2k
Bretagne2.j2k    file5.jp2      Otoe_OrthoImage8.jp2  p0_10.j2k  p1_02.j2k
CB_TM432.jp2     file8.2.jp2    Otoe_Relief8.jp2      p0_11.j2k  p1_03.j2k
CB_TM_QQ432.jp2  file8.jp2      p0_01.j2k             p0_12.j2k  p1_06.j2k
file1.jp2        oakland03.jp2  p0_02.j2k             p0_13.j2k  potholes2.jp2
file3.jp2        oakland50.jp2  p0_04.j2k             p0_14.j2k

http://www.openjpeg.org/index.php?menu=samples
http://www.microimages.com/gallery/jp2/
http://www1.mplayerhq.hu/MPlayer/samples/jpeg2000/

For performance testing, I run "time torturej2k", and take the average
of three runs from the "real" time. With an unmodified OpenJPEG 1.1.1,
it runs in 41.243 seconds. With my t1 patch, it runs in 38.14 seconds,
7.5% faster. With my dwt patch as well, it runs in 37.57 seconds, 9%
faster. For comparison, KDU runs the test in 16.041 seconds, 57.3%
faster than my current optimizations.
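
For clarity, "faster" in these figures means the percentage reduction in
wall-clock time relative to the slower run. A small sketch of that
arithmetic follows (the function name is made up):

```c
/* Percentage reduction in run time relative to a baseline,
   e.g. percent_less_time(41.243, 38.14) is roughly 7.5. */
double percent_less_time(double baseline_s, double measured_s)
{
    return 100.0 * (baseline_s - measured_s) / baseline_s;
}
```

The KDU figure uses the fully patched time (37.57 seconds) as its
baseline, not the unmodified 41.243 seconds.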