[sldev] Complete viewer compilation benchmarks

James Cook james at lindenlab.com
Sat Mar 31 21:28:48 PDT 2007


This is a great analysis.  Thanks for sharing.

My own internal tests with building Second Life on Windows, Mac and 
Linux show that build time is almost completely CPU bound.  The best way 
to get faster builds is to find more workstations for Incredibuild, 
distcc, or your distributed build system of choice.  :-)

Where a RAM disk or faster hard drive might matter is for linking.  Link 
times are primarily I/O bound.  A 10K RPM disk will speed linking by 30% 
or more.  See below.

James

(A piece of mail I sent internally a while ago)

Takeaway:
I humbly recommend developer workstations include a secondary drive,
preferably at 10,000 RPM.

Details:
I decided to try to reduce my link times by throwing hardware at the
problem.

C: Western Digital 80 GB 7200 RPM 1.5 Gb SATA disk that came with my
box.  It has the operating system and compiler installed on it.
D: Western Digital 120 GB 7200 RPM 1.5 Gb SATA disk I added.
E: Western Digital 36 GB 10,000 RPM 1.5 Gb SATA disk I added.

Time to link newview_noopt.exe:
C: 2'15"
D: 2'00"
E: 1'30"

Also, because I'm remote, checking out a branch takes a while.  I usually
make a copy of a "release" checkout, then svn switch it to the target
branch.

Time to duplicate "release" checkout:
C: 13'42"
D:  8'45"
E:  6'35"
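The timings above can be put in percentage terms with a small Python sketch. `to_seconds` and `savings_vs_baseline` are illustrative helper names, not part of any tool; the sketch just converts the minutes'seconds" figures and compares each drive against C:.

```python
import re

def to_seconds(stamp: str) -> int:
    """Convert a timing written as minutes'seconds" (e.g. 2'15") to seconds."""
    match = re.fullmatch(r"\s*(\d+)'(\d+)\"\s*", stamp)
    if match is None:
        raise ValueError(f"unrecognized timing: {stamp!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds

def savings_vs_baseline(timings: dict[str, str], baseline: str = "C") -> dict[str, float]:
    """Percentage of time saved on each drive relative to the baseline drive."""
    base = to_seconds(timings[baseline])
    return {drive: 100.0 * (base - to_seconds(t)) / base for drive, t in timings.items()}

# Figures copied from the two tables above.
link = {"C": "2'15\"", "D": "2'00\"", "E": "1'30\""}
checkout = {"C": "13'42\"", "D": " 8'45\"", "E": " 6'35\""}

for name, table in (("link", link), ("checkout", checkout)):
    for drive, pct in savings_vs_baseline(table).items():
        print(f"{name} {drive}: {pct:.1f}% less time than C")
```

With these numbers the 10,000 RPM drive (E:) cuts link time by about 33% and the checkout copy by about 52% relative to C:, while the added 7200 RPM drive (D:) saves about 11% and 36% respectively.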

E: is this disk, which cost $120 including tax and shipping.
http://www.newegg.com/Product/Product.asp?Item=N82E16822136054



Dale Glass wrote:
> I've been working on benchmarking the compilation speed. My idea was 
> to try both extremes: Building after a reboot, when no data is 
> cached, and building from an install located entirely in RAM, with 
> absolutely no hard disk usage. 
> 
> The point of this was to determine how much difference a better hard 
> disk could make. Results were quite interesting.
> 
> Hardware:
> 	Athlon 64 X2 5200+ (1MB cache)
> 	4GB ECC DDR2 666 RAM
> 	Root on RAID-1, SAMSUNG HD300LJ and Maxtor 6V300F0
> 
> Settings:
> 	Compiling with -j2
> 	llmozlib enabled
> 	building 32 bit SL version
> 	Source from http://svn.daleglass.net/secondlife, rev 81.
> 
> Software:
> 	Gentoo, x86_64, gcc 4.1.1 
> 	Ubuntu Edgy, x86_64, gcc 4.1.2
> 
> Gentoo tests were done with /tmp on tmpfs.
> 
> Ubuntu tests were done both with the whole install in /tmp on tmpfs 
> and with the whole install on disk. The install took about 1.5GB of RAM:
> 
> * 334MB for temporary files generated during compilation
> * 642MB for the source code
> * 462MB for the Ubuntu install (probably can be trimmed a bit)
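A quick check that the three components add up to the quoted figure (a trivial Python sketch; the dictionary keys are only labels):

```python
# Footprint of the chrooted build tree, in MB, as listed above.
footprint_mb = {
    "temporary build files": 334,
    "source code": 642,
    "Ubuntu install": 462,
}

total_mb = sum(footprint_mb.values())
print(f"total: {total_mb} MB = {total_mb / 1024:.2f} GiB")
```

That is 1438 MB (about 1.4 GiB), in line with the "about 1.5GB" estimate.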
> 
> 
> Ubuntu tests were done as follows: the distribution was installed with 
> debootstrap, and the sources were copied into it (the same copy as used 
> for Gentoo). The whole tree, OS and sources included, was copied to 
> /tmp (on tmpfs) and chrooted into. 
> 
> The "(on disk)" tests were done afterwards, by copying the whole tree 
> (1.5GB) back to disk (as some changes were required to get the build 
> going).
> 
> The "(after reboot)" tests were done as follows:
> 1. Do required preparations (e.g., binary removal)
> 2. Reboot
> 3. Login
> 4. Chroot
> 5. Build
> 
> X wasn't running during the tests.
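The real/user/sys figures below are what the shell's `time` reports. A minimal Python sketch that collects the same three numbers for a child command, assuming a Unix system (it relies on the standard `resource` module). `timed_run` is a hypothetical helper, and the command in the usage line is only a placeholder for the actual scons invocation:

```python
import resource
import subprocess
import time

def timed_run(cmd: list[str]) -> dict[str, float]:
    """Run cmd to completion and report wall-clock, user and system seconds,
    similar to the shell's `time` builtin."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates CPU time over all terminated, waited-for
    # descendants (e.g. the gcc processes scons spawns), so take the
    # difference to isolate this one command.
    return {
        "real": real,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }

if __name__ == "__main__":
    # Placeholder command; substitute the real build command here.
    print(timed_run(["sleep", "1"]))
```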
> 
> 
> Benchmarks:
> 
> Full compilation time
> ---------------------
> 
> Gentoo:
> 	real    16m44.594s
> 	user    29m25.549s
> 	sys     2m57.599s
> 
> Ubuntu:
> 	(after reboot)
> 	real    19m23.391s
> 	user    35m15.595s
> 	sys     2m9.673s
> 
> 	(on disk)
> 	real    19m12.447s
> 	user    35m23.223s
> 	sys     2m16.113s
> 
> 	(on ram)
> 	real    19m5.572s
> 	user    35m24.639s
> 	sys     2m13.387s
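One way to read these numbers: with -j2, (user + sys) / real approximates how many cores were kept busy on average. A Python sketch that parses the `time` output format used here; `parse_time` and `cores_busy` are hypothetical helper names:

```python
import re

def parse_time(value: str) -> float:
    """Convert `time` output such as '16m44.594s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

def cores_busy(real: str, user: str, sys_time: str) -> float:
    """Average number of CPUs in use: total CPU time divided by wall time."""
    return (parse_time(user) + parse_time(sys_time)) / parse_time(real)

# (real, user, sys) for the full builds above.
full_builds = {
    "Gentoo": ("16m44.594s", "29m25.549s", "2m57.599s"),
    "Ubuntu (after reboot)": ("19m23.391s", "35m15.595s", "2m9.673s"),
    "Ubuntu (on disk)": ("19m12.447s", "35m23.223s", "2m16.113s"),
    "Ubuntu (on ram)": ("19m5.572s", "35m24.639s", "2m13.387s"),
}

for label, (real, user, sys_time) in full_builds.items():
    print(f"{label}: {cores_busy(real, user, sys_time):.2f} of 2 cores busy")
```

All four runs come out between about 1.93 and 1.97, i.e. both cores were nearly saturated, which fits the observation that a full build is CPU bound.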
> 
> Build with no changes
> ---------------------
> 
> Gentoo:
> 	First time:
> 	real    0m41.752s
> 	user    0m39.938s
> 	sys     0m1.722s
> 
> 	Second time:
> 	real    0m42.537s
> 	user    0m39.220s
> 	sys     0m1.441s
> 
> Ubuntu:
> 	(after reboot)
> 	real    0m27.656s
> 	user    0m14.858s
> 	sys     0m0.952s
> 
> 	(on disk)
> 	real    0m15.457s
> 	user    0m14.963s
> 	sys     0m0.644s
> 
> 	(on ram)
> 	real    0m15.444s
> 	user    0m14.771s
> 	sys     0m0.784s
> 	
> Remove binaries, rebuild
> ------------------------
> 
> Gentoo:
> 	
> 	real    1m24.408s
> 	user    1m12.687s
> 	sys     0m4.599s
> 
> 
> Ubuntu:
> 	(after reboot)
> 	real    1m17.616s
> 	user    0m37.895s
> 	sys     0m4.360s
> 
> 	(on disk)
> 	real    0m41.092s
> 	user    0m37.455s
> 	sys     0m3.546s
> 
> 
> 	(on ram)
> 	real    0m41.268s
> 	user    0m38.016s
> 	sys     0m3.258s
> 
> 
> Conclusions:
> 
> Get RAM, and lots of it! The results suggest that as long as enough RAM 
> is present, the whole tree can be cached. This is why the "(on disk)" 
> results differ so little from the ones done fully in RAM. Based on 
> personal experience, 2GB is enough to get things done but far from 
> perfect; 4GB is ideal.
> 
> Gentoo appears to compile roughly 15% faster than Ubuntu, but Gentoo's 
> scons is more than twice as slow (about 42 versus 15 seconds for a 
> build with no changes).
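Both ratios can be checked against the tables above. A Python sketch comparing wall-clock times; it assumes the comparison is against the "(on ram)" Ubuntu runs and the first Gentoo no-change run, which is my reading rather than something stated explicitly:

```python
import re

def parse_time(value: str) -> float:
    """Convert `time` output such as '0m41.752s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Wall-clock ("real") times from the tables above.
gentoo_full, ubuntu_full = parse_time("16m44.594s"), parse_time("19m5.572s")
gentoo_noop, ubuntu_noop = parse_time("0m41.752s"), parse_time("0m15.444s")

full_ratio = ubuntu_full / gentoo_full   # >1 means Ubuntu's full build is slower
noop_ratio = gentoo_noop / ubuntu_noop   # >1 means Gentoo's no-change build is slower

print(f"Ubuntu full build takes {100 * (full_ratio - 1):.1f}% longer than Gentoo")
print(f"Gentoo no-change build takes {noop_ratio:.2f}x as long as Ubuntu")
```

This gives about 14% for the full build and about 2.7x for the no-change build.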
> 
> The hard disk makes a very significant difference if data is not 
> cached. If it is, the disk seems to be largely irrelevant. My guess as 
> to what happens during a full build: reading the source files doesn't 
> take long, and all the products of the build (.o files) stay cached, 
> so linking takes less time than might be expected. This is why a full 
> compile from RAM on Ubuntu is only 18 seconds faster than one after a 
> reboot, while a relink is 36 seconds faster (as the .o files need to 
> be read from disk).
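The 18 and 36 second figures are the gap between the "(after reboot)" and "(on ram)" Ubuntu runs. A Python sketch reproducing them for all three benchmarks, treating the cold-cache penalty simply as the difference in real time:

```python
import re

def parse_time(value: str) -> float:
    """Convert `time` output such as '1m17.616s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Ubuntu real times: (after reboot, on ram)
ubuntu_real = {
    "full build": ("19m23.391s", "19m5.572s"),
    "no changes": ("0m27.656s", "0m15.444s"),
    "remove binaries, rebuild": ("1m17.616s", "0m41.268s"),
}

penalties = {
    name: parse_time(cold) - parse_time(warm)
    for name, (cold, warm) in ubuntu_real.items()
}
for name, seconds in penalties.items():
    print(f"{name}: cold cache costs {seconds:.1f} s")
```

In relative terms the cold cache adds under 2% to a full build but nearly doubles the relink (about 78 versus 41 seconds), which is where a faster disk would show up.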
> 
> Compiling with -j3 (vs -j2 on a dual core system) may mean a 
> performance decrease if there's too little RAM, as the compiler can 
> easily take 200-300MB for some files, which could be enough to evict 
> useful data out of the cache.
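The trade-off above can be written down as a rule of thumb. This is a hypothetical heuristic, not something scons provides: the function name, the cache reserve parameter, and the 300MB per-job default (taken from the upper end of the figure above) are all assumptions for illustration, and memory used by the OS and other programs is ignored.

```python
def pick_jobs(cores: int, ram_mb: int, cache_reserve_mb: int,
              per_job_mb: int = 300) -> int:
    """Hypothetical rule of thumb for choosing a -j value.

    Start from cores + 1 (the -j3-on-a-dual-core case discussed above), then
    lower it if that many compiler processes, each assumed to peak at
    per_job_mb, would not fit in the RAM left after cache_reserve_mb is set
    aside for caching the source tree and object files.
    """
    budget_mb = ram_mb - cache_reserve_mb
    jobs_that_fit = budget_mb // per_job_mb
    return max(1, min(cores + 1, jobs_that_fit))

print(pick_jobs(cores=2, ram_mb=4096, cache_reserve_mb=1500))  # 3
print(pick_jobs(cores=2, ram_mb=1024, cache_reserve_mb=500))   # 1
```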
> 
> A guess is that using tmpfs for /tmp may help in the case of too 
> little RAM, by acting as a permanent cache that doesn't get evicted 
> by other things going on. This may be ultimately counterproductive, 
> however.
> 
> A better hard disk should be the last option -- load the box with as 
> much RAM as possible, and the hard disk's performance will make very 
> little difference. A better disk should help quite a lot if it's not 
> possible to get enough RAM for an effective cache, though.
> 
> 
> 
> 
> The slower overall compile time on Ubuntu was unexpected. I can only 
> guess that Gentoo DOES improve performance quite a bit by compiling 
> with architecture-specific optimizations. However, this is strange, as 
> 64-bit systems are new and all have SSE and similar extensions, so 
> there shouldn't be as big a difference as between a program built for 
> a 386 and one built for an Athlon MP.
> 
> Perhaps the current Ubuntu compiler performs poorly for some reason? 
> The versions aren't exactly the same, so there might be something 
> there.
> 
> Gentoo was built with:
> CFLAGS="-march=athlon64 -O2 -pipe"
> 
> Absolutely nothing especially weird there.
> 
> Ubuntu compiler:
> Using built-in specs.
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v
>  --enable-languages=c,c++,fortran,objc,obj-c++,treelang
>  --prefix=/usr --enable-shared --with-system-zlib
>  --libexecdir=/usr/lib --without-included-gettext
>  --enable-threads=posix --enable-nls --program-suffix=-4.1
>  --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug
>  --enable-mpfr --enable-checking=release x86_64-linux-gnu
> 
> Thread model: posix
> gcc version 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
> 
> Gentoo compiler:
> Using built-in specs.
> Target: x86_64-pc-linux-gnu
> Configured with: /var/tmp/portage/gcc-4.1.1-r3/work/gcc-4.1.1/configure
>  --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.1.1
>  --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/include
>  --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1
>  --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1/man
>  --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.1.1/info
>  --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.1.1/include/g++-v4
>  --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu
>  --disable-altivec --enable-nls --without-included-gettext
>  --with-system-zlib --disable-checking --disable-werror
>  --enable-secureplt --disable-libunwind-exceptions
>  --enable-multilib --disable-libmudflap --disable-libssp
>  --disable-libgcj --enable-languages=c,c++,fortran
>  --enable-shared --enable-threads=posix
>  --enable-__cxa_atexit --enable-clocale=gnu
> Thread model: posix
> gcc version 4.1.1 (Gentoo 4.1.1-r3)
> 
> 
> On the other hand, scons is, oddly, noticeably faster on Ubuntu. I'm 
> not sure what's going on here either. I ran ltrace on scons, and the 
> results suggest that something significantly different is happening on 
> Gentoo and on Ubuntu, even though the trees are the same. Traces are 
> attached.
> 
> With enough RAM present, scons runs without any disk access and uses 
> very little kernel time. This seems to suggest that scons itself could 
> use some optimization.
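To compare the two attached traces programmatically, here is a Python sketch that parses the summary table printed by `ltrace -c` (columns: % time, seconds, usecs/call, calls, function) and lists the most expensive calls. The parsing rules are inferred only from the layout shown in the traces; `parse_ltrace_summary` and `top_by_time` are hypothetical names.

```python
from typing import NamedTuple

class CallStat(NamedTuple):
    function: str
    seconds: float
    calls: int

def parse_ltrace_summary(text: str) -> list[CallStat]:
    """Parse data rows of an `ltrace -c` summary, skipping the header,
    the dashed rules and the 'total' line."""
    stats = []
    for line in text.splitlines():
        fields = line.lstrip(">").split()  # tolerate mail quoting
        if len(fields) != 5 or fields[-1] == "total":
            continue
        try:
            seconds, calls = float(fields[1]), int(fields[3])
        except ValueError:
            continue  # e.g. a "------" rule line
        stats.append(CallStat(fields[4], seconds, calls))
    return stats

def top_by_time(stats: list[CallStat], n: int = 3) -> list[CallStat]:
    """Return the n calls with the largest total time."""
    return sorted(stats, key=lambda s: s.seconds, reverse=True)[:n]

# Excerpt of the first trace.
sample = """\
% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 21.91    5.147915          84     61026 strlen
 16.56    3.889046          38    101578 memcpy
 16.21    3.809040          38     98098 __ctype_b_loc
------ ----------- ----------- --------- --------------------
100.00   23.491165                390940 total
"""

for stat in top_by_time(parse_ltrace_summary(sample)):
    print(f"{stat.function:15} {stat.seconds:9.6f} s {stat.calls:7d} calls")
```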
> 
> 
> ------------------------------------------------------------------------
> 
> ltrace -f -c python /usr/bin/scons -j2 DISTCC=no BTARGET=client BUILD=release ARCH=i686 COLORGCC=yes
> 
> % time     seconds  usecs/call     calls      function
> ------ ----------- ----------- --------- --------------------
>  21.91    5.147915          84     61026 strlen
>  16.56    3.889046          38    101578 memcpy
>  16.21    3.809040          38     98098 __ctype_b_loc
>  10.16    2.385957         114     20791 memset
>  10.08    2.367111         120     19631 free
>   9.22    2.165291         474      4564 fopen64
>   9.02    2.117852         331      6386 strcmp
>   1.50    0.353396          19     18497 malloc
>   1.41    0.330744          18     17851 strchr
>   0.85    0.199837          18     10633 realloc
>   0.62    0.145106          18      7852 strncpy
>   0.58    0.137361          18      7278 strcpy
>   0.29    0.069284          34      1988 __xstat64
>   0.18    0.042296          18      2287 memchr
>   0.17    0.040029          19      2100 vsnprintf
>   0.13    0.029530          26      1106 _IO_getc
>   0.12    0.027302          18      1442 pthread_self
>   0.09    0.022305          18      1214 __rawmemchr
>   0.09    0.021296        2129        10 popen
>   0.08    0.019905        4976         4 qsort
>   0.06    0.014357          20       707 __errno_location
>   0.06    0.014232          18       772 funlockfile
>   0.06    0.014192          97       145 fread
>   0.06    0.014129          18       772 flockfile
>   0.06    0.013733         980        14 __uflow
>   0.06    0.013402          18       711 strrchr
>   0.05    0.012512          42       293 fclose
>   0.05    0.011957          18       648 strstr
>   0.05    0.011486          32       351 __fxstat64
>   0.04    0.010384          20       517 strerror
>   0.03    0.007038          19       361 fileno
>   0.03    0.006391          18       351 memmove
>   0.03    0.006308          33       190 sem_post
>   0.01    0.003514          18       187 sem_trywait
>   0.01    0.003404         243        14 dlopen
>   0.01    0.002185          32        67 sigaction
>   0.01    0.002052          19       108 _setjmp
>   0.01    0.002027          20       100 __strtod_internal
>   0.01    0.001904          19       100 localeconv
>   0.00    0.001044          18        55 strcat
>   0.00    0.000637          79         8 pclose
>   0.00    0.000291          20        14 dlsym
>   0.00    0.000260          18        14 __ctype_tolower_loc
>   0.00    0.000250          27         9 readdir64
>   0.00    0.000236          39         6 lseek64
>   0.00    0.000216          43         5 fwrite
>   0.00    0.000210          21        10 clearerr
>   0.00    0.000200          66         3 ftell
>   0.00    0.000164          20         8 getenv
>   0.00    0.000145          29         5 fflush
>   0.00    0.000139          34         4 isatty
>   0.00    0.000135          19         7 feof
>   0.00    0.000134          67         2 opendir
>   0.00    0.000134          19         7 sem_init
>   0.00    0.000116          38         3 getcwd
>   0.00    0.000110          36         3 readlink
>   0.00    0.000080          20         4 sem_wait
>   0.00    0.000075          37         2 chdir
>   0.00    0.000070          70         1 realpath
>   0.00    0.000068          34         2 closedir
>   0.00    0.000066          22         3 setlocale
>   0.00    0.000063          21         3 sigemptyset
>   0.00    0.000055          27         2 sprintf
>   0.00    0.000050          50         1 read
>   0.00    0.000046          46         1 sysconf
>   0.00    0.000043          43         1 open64
>   0.00    0.000041          20         2 __strdup
>   0.00    0.000041          20         2 strncat
>   0.00    0.000040          20         2 ungetc
>   0.00    0.000035          35         1 close
>   0.00    0.000035          35         1 rewind
>   0.00    0.000029          29         1 pow
>   0.00    0.000025          25         1 getpid
>   0.00    0.000024          24         1 __libc_current_sigrtmax
>   0.00    0.000024          24         1 nl_langinfo
>   0.00    0.000024          24         1 __libc_current_sigrtmin
> ------ ----------- ----------- --------- --------------------
> 100.00   23.491165                390940 total
> 
> 
> ------------------------------------------------------------------------
> 
> ltrace -f -c python /usr/bin/scons -j2 DISTCC=no BTARGET=client BUILD=release ARCH=i686 COLORGCC=yes
> 
> % time     seconds  usecs/call     calls      function
> ------ ----------- ----------- --------- --------------------
>  27.85    2.305453         137     16739 strcmp
>  25.58    2.117985         330      6413 memcpy
>  24.67    2.042898       26191        78 read
>   9.37    0.776018          18     41706 malloc
>   3.45    0.285930          18     15525 free
>   3.03    0.250792          18     13794 strchr
>   2.09    0.173386          18      9502 memmove
>   1.20    0.099074          18      5417 strlen
>   0.96    0.079327          18      4350 memset
>   0.69    0.057382          18      3122 realloc
>   0.62    0.051619          18      2860 strncmp
>   0.22    0.018134          34       522 __xstat64
>   0.06    0.004769          18       262 strrchr
>   0.03    0.002211          18       122 __errno_location
>   0.02    0.001679          31        54 lseek64
>   0.02    0.001655          17        92 __sigsetjmp
>   0.01    0.001149          31        36 isatty
>   0.01    0.001133          37        30 open64
>   0.01    0.000984          32        30 close
>   0.01    0.000939          18        52 __ctype_toupper_loc
>   0.01    0.000939          18        52 __ctype_tolower_loc
>   0.01    0.000865          18        46 __strtol_internal
>   0.01    0.000619          18        33 getenv
>   0.01    0.000481         240         2 dlopen
>   0.00    0.000383          31        12 getgroups
>   0.00    0.000323          32        10 signal
>   0.00    0.000270         135         2 realpath
>   0.00    0.000242          24        10 setlocale
>   0.00    0.000231          21        11 snprintf
>   0.00    0.000226          37         6 __xstat
>   0.00    0.000168          56         3 _obstack_begin
>   0.00    0.000152          38         4 sprintf
>   0.00    0.000134          33         4 sigaction
>   0.00    0.000133          44         3 __strdup
>   0.00    0.000132          33         4 fcntl
>   0.00    0.000103          20         5 dcgettext
>   0.00    0.000099          49         2 readlink
>   0.00    0.000088          44         2 printf
>   0.00    0.000085          21         4 strxfrm
>   0.00    0.000082          82         1 bindtextdomain
>   0.00    0.000082          20         4 drand48
>   0.00    0.000080          40         2 __fxstat64
>   0.00    0.000076          38         2 time
>   0.00    0.000069          34         2 getuid
>   0.00    0.000066          33         2 getegid
>   0.00    0.000065          32         2 geteuid
>   0.00    0.000065          32         2 getgid
>   0.00    0.000062          31         2 frexp
>   0.00    0.000060          20         3 nl_langinfo
>   0.00    0.000056          18         3 __fsetlocking
>   0.00    0.000049          24         2 sigemptyset
>   0.00    0.000049          24         2 dlsym
>   0.00    0.000047          23         2 getpid
>   0.00    0.000044          22         2 sysconf
>   0.00    0.000043          21         2 memchr
>   0.00    0.000042          21         2 localeconv
>   0.00    0.000042          21         2 putenv
>   0.00    0.000041          20         2 __ctype_b_loc
>   0.00    0.000041          20         2 srand48
>   0.00    0.000038          19         2 strcasecmp
>   0.00    0.000033          33         1 sbrk
>   0.00    0.000024          24         1 textdomain
>   0.00    0.000021          21         1 __xpg_basename
>   0.00    0.000020          20         1 strncpy
>   0.00    0.000020          20         1 dirname
>   0.00    0.000019          19         1 strcpy
>   0.00    0.000019          19         1 calloc
>   0.00    0.000019          19         1 fputs_unlocked
>   0.00    0.000019          19         1 strstr
> ------ ----------- ----------- --------- --------------------
> 100.00    8.279583                120975 total
> 
> 
> ------------------------------------------------------------------------
> 

