The good news is that newer Red Hat and SUSE distros have been updated to mitigate this problem, specifically RHEL 6.2 and SLES SP1. As for other distros, I don't have the access to verify everything, so if you are running a different distro and can verify that this problem has been resolved, I'd appreciate hearing which specific version addresses it so I can publish the news here.
So why should you even care about this? You may have a system with a high core count running a kernel that has not yet been patched. This probably won't have any impact on your running applications, but do you ever run top? iostat? sar? Or any other monitoring tools? If you're reading this you probably run collectl. Most monitoring tools are fairly lightweight, and for good reason: if you're trying to measure something, you don't want the tool's overhead to get in the way. Unfortunately, with this regression it now will!
In the following example, you can see monitoring CPU data takes about 3 seconds to read almost 9K samples and write them to a file on a 2-socket/dual-core system. Very efficient!
    time collectl -sc -i0 -c 8640 -f/tmp
    real    0m2.879s
    user    0m1.908s
    sys     0m0.913s
    time collectl -sc -i0 -c 864 -f/tmp
    real    0m16.783s
    user    0m3.003s
    sys     0m13.523s
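To put the two timings above on the same footing, it helps to divide the wall-clock time by the number of samples collected. The following is only a minimal sketch of that arithmetic using awk; the helper name per_sample_ms is made up for illustration and is not part of collectl.

```shell
#!/bin/sh
# per_sample_ms SECONDS SAMPLES
# Print the wall-clock cost of one sample in milliseconds.
# (Hypothetical helper, not part of collectl.)
per_sample_ms() {
    awk -v secs="$1" -v n="$2" 'BEGIN { printf "%.3f\n", secs * 1000 / n }'
}

# "real" times and sample counts taken from the two runs above.
fast=$(per_sample_ms 2.879 8640)
slow=$(per_sample_ms 16.783 864)

echo "first run:  $fast ms/sample"
echo "second run: $slow ms/sample"

# Ratio of the two per-sample costs.
awk -v f="$fast" -v s="$slow" 'BEGIN { printf "slowdown: %.0fx\n", s / f }'
```

By this measure the second run costs roughly 58 times more per sample than the first, and most of that time is system time rather than user time.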
Since a simple uname command will tell you your kernel version, you might think that's all it takes, but nothing is ever that simple: most vendors patch their kernels, so you can't always be sure what code is actually running.
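To make the point concrete, here is a small sketch that splits a kernel release string into its upstream version and the vendor's build suffix. The function name split_release is hypothetical, and the sketch assumes the common convention of separating the two parts with the first hyphen. Neither part says whether a particular fix has been backported.

```shell
#!/bin/sh
# split_release RELEASE
# Print the upstream version and the vendor build suffix of a kernel
# release string, assuming they are separated by the first hyphen.
# (Hypothetical helper for illustration only.)
split_release() {
    rel=$1
    echo "upstream: ${rel%%-*}"
    case $rel in
        *-*) echo "vendor:   ${rel#*-}" ;;
        *)   echo "vendor:   (none)" ;;
    esac
}

# Show the split for the kernel this machine is running.
split_release "$(uname -r)"
```

Two systems can report the same upstream version and still behave differently, which is why a direct measurement is more trustworthy than the version string.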
One simple way to tell for sure is to run the very simple test below, which times a read of /proc/stat (which seems to be the most heavily affected file) by using strace to see how much time is spent in the actual read.
The following is on my 2-socket/dual-core system:
    strace -c cat /proc/stat >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000251         251         1           execve
      0.00    0.000000           0         3           read
      0.00    0.000000           0         1           write
      0.00    0.000000           0         4           open
      0.00    0.000000           0         5           close
      0.00    0.000000           0         5           fstat
      0.00    0.000000           0         8           mmap
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         3           brk
      0.00    0.000000           0         1         1 access
      0.00    0.000000           0         1           uname
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000251                    37         1 total
    strace -c cat /proc/stat >/dev/null
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.014997        4999         3           read
      0.00    0.000000           0         1           write
      0.00    0.000000           0        20        16 open
      0.00    0.000000           0         6           close
      0.00    0.000000           0        12        10 stat
      0.00    0.000000           0         5           fstat
      0.00    0.000000           0         8           mmap
      0.00    0.000000           0         3           mprotect
      0.00    0.000000           0         1           munmap
      0.00    0.000000           0         4           brk
      0.00    0.000000           0         1         1 access
      0.00    0.000000           0         1           execve
      0.00    0.000000           0         1           arch_prctl
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.014997                    66        27 total
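The only line that matters in these summaries is the one for read. If you want to run the check on several machines, a small filter can pull out just that line. This is a sketch, not part of collectl or strace: the function name read_cost is invented here, and it assumes the `strace -c` summary layout shown above, with the syscall name in the last column.

```shell
#!/bin/sh
# read_cost
# Read an `strace -c` summary on stdin and print, for the read syscall,
# the seconds, usecs/call and calls columns.
# (Hypothetical helper; assumes the syscall name is the last column.)
read_cost() {
    awk '$NF == "read" { print $2, $3, $4 }'
}

# Example use. strace -c writes its summary to stderr, so stderr is sent
# down the pipe while the output of cat itself is discarded:
#
#   strace -c cat /proc/stat 2>&1 >/dev/null | read_cost
```

For the two summaries above this would report 0.000000 seconds for the first and 0.014997 seconds (4999 usecs per call over 3 calls) for the second.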
| updated Jan 18, 2012 |