Quantcast
Channel:
Viewing all articles
Browse latest Browse all 4

The a) loop is memory

$
0
0

The a) loop is memory bandwidth limited. If your memory system is setup for NUMA (BIOS setting) then each socket has full speed access to all of memory attached to its processor, and slower access to memory attached to other processors. On the other hand, if the memory system interleaves addresses, then all sockets faster access 1/4th the time and slower access 3/4ths the time, however being interleaved, more fetches/stores can be in flight (as to if they are I am not certain). You will experience faster performance in NUMA configuration under the circumstances where all (most) memory accesses are to locally attached memory (or in cache).

Loop b) is compute intensive, or at least has higher number of compute cycles verses memory fetch/store cycles. It appears that there is sufficient memory bandwidth in one socked as to not affect your scaling test.

Jim Dempsey


Viewing all articles
Browse latest Browse all 4

Latest Images

Trending Articles



Latest Images