As detailed in the section on Motivation, we took an iterative approach to implementing the message passing version of Barnes-Hut. We started with the naive O(n^2) method, which is as follows:
fun StepSystem(N, Bodies[1..N]) =
    for i in 1..N do
        Initialize(Bodies[i])
    done
    for i in 1..N and j in 1..N do
        Interact(Bodies[i], Bodies[j])
    done
    for i in 1..N do
        Update(Bodies[i])
    done
end

The intent of the for statements is to show that each of the steps is independent. It is important that all of the initialization finishes before the interactions, and that all of the interactions finish before the updates, but within each part the statements are independent. The computational work in this algorithm grows with n^2, which is clearly sub-optimal given that the Barnes-Hut algorithm grows with n log n; however, for a sufficiently small system, the overhead of the n log n algorithm could easily dwarf the speedup from doing fewer interactions. In fact, as we will show later, our implementation incurs a substantial overhead for the tree-build and tree-walk steps.
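To make the three-phase structure concrete, here is a minimal C sketch of this step. The Body struct, its field names, the softening term, the Euler update, and the constant G are assumptions made for illustration; only the initialize/interact/update structure is taken from the pseudocode above.

/* Minimal sketch of the naive O(n^2) step.  Struct layout and
 * constants are illustrative assumptions, not our actual code. */
#include <math.h>
#include <stddef.h>

#define G 6.674e-11

typedef struct {
    double pos[3];   /* position                 */
    double vel[3];   /* velocity                 */
    double acc[3];   /* accumulated acceleration */
    double mass;
} Body;

/* Initialize: clear the accumulated acceleration for this step. */
static void initialize(Body *b) {
    b->acc[0] = b->acc[1] = b->acc[2] = 0.0;
}

/* Interact: add the acceleration that b2 induces on b1. */
static void interact(Body *b1, const Body *b2) {
    double d[3], r2 = 1e-10;        /* small softening avoids divide-by-zero */
    for (int k = 0; k < 3; k++) {
        d[k] = b2->pos[k] - b1->pos[k];
        r2 += d[k] * d[k];
    }
    double inv_r3 = 1.0 / (r2 * sqrt(r2));
    for (int k = 0; k < 3; k++)
        b1->acc[k] += G * b2->mass * d[k] * inv_r3;
}

/* Update: advance velocity and position by one time step dt. */
static void update(Body *b, double dt) {
    for (int k = 0; k < 3; k++) {
        b->vel[k] += b->acc[k] * dt;
        b->pos[k] += b->vel[k] * dt;
    }
}

/* StepSystem: the three phases must run in order, but the iterations
 * inside each phase are independent and can be parallelized. */
void step_system(Body *bodies, size_t n, double dt) {
    for (size_t i = 0; i < n; i++)
        initialize(&bodies[i]);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (i != j)
                interact(&bodies[i], &bodies[j]);
    for (size_t i = 0; i < n; i++)
        update(&bodies[i], dt);
}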
Our later implementations use the preferred O(n log n) method, in which interactions are pruned. Once the problem size reaches reasonable values, the O(n log n) algorithm is clearly the method of choice. We quickly saw the importance of tree-building and how it affected the force-calculation phase: problem size is limited by the maximum tree size, not by the interaction calculations. (See Results).
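The pruning that yields O(n log n) work is the standard Barnes-Hut opening-angle test: a distant cell is treated as a single pseudo-particle instead of being opened. The sketch below illustrates the tree walk in C, reusing the Body struct and interaction formula from the sketch above; the TreeNode layout, the THETA value, and the helper names are illustrative assumptions rather than a description of our actual tree.

/* Sketch of the Barnes-Hut tree walk with opening-angle pruning.
 * Node layout and threshold are illustrative assumptions. */
#define THETA  0.5          /* typical opening-angle threshold */
#define NCHILD 8            /* octree: eight children per cell */

typedef struct TreeNode {
    double com[3];                   /* center of mass of the cell */
    double mass;                     /* total mass in the cell     */
    double width;                    /* side length of the cell    */
    struct TreeNode *child[NCHILD];  /* NULL for empty children    */
    int is_leaf;
} TreeNode;

/* Same point-mass formula as interact() in the previous sketch. */
static void add_acceleration(Body *b, const double com[3], double mass) {
    double d[3], r2 = 1e-10;
    for (int k = 0; k < 3; k++) {
        d[k] = com[k] - b->pos[k];
        r2 += d[k] * d[k];
    }
    double inv_r3 = 1.0 / (r2 * sqrt(r2));
    for (int k = 0; k < 3; k++)
        b->acc[k] += G * mass * d[k] * inv_r3;
}

/* Walk the tree for one body: if a cell is far enough away that
 * width / distance < THETA, treat the whole cell as one pseudo-
 * particle and prune its subtree; otherwise open it and recurse.
 * (Self-interaction handling is omitted for brevity.) */
void compute_force(Body *b, const TreeNode *node) {
    double d[3], r2 = 1e-10;
    for (int k = 0; k < 3; k++) {
        d[k] = node->com[k] - b->pos[k];
        r2 += d[k] * d[k];
    }
    if (node->is_leaf || node->width * node->width < THETA * THETA * r2) {
        add_acceleration(b, node->com, node->mass);   /* prune subtree */
    } else {
        for (int k = 0; k < NCHILD; k++)
            if (node->child[k])
                compute_force(b, node->child[k]);     /* open the cell */
    }
}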
After analyzing our initial implementation, it became clear that its speed was limited by the lack of caching of nodes; without a software caching method to reduce communication bandwidth and mask latency, the full cost of a global read was paid on every node access:
Note that this multiple of eight is bounded only by the minimum perturbation constant or by the floating-point accuracy, in the case where the partition is a space partition and the nodes lie at nearly identical points.
Thus, during the second portion of our study, we added explicit software caching of nodes.
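As an illustration of what such a node cache can look like, here is a minimal direct-mapped software cache in C. The GlobalNodeID encoding, the RemoteNode layout, the table size, and the fetch_remote_node() routine are all assumptions made for this sketch; the scheme we actually used is described in the Caching section.

/* Sketch of per-processor software caching of remote tree nodes.
 * All names and sizes here are illustrative assumptions. */
#include <string.h>

#define CACHE_SLOTS 4096

typedef unsigned long GlobalNodeID;   /* e.g. (owner, index) packed into one word */

typedef struct {
    double com[3];                    /* center of mass              */
    double mass;                      /* total mass                  */
    double width;                     /* cell side length            */
    GlobalNodeID child[8];            /* global child IDs, 0 = none  */
    int is_leaf;
} RemoteNode;

typedef struct {
    GlobalNodeID id;
    int valid;
    RemoteNode node;                  /* local copy of the remote node */
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];

/* Issues the actual message-passing request; assumed to exist elsewhere. */
extern void fetch_remote_node(GlobalNodeID id, RemoteNode *out);

/* Return a locally cached copy of the node, going to the network
 * only on a miss; repeated reads of the same node become local. */
const RemoteNode *get_node(GlobalNodeID id) {
    CacheEntry *e = &cache[id % CACHE_SLOTS];
    if (!e->valid || e->id != id) {   /* miss: pay the full remote cost */
        fetch_remote_node(id, &e->node);
        e->id = id;
        e->valid = 1;
    }
    return &e->node;                  /* hit: no communication needed */
}

/* Invalidate everything after each tree rebuild, since node contents change. */
void flush_node_cache(void) {
    memset(cache, 0, sizeof cache);
}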