As detailed in the section on Motivation, we took an iterative approach to implementing the message passing version of Barnes-Hut. We started with the naive O(n^2) method, which is as follows:
fun StepSystem(N,Bodies[1..N]) =
for i in 1..N do
Initialize(Bodies[i]);
done
for i in 1..N and j in 1..N do
Interact(Bodies[i],Bodies[j])
done
for i in 1..N do
Update(Bodies[i])
done
end
The intent of the for statements is to show that each of the steps
are independent. It is important that all of the initialization finish
before the interactions, and that all of the interactions finish before
the updates, but within each part, the statements are independent.
Now the computational work in this algorithm grows with n^2,
which is obviously sub-optimal given that the Barnes-Hut algorithm
grows with n log n, however for a sufficiently small system,
the overhead of the n log n algorithm could easily dwarf the
speedup from doing fewer interactions. In fact as we will show later, in
our implementation, there is a substantial overhead for the tree-build
and tree-walking steps.
Our later implementations use the preferred O(n log n) method where interactions are pruned. Once the problem size reaches reasonable values, the O(n log n) algorithm clearly is the method of choice. We quickly the saw the import of tree-building and how it affected the force calculation phase: problem size is limited by the maximum tree size, not by the interaction calculations. (See Results).
After analyzing our initial implementation, it became clear that its speed was being limited by the lack of caching of nodes; without using a software caching method to reduce communication bandwidth and mask latency, the full cost of global reads were necessary:
Note that this multiple of eight is only bounded by either the minimum perturbation constant or the floating point accuracy if the partition is a space-partition and the nodes are at nearly identical points.
Thus, during the second portion of our study, we added explicit software caching of nodes.
Continue to
Caching
-----