The performance of orthogonal range query methods depends on many parameters: the dimension of the records' multikeys, the number of records and their distribution, and the query range. The performance may also depend on additional parameters associated with the data structure, such as the leaf size for tree methods or the cell size for cell methods. Also important are the implementation of the algorithms and the computer architecture. There is no single best method for doing orthogonal range queries. Some methods perform well over a wide range of parameters; others perform well for only a small class of problems. In this section we compare the methods previously presented. We base our conclusions on the numerical experiments, so we restrict our attention to records with 3-D multikeys and cubic query ranges. Also, only a few distributions of the records were tested. Finally, note that all codes were implemented in C++ and executed on a 450 MHz i686 processor with 256 MB of memory.
The projection methods have the advantage that they are relatively easy to implement. The fundamental operations in these methods are sorting an array and doing binary searches on that array, operations that are available in many standard libraries. (The C++ STL provides this functionality.) Also, projection methods are easily adapted to dynamic problems: when the records change, one merely re-sorts the arrays. However, the projection method and the related point-in-box method usually perform poorly. The execution time does not scale well with the file size, so the methods are practical only for small files. The memory usage of the projection method is moderate; it is typically not much higher than that of a kd-tree. The point-in-box method has a high memory requirement: storing the rank arrays in order to do integer comparisons more than doubles the memory usage. Also, on x86 processors, doing integer instead of floating-point comparisons usually increases the execution time, so the "optimization" of integer comparisons becomes a performance penalty. The projection method typically outperforms the point-in-box method, but neither is recommended for time-critical applications.
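As an illustration, here is a minimal sketch of a one-coordinate projection query using the STL: the records are sorted once by the first coordinate, a pair of binary searches finds the slab of candidates in that coordinate, and the remaining coordinates are tested directly. The Record type and function names are assumptions for the example, not the tested code.

```cpp
#include <algorithm>
#include <array>
#include <vector>

struct Record {
    std::array<double, 3> key;  // 3-D multikey
};

// Sort the records by the first coordinate once, up front.
void sortByX(std::vector<Record>& records) {
    std::sort(records.begin(), records.end(),
              [](const Record& a, const Record& b) { return a.key[0] < b.key[0]; });
}

// Binary-search the x-slab, then test the remaining coordinates directly.
std::vector<const Record*>
query(const std::vector<Record>& sorted, const std::array<double, 3>& lo,
      const std::array<double, 3>& hi) {
    auto first = std::lower_bound(sorted.begin(), sorted.end(), lo[0],
                                  [](const Record& r, double x) { return r.key[0] < x; });
    auto last = std::upper_bound(sorted.begin(), sorted.end(), hi[0],
                                 [](double x, const Record& r) { return x < r.key[0]; });
    std::vector<const Record*> hits;
    for (auto it = first; it != last; ++it) {
        if (it->key[1] >= lo[1] && it->key[1] <= hi[1] &&
            it->key[2] >= lo[2] && it->key[2] <= hi[2]) {
            hits.push_back(&*it);
        }
    }
    return hits;
}
```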
Kd-trees usually outperform octrees by a moderate factor. The octree typically has higher execution times and uses several times the memory of a kd-tree. This is because the octree partitions space while the kd-tree partitions the records. The high memory usage of the octree is also a result of storing the domain of each sub-tree. The exception to this rule is when the records are regularly spaced, as in the chair problem. Then the octree may have lower execution times than the kd-tree. The octree is also more easily adapted to dynamic problems than the kd-tree.
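The difference in memory usage can be pictured with node layouts like the following. These structs are purely illustrative; the tested implementations may differ.

```cpp
// The octree node stores its spatial domain and eight children, because it
// partitions space; the kd-tree node stores only a splitting plane, because
// it partitions the records.
struct OctreeNode {
    double domain[2][3];      // lower and upper corners of this cell
    OctreeNode* children[8];  // space is split in half along each axis
    // ... records are stored at the leaves
};

struct KDTreeNode {
    int splitDimension;       // coordinate used to split the records
    double splitValue;        // median of the records in that coordinate
    KDTreeNode* left;
    KDTreeNode* right;
    // ... leaves hold roughly equal numbers of records
};
```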
For kd-trees, it is advantageous to do domain checking during the orthogonal range query only when the query range is large. For small queries, it is best not to do domain checking and thus get faster access to the leaves. For a given problem, kd-trees are typically not the best method for doing orthogonal range queries; there is usually a cell method that has better execution times and uses less memory. However, kd-trees perform reasonably well over a wide range of problems, and the performance is only moderately sensitive to the leaf size.
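A sketch of the domain-checking variant is given below, under assumed node and box types (the names are illustrative). With domain checking, a subtree whose domain lies inside the query range is reported wholesale; the small-query variant simply omits that containment test and descends directly to the leaves.

```cpp
#include <vector>

// Axis-aligned box used both for query ranges and node domains.
struct Box {
    double lo[3], hi[3];
    bool contains(const Box& b) const {
        for (int i = 0; i < 3; ++i)
            if (b.lo[i] < lo[i] || b.hi[i] > hi[i]) return false;
        return true;
    }
    bool containsPoint(const double* p) const {
        for (int i = 0; i < 3; ++i)
            if (p[i] < lo[i] || p[i] > hi[i]) return false;
        return true;
    }
};

struct Node {
    Box domain;                          // needed only for domain checking
    int splitDim = 0;
    double splitValue = 0.0;
    Node* left = nullptr;
    Node* right = nullptr;
    std::vector<const double*> records;  // non-empty only at leaves
};

// Report every record in a subtree without testing it against the range.
void reportSubtree(const Node* n, std::vector<const double*>& hits) {
    if (!n) return;
    if (n->left || n->right) {
        reportSubtree(n->left, hits);
        reportSubtree(n->right, hits);
    } else {
        hits.insert(hits.end(), n->records.begin(), n->records.end());
    }
}

// Query with domain checking: a subtree whose domain lies inside the range
// is reported wholesale. The small-query variant omits the containment test.
void query(const Node* n, const Box& range, std::vector<const double*>& hits) {
    if (!n) return;
    if (range.contains(n->domain)) {
        reportSubtree(n, hits);
        return;
    }
    if (n->left || n->right) {
        if (range.lo[n->splitDim] <= n->splitValue) query(n->left, range, hits);
        if (range.hi[n->splitDim] >= n->splitValue) query(n->right, range, hits);
    } else {
        for (const double* p : n->records)
            if (range.containsPoint(p)) hits.push_back(p);
    }
}
```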
The dense cell array method performs very well on problems for which it is well suited. The structure of tree methods amortizes the cost of accessing the leaves, whereas the cell array offers constant-time access to any cell. If a cell array whose cell size is chosen to optimize execution time fits in memory, then it will usually have lower execution times than any tree method. Depending on the number and distribution of the records, the memory usage may be quite low or very high. The performance of cell arrays is fairly sensitive to the cell size.
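The constant-time cell lookup can be sketched as follows, assuming the 3-D array of cells is flattened into a single vector; the names and layout are illustrative, not the tested implementation.

```cpp
#include <cstddef>
#include <vector>

struct CellArray {
    double origin[3];        // lower corner of the domain
    double cellSize;         // performance is fairly sensitive to this choice
    std::size_t extents[3];  // number of cells in each dimension
    std::vector<std::vector<const double*>> cells;  // records bucketed by cell

    // Constant-time conversion from a multikey to the index of its cell.
    // Assumes the key lies inside the domain covered by the array.
    std::size_t cellIndex(const double* key) const {
        std::size_t i[3];
        for (int d = 0; d < 3; ++d)
            i[d] = static_cast<std::size_t>((key[d] - origin[d]) / cellSize);
        return i[0] + extents[0] * (i[1] + extents[1] * i[2]);
    }
};
```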
It has been reported that cell arrays are applicable only when the query size is fixed, and that in this case the cell size should be chosen to match the query size. (See Data Structures for Range Searching.) This was not the case in my experiments. For the tests in Performance Tests over a Range of Query Sizes, the cell array with a fixed cell size performed well over a wide range of query sizes. Choosing the best cell size has more to do with the distribution of the records than with the size of the query range.
Sparse cell arrays usually have execution times that are almost as low as those of dense cell arrays; the binary search used to access cells has little effect on performance. If the dense cell array has many empty cells, then the sparse cell array may offer significant savings in memory. However, if the records are distributed throughout the domain, the sparse array structure only increases the memory usage and the time to access a cell. As with the dense cell array, the performance of the sparse cell array is fairly sensitive to the cell size.
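Here is a sketch of the binary search on cells, assuming each row stores only its nonempty cells, sorted by an integer index along the sparse axis; the names are assumptions for illustration.

```cpp
#include <algorithm>
#include <vector>

// One nonempty cell along the sparsely stored axis.
struct SparseCell {
    int index;                           // integer cell index along that axis
    std::vector<const double*> records;  // records that fall in this cell
};

// A row keeps only its nonempty cells, sorted by index, so a binary search
// finds the cells overlapping [lo, hi] along the sparse axis. In the full
// method, records from the boundary cells are still tested against the
// complete query range.
void collectRecords(const std::vector<SparseCell>& row, int lo, int hi,
                    std::vector<const double*>& hits) {
    auto first = std::lower_bound(row.begin(), row.end(), lo,
                                  [](const SparseCell& c, int i) { return c.index < i; });
    for (auto it = first; it != row.end() && it->index <= hi; ++it)
        hits.insert(hits.end(), it->records.begin(), it->records.end());
}
```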
Cell arrays coupled with binary searching are similar in structure to sparse cell arrays; the former search on records while the latter search on cells. The execution times are usually comparable to those of the other cell methods. However, the memory usage is typically lower and the performance is less sensitive to the cell size. In fact, the memory usage is often little more than that of the sequential scan method, which stores only a single pointer for each record. It is interesting to note that, like the projection methods, this method performs a binary search on records; however, the projection methods search the entire file, while the cell array coupled with binary searching searches only within a single cell. The combination of low execution times, low memory usage, and insensitivity to the cell size makes this an attractive method for many problems.
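The per-cell search can be sketched as follows, assuming each cell keeps its records sorted by one coordinate (here the last); the types and layout are assumptions for illustration, not the tested code.

```cpp
#include <algorithm>
#include <array>
#include <vector>

using Key = std::array<double, 3>;

struct Cell {
    std::vector<Key> keys;  // sorted by keys[2] when the cell is built
};

// A binary search narrows the scan to records whose last coordinate lies in
// [lo[2], hi[2]]; the remaining coordinates are then tested directly.
void queryCell(const Cell& cell, const Key& lo, const Key& hi,
               std::vector<const Key*>& hits) {
    auto first = std::lower_bound(cell.keys.begin(), cell.keys.end(), lo[2],
                                  [](const Key& k, double z) { return k[2] < z; });
    for (auto it = first; it != cell.keys.end() && (*it)[2] <= hi[2]; ++it) {
        if ((*it)[0] >= lo[0] && (*it)[0] <= hi[0] &&
            (*it)[1] >= lo[1] && (*it)[1] <= hi[1]) {
            hits.push_back(&*it);
        }
    }
}
```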
For multiple query problems where the total number of records inside the query ranges is at least as large as the number of records, the cell array coupled with forward searching typically performs very well. It stores the records in almost the same data structure as the cell array coupled with binary searching, but its method of sweeping through the records is more efficient than the binary search. Having to store the queries so that they can be processed in order increases the memory usage from light to moderate.
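The sweep can be sketched as follows, assuming each cell keeps its records sorted by the sweep coordinate and the queries are processed in increasing order of their lower bound in that coordinate, so that a per-cell cursor only moves forward. This is a sketch under those assumptions, not the tested implementation.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Key = std::array<double, 3>;

struct SweptCell {
    std::vector<Key> keys;   // sorted by keys[2], the sweep coordinate
    std::size_t cursor = 0;  // advances monotonically across queries
};

// Called once per (cell, query) pair, with queries ordered by lo[2].
// The cursor replaces the repeated binary searches of the previous method.
void forwardSearch(SweptCell& cell, const Key& lo, const Key& hi,
                   std::vector<const Key*>& hits) {
    while (cell.cursor < cell.keys.size() && cell.keys[cell.cursor][2] < lo[2])
        ++cell.cursor;
    for (std::size_t i = cell.cursor;
         i < cell.keys.size() && cell.keys[i][2] <= hi[2]; ++i) {
        const Key& k = cell.keys[i];
        if (k[0] >= lo[0] && k[0] <= hi[0] && k[1] >= lo[1] && k[1] <= hi[1])
            hits.push_back(&k);
    }
}
```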
One can moderately decrease the execution time (a reduction of about one third was common in the tests) by storing the multikeys to avoid accessing the records. However, this increases the memory usage. Storing the keys is a useful optimization when execution time is critical and the available memory is sufficient.
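The trade-off can be pictured with a layout like the following, which caches a copy of each multikey next to the record pointer so that the query loop never touches the records themselves; this is only an illustrative sketch.

```cpp
#include <array>
#include <vector>

struct RecordRef {
    std::array<double, 3> key;  // cached copy of the multikey (extra memory)
    const void* record;         // pointer back to the full record
};

// A cell's contents can then be tested against the query range by scanning
// one contiguous array, without dereferencing the record pointers.
using CachedCell = std::vector<RecordRef>;
```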