By choosing the leaf size or cell size, tree methods and cell methods can be tuned to perform well for a given fixed query size. In this section we well consider how the various methods for doing single orthogonal range queries perform over a range of query sizes. For each test there are one million records. The records are points in 3-D space. The query ranges are cubes.
The records for this test are uniform randomly distributed points in the unit cube, . The query sizes are and range in size from about 0.01 to about 0.8. The smallest query range contains about a single record on average. The largest query range contains about half the records. See the figure below.
Log-log plot of the average number of records in the query versus the query size for the randomly distributed points in a cube problem.
The figure below shows the performance of the sequential scan method. The performance is roughly constant for small and medium sized queries. The execution time is higher for large query sizes for two reasons. Firstly, more records are returned. Secondly, for large query sizes the inclusion test is unpredictable. (If a branching statement is predictable, modern CPU's will predict the answer to save time. If they guess incorrectly, there is a roll-back penalty.) Still, the performance is only weakly dependent on the query size. There is only a factor of 2 difference between the smallest and largest query. We will see that the sequential scan algorithm performs poorly except for the largest query sizes.
Log-log plot of execution time versus query size for the sequential scan method with the randomly distributed points in a cube problem.
The performance of the projection method and the related point-in-box method are shown in the figure below. They perform much better than the sequential scan method, but we will see that they are not competitive with tree or cell methods. The projection method has slightly lower execution times than the point-in-box method. The benefit of doing integer comparisons is outweighed by the additional storage and complexity of the point-in-box method.
Log-log plot of execution time versus query size for the projection method and the point-in-box method with the randomly distributed points in a cube problem. The performance of the sequential scan method is shown for comparison.
The figures below show the performance of the kd-tree data structure. The execution time is moderately sensitive to the leaf size for small queries and mildly sensitive for large queries. As expected, small leaf sizes give good performance for small queries and large leaf sizes give good performance for large queries. The test with a leaf size of 8 has the best overall performance.
Log-log plot of execution time versus query size for the kd-tree without domain checking on the randomly distributed points in a cube problem. The key shows the leaf size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of the kd-tree data structure with domain checking. The performance is similar to the kd-tree without domain checking, but is less sensitive to leaf size for small queries. Again, the test with a leaf size of 8 has the best overall performance.
Log-log plot of execution time versus query size for the kd-tree with domain checking on the randomly distributed points in a cube problem. The key shows the leaf size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of the octree data structure. In terms of leaf size dependence, the performance is similar to the kd-tree with domain checking, however the execution times are higher. The test with a leaf size of 16 has the best overall performance.
Log-log plot of execution time versus query size for the octree on the randomly distributed points in a cube problem. The key shows the leaf size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
We compare the performance of the tree methods in the figures below. For small queries, the kd-tree method without domain checking gives the best performance. For large queries, domain checking becomes profitable. The kd-tree data structure with domain checking during the query appears to give the best overall performance.
Log-log plot of execution time versus query size for the tree methods on the randomly distributed points in a cube problem. The key indicates the data structure. We show the kd-tree with a leaf size of 8, the kd-tree with domain checking with a leaf size of 8 and the octree with a leaf size of 16. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of the cell array data structure. The execution time is highly sensitive to the cell size for small queries and moderately sensitive for large queries. Small cell sizes give good performance for small queries. For large queries, the best cell size is still quite small. The test with a cell size of 0.02 has the best overall performance.
Log-log plot of execution time versus query size for the cell array on the randomly distributed points in a cube problem. The key shows the cell size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of the sparse cell array data structure. The performance characteristics are similar to those of the dense cell array, however the execution time is less sensitive to cell size for large queries. The test with a cell size of 0.02 has the best overall performance.
Log-log plot of execution time versus query size for the sparse cell array on the randomly distributed points in a cube problem. The key shows the cell size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of using a cell array with binary searching. The performance characteristics are similar to those of the dense and sparse cell arrays, however the execution time is less sensitive to cell size. The test with a cell size of 0.01414 has the best overall performance.
Log-log plot of execution time versus query size for the cell array with binary searching on the randomly distributed points in a cube problem. The key shows the cell size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
We compare the performance of the cell methods in the figures below. For small queries, the execution times of the three methods are very close. For large queries, the dense cell array has a little better performance. Thus dense cell arrays give the best overall performance for this test.
Log-log plot of execution time versus query size for the cell methods on the randomly distributed points in a cube problem. The key indicates the data structure. We show the dense cell array with a cell size of 0.02, the sparse cell array with a cell size of 0.02 and the cell array with binary searching with a cell size of 0.01414. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
We compare the performance of the orthogonal range query methods in the figures below. We plot the execution times of the best performers from each family of methods. The projection method has relatively high execution times, especially for small query sizes. The kd-tree without domain checking has good performance for small queries. The kd-tree with domain checking performs well for large queries and has pretty good execution times for small queries as well. The dense cell array method has lower execution times than each of the other methods for all query sizes. It gives the best overall performance for this test.
Log-log plot of execution time versus query size for the orthogonal range query methods on the randomly distributed points in a cube problem. The key indicates the data structure. We show the sequential scan method, the projection method, the kd-tree with a leaf size of 8, the kd-tree with domain checking with a leaf size of 8 and the cell array with a cell size of 0.02.
The execution time per reported record.
In the test of the previous section the records were distributed throughout the 3-D domain. Now we do a test in which the records lie on a 2-D surface. The records for this test are uniform randomly distributed points on the surface of a sphere with unit radius. The query sizes are and range in size from about 0.001 to 2. The query ranges are centered about points on the surface of the sphere. The smallest query range contains about a single record on average. The largest query range contains about 40% of the records. See the figure below.
Log-log plot of the average number of records in the query versus the query size for the randomly distributed points on a sphere problem.
The figure below shows the performance of the sequential scan method. As before, the performance is roughly constant for small and medium sized queries but is higher for large query sizes.
Log-log plot of execution time versus query size for the sequential scan method on the randomly distributed points on a sphere problem.
The performance of the projection method and the point-in-box method are shown in the figure below. Again the projection method has slightly lower execution times than the point-in-box method.
Log-log plot of execution time versus query size for the projection method and the point-in-box method on the randomly distributed points on a sphere problem. The performance of the sequential scan method is shown for comparison.
The figures below show the performance of the kd-tree data structure. The execution time is moderately sensitive to the leaf size for small queries and mildly sensitive for large queries. As before, small leaf sizes give better performance for small queries and large leaf sizes give better performance for large queries. The test with a leaf size of 8 has the best overall performance.
Log-log plot of execution time versus query size for the kd-tree without domain checking data structure on the randomly distributed points on a sphere problem. The key shows the leaf size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of the kd-tree data structure with domain checking. The performance characteristics are similar to the kd-tree without domain checking. The test with a leaf size of 16 has the best overall performance.
Log-log plot of execution time versus query size for the kd-tree with domain checking data structure on the randomly distributed points on a sphere problem. The key shows the leaf size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of the octree data structure. The test with a leaf size of 16 has the best overall performance.
Log-log plot of execution time versus query size for the octree data structure on the randomly distributed points on a sphere problem. The key shows the leaf size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
We compare the performance of the tree methods in the figures below. For small queries, the kd-tree method without domain checking gives the best performance. For large queries, domain checking becomes profitable. For medium sized queries, the different methods perform similarly. The kd-tree data structure without domain checking during the query appears to give the best overall performance.
Log-log plot of execution time versus query size for the tree methods on the randomly distributed points on a sphere problem. The key indicates the data structure. We show the kd-tree with a leaf size of 8, the kd-tree with domain checking with a leaf size of 16 and the octree with a leaf size of 16. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of the cell array data structure. The execution time is highly sensitive to the cell size. Small cell sizes give good performance for small queries. Medium to large cell sizes give good performance for large queries. The test with a cell size of 0.02 has the best overall performance.
Log-log plot of execution time versus query size for the cell array data structure on the randomly distributed points on a sphere problem. The key shows the cell size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of the sparse cell array data structure. The performance is similar to that of the dense cell array for small queries. The execution time is less sensitive to cell size for large queries. In fact, the performance hardly varies with cell size. The test with a cell size of 0.02 has the best overall performance.
Log-log plot of execution time versus query size for the sparse cell array data structure on the randomly distributed points on a sphere problem. The key shows the cell size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
The figures below show the performance of using a cell array with binary searching. The execution time is moderately sensitive to cell size for small query sizes and mildly sensitive for large queries. The test with a cell size of 0.02 has the best overall performance.
Log-log plot of execution time versus query size for the cell array with binary searching data structure on the randomly distributed points on a sphere problem. The key shows the cell size. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
We compare the performance of the cell methods in the figures below. Each method has a cell size of 0.02. For small queries, the cell array with binary searching has the best performance. For large queries, the dense cell array method is a little faster.
Log-log plot of execution time versus query size for the cell methods on the randomly distributed points on a sphere problem. The key indicates the data structure. We show the dense cell array, the sparse cell array and the cell array with binary searching, each with a cell size of 0.02. The performance of the sequential scan method is shown for comparison.
The execution time per reported record.
We compare the performance of the orthogonal range query methods in the figures below. We plot the execution times of the best overall performers from each family of methods. The projection method has relatively high execution times, especially for small query sizes. The kd-tree without domain checking has good performance for small queries. There is a performance penalty for domain checking for small queries and a performance boost for large queries. The cell array with binary searching performs better than the dense cell array for small queries. This is because the size of a cell in the dense cell array is much larger than the query size so time is wasted with inclusion tests. The dense cell array has the edge for large queries because the query range spans many cells.
For small query sizes, the cell array with binary searching and the kd-tree without domain checking both perform well. For large query sizes the dense cell array performs best, followed by the cell array with binary searching. The cell array with binary searching has the best overall performance for this test.
Log-log plot of execution time versus query size for the orthogonal range query methods on the randomly distributed points on a sphere problem. The key indicates the data structure. We show the sequential scan method, the projection method, the kd-tree with a leaf size of 8, the kd-tree with domain checking with a leaf size of 16, the cell array with a cell size of 0.02 and the cell array with binary searching with a cell size of 0.02.
The execution time per reported record.