Performance Tests for Multiple Queries over a Range of File Sizes

Points on the Surface of a Chair

We consider the chair data set, introduced previously. By refining the surface mesh of the chair, we vary the number of records from 1,782 to 1,864,200. There is unit spacing between adjacent records. We perform a cubical orthogonal range query of size 4 around each record. The table below shows the total number of records returned for the six tests.

The next table shows the execution times for the chair problems. The leaf sizes and cell sizes are chosen to minimize the execution time. The memory usage is shown in the following table. An entry of "o.t." indicates that the test exceeded the time limit. An entry of "o.m." means out of memory. (Note that the kd-tree method with domain checking has the same memory usage as the kd-tree method without domain checking as the data structure is the same.)

# records 1,782 7,200 28,968 116,232 465,672 1,864,200
# returned 65,412 265,104 864,296 3,192,056 12,220,376 47,768,216

The total number of records returned by the orthogonal range queries for the chair problems.

# records 1,782 7,200 28,968 116,232 465,672 1,864,200
seq. scan 0.195 3.453 98.72 o.t. o.t. o.t.
projection 0.061 0.480 3.86 33.01 322.0 o.t.
pt-in-box 0.045 0.366 2.94 25.78 205.9 o.t.
kd-tree 0.081 0.383 1.94 9.58 46.7 o.t.
kd-tree d. 0.101 0.475 2.50 12.66 63.0 o.t.
octree 0.035 0.164 0.78 3.12 13.4 56
cell 0.024 0.102 0.37 1.41 5.6 o.m.
sparse cell 0.025 0.108 0.40 1.50 5.9 25
cell b. s. 0.028 0.121 0.49 1.89 8.1 34
cell f. s. 0.019 0.081 0.31 1.24 5.0 21
cell f. s. k. 0.013 0.055 0.23 0.93 3.8 17

The total execution time for the orthogonal range queries for the chair problem with a query size of 4.

# records 1,782 7,200 28,968 116,232 465,672 1,864,200
seq. scan 7,140 28,812 115,884 o.t. o.t. o.t.
projection 21,420 86,436 347,652 1,394,820 5,588,100 o.t.
pt-in-box 49,980 201,684 811,188 3,254,580 13,038,900 o.t.
kd-tree 17,416 69,808 279,760 1,120,336 4,484,176 o.t.
octree 62,708 272,212 1,160,492 4,520,452 17,979,204 72,681,460
cell 32,328 206,412 1,446,540 10,760,556 82,850,988 o.m.
sparse cell 14,360 63,088 252,976 1,013,680 4,058,800 16,243,888
cell b. s. 8,836 34,684 137,884 550,300 2,199,196 8,793,244
cell f. s. 16,764 66,372 264,708 1,057,860 4,230,084 16,918,212
cell f. s. k. 63,132 252,168 1,009,224 4,039,272 16,163,112 64,665,768

The memory usage of the data structures for the chair problem.

CompareFileSizeChairTime.jpg

Log-log plots of the execution times versus the number of reported records and the memory usage versus the number of records in the file for each of the orthogonal range query methods on the chair problems. The execution time is shown in microseconds per returned record.

CompareFileSizeChairMemory.jpg

The memory usage is shown in bytes per record.

The figures above show the execution times and memory usage for the various methods. First consider the tree methods. The octree has significantly lower execution times than the kd-tree methods. It performs the orthogonal range queries in less than half the time of the kd-tree methods. This is because the records are regularly spaced. Having regularly spaced records avoids the primary problem with using octrees: the octree may be much deeper than a kd-tree. However, memory usage is a different story. The octree uses about four times the memory of the kd-tree methods.

Next consider the cell methods. The dense cell array has reasonable execution times, but the the memory usage increases rapidly with the problem size. Recall that the cell size is chosen to minimize execution time. Following this criterion, the dense cell array runs out of memory for the largest test case. The execution times of the sparse cell array are close to those of the dense cell array. However, it uses much less memory. Like the rest of the cell methods, the memory usage is proportional to the problem size. The cell array with binary searches has higher execution times than the other cell methods but has the lowest memory requirements. The cell array with forward searching has lower execution times than than above methods. Its memory usage is about the same as the sparse cell array. Finally, storing the keys with the cell array coupled with forward searching gives the lowest execution times at the price of a higher memory overhead.

Among the tree methods, the octree offered the lowest execution times. For cell methods, the sparse cell array and the cell array with forward searching have good performance. These two cell methods perform significantly better than the octree. The cell methods do the queries in about half the time of the octree method and use one quarter of the memory. The cell array with forward searching has the lower execution times of the two cell methods.

Randomly Distributed Points in a Cube

We consider the random points data set. The number of records varies from 100 to 1,000,000. We perform cubical orthogonal range queries around each record. The query size is chosen to contain an average of about 10 records. The table below shows the total number of records returned for the five tests.

The next table shows the execution times for the chair problems. Again, the leaf sizes and cell sizes are chosen to minimize the execution time. The memory usage is shown in the following table. An entry of "o.t." indicates that the test exceeded the time limit. (The kd-tree method with domain checking has the same memory usage as the kd-tree method without domain checking as the data structure is the same.)

# records 100 1,000 10,000 100,000 1,000,000
# returned 886 9,382 102,836 1,063,446 10,842,624

The total number of records returned by the orthogonal range queries for the random points problems.

# records 100 1,000 10,000 100,000 1,000,000
seq. scan 0.00071 0.0683 8.173 o.t. o.t.
projection 0.00077 0.0281 1.323 104.15 o.t.
pt-in-box 0.00061 0.0254 1.304 112.61 o.t.
kd-tree 0.00095 0.0154 0.205 3.56 44
kd-tree d. 0.00114 0.0187 0.268 4.31 52
octree 0.00079 0.0212 0.363 7.08 92
cell 0.00091 0.0135 0.173 3.03 36
sparse cell 0.00101 0.0146 0.201 3.24 39
cell b. s. 0.00088 0.0109 0.140 1.74 27
cell f. s. 0.00068 0.0082 0.109 1.15 16
cell f. s. k. 0.00050 0.0054 0.067 0.77 11

The total execution time for the orthogonal range queries for the random points problem.

# records 100 1,000 10,000 100,000 1,000,000
seq. scan 412 4,012 40,012 o.t. o.t.
projection 1,236 12,036 120,036 1,200,036 o.t.
pt-in-box 2,884 28,084 280,084 2,800,084 o.t.
kd-tree 1,088 9,168 121,968 1,055,408 9,242,928
octree 4,628 46,280 398,488 3,425,956 30,199,580
cell 724 5,500 52,000 527,776 5,245,876
sparse cell 928 6,400 57,600 578,112 5,696,368
cell b. s. 652 4,508 41,708 407,852 4,035,452
cell f. s. 1,124 8,708 82,508 811,724 8,053,124
cell f. s. k. 3,848 33,608 326,108 3,229,148 32,132,648

The memory usage of the data structures for the random points problem.

CompareFileSizeRandomTime.jpg

Log-log plots of the execution times versus the number of reported records and the memory usage versus the number of records in the file for each of the orthogonal range query methods on the random points problems. The execution time is shown in microseconds per returned record.

CompareFileSizeRandomMemory.jpg

The memory usage is shown in bytes per record.

The figures above show the execution times and memory usage for the various methods. First consider the tree methods. The kd-tree methods have lower execution times than the octree method. This result differs from the chair problems, because now the records are not regularly spaced. Where records are close to each other, the octree does more subdivision. Because of the small query size, it is advantageous to not do domain checking in the kd-tree algorithm. Finally, we note that the kd-tree methods use about a third of the memory of the octree.

Next consider the cell methods. The dense and sparse cell arrays have the highest execution times, but they have fairly low memory requirements. Since the records are distributed throughout the domain, there are few empty cells, and there is nothing to be gained by using a sparse array over a dense array. However, the penalty for using the sparse array (in terms of increased cell access time and increased memory usage) is small. The cell array with binary searching outperforms the dense and sparse cell arrays. It has lower execution times and uses less memory. The execution times of the cell array with forward searching are lower still. However, storing the sorted queries increases the memory usage. Finally, by storing the keys one can obtain the best execution times at the price of higher memory usage.

Among the tree methods, the kd-tree without domain checking offered the best performance. For cell methods, the cell array with binary searching and the cell array with forward searching have low execution times. These two cell methods outperform the kd-tree method. The cell array with binary searching does the queries in about half the time of the kd-tree while using less than the half the memory. The cell array with forward searching has significantly lower execution times than the binary search method, but uses about twice the memory.

Generated on Thu Jun 30 02:14:58 2016 for Computational Geometry Package by  doxygen 1.6.3