Performance Tests for Multiple Queries over a Range of File Sizes

Points on the Surface of a Chair

We consider the chair data set, introduced previously. By refining the surface mesh of the chair, we vary the number of records from 1,782 to 1,864,200. There is unit spacing between adjacent records. We perform a cubical orthogonal range query of size 4 around each record. The table below shows the total number of records returned for the six tests.

The next table shows the execution times for the chair problems. The leaf sizes and cell sizes are chosen to minimize the execution time. The memory usage is shown in the following table. An entry of "o.t." indicates that the test exceeded the time limit. An entry of "o.m." means out of memory. (Note that the kd-tree method with domain checking has the same memory usage as the kd-tree method without domain checking as the data structure is the same.)

# records	1,782	7,200	28,968	116,232	465,672	1,864,200
# returned	65,412	265,104	864,296	3,192,056	12,220,376	47,768,216

The total number of records returned by the orthogonal range queries for the chair problems.

# records	1,782	7,200	28,968	116,232	465,672	1,864,200
seq. scan	0.195	3.453	98.72	o.t.	o.t.	o.t.
projection	0.061	0.480	3.86	33.01	322.0	o.t.
pt-in-box	0.045	0.366	2.94	25.78	205.9	o.t.
kd-tree	0.081	0.383	1.94	9.58	46.7	o.t.
kd-tree d.	0.101	0.475	2.50	12.66	63.0	o.t.
octree	0.035	0.164	0.78	3.12	13.4	56
cell	0.024	0.102	0.37	1.41	5.6	o.m.
sparse cell	0.025	0.108	0.40	1.50	5.9	25
cell b. s.	0.028	0.121	0.49	1.89	8.1	34
cell f. s.	0.019	0.081	0.31	1.24	5.0	21
cell f. s. k.	0.013	0.055	0.23	0.93	3.8	17

The total execution time for the orthogonal range queries for the chair problem with a query size of 4.

# records	1,782	7,200	28,968	116,232	465,672	1,864,200
seq. scan	7,140	28,812	115,884	o.t.	o.t.	o.t.
projection	21,420	86,436	347,652	1,394,820	5,588,100	o.t.
pt-in-box	49,980	201,684	811,188	3,254,580	13,038,900	o.t.
kd-tree	17,416	69,808	279,760	1,120,336	4,484,176	o.t.
octree	62,708	272,212	1,160,492	4,520,452	17,979,204	72,681,460
cell	32,328	206,412	1,446,540	10,760,556	82,850,988	o.m.
sparse cell	14,360	63,088	252,976	1,013,680	4,058,800	16,243,888
cell b. s.	8,836	34,684	137,884	550,300	2,199,196	8,793,244
cell f. s.	16,764	66,372	264,708	1,057,860	4,230,084	16,918,212
cell f. s. k.	63,132	252,168	1,009,224	4,039,272	16,163,112	64,665,768

The memory usage of the data structures for the chair problem.

Log-log plots of the execution times versus the number of reported records and the memory usage versus the number of records in the file for each of the orthogonal range query methods on the chair problems. The execution time is shown in microseconds per returned record.

The memory usage is shown in bytes per record.

The figures above show the execution times and memory usage for the various methods. First consider the tree methods. The octree has significantly lower execution times than the kd-tree methods. It performs the orthogonal range queries in less than half the time of the kd-tree methods. This is because the records are regularly spaced. Having regularly spaced records avoids the primary problem with using octrees: the octree may be much deeper than a kd-tree. However, memory usage is a different story. The octree uses about four times the memory of the kd-tree methods.

Next consider the cell methods. The dense cell array has reasonable execution times, but the the memory usage increases rapidly with the problem size. Recall that the cell size is chosen to minimize execution time. Following this criterion, the dense cell array runs out of memory for the largest test case. The execution times of the sparse cell array are close to those of the dense cell array. However, it uses much less memory. Like the rest of the cell methods, the memory usage is proportional to the problem size. The cell array with binary searches has higher execution times than the other cell methods but has the lowest memory requirements. The cell array with forward searching has lower execution times than than above methods. Its memory usage is about the same as the sparse cell array. Finally, storing the keys with the cell array coupled with forward searching gives the lowest execution times at the price of a higher memory overhead.

Among the tree methods, the octree offered the lowest execution times. For cell methods, the sparse cell array and the cell array with forward searching have good performance. These two cell methods perform significantly better than the octree. The cell methods do the queries in about half the time of the octree method and use one quarter of the memory. The cell array with forward searching has the lower execution times of the two cell methods.

Randomly Distributed Points in a Cube

We consider the random points data set. The number of records varies from 100 to 1,000,000. We perform cubical orthogonal range queries around each record. The query size is chosen to contain an average of about 10 records. The table below shows the total number of records returned for the five tests.

The next table shows the execution times for the chair problems. Again, the leaf sizes and cell sizes are chosen to minimize the execution time. The memory usage is shown in the following table. An entry of "o.t." indicates that the test exceeded the time limit. (The kd-tree method with domain checking has the same memory usage as the kd-tree method without domain checking as the data structure is the same.)

# records	100	1,000	10,000	100,000	1,000,000
# returned	886	9,382	102,836	1,063,446	10,842,624

The total number of records returned by the orthogonal range queries for the random points problems.

# records	100	1,000	10,000	100,000	1,000,000
seq. scan	0.00071	0.0683	8.173	o.t.	o.t.
projection	0.00077	0.0281	1.323	104.15	o.t.
pt-in-box	0.00061	0.0254	1.304	112.61	o.t.
kd-tree	0.00095	0.0154	0.205	3.56	44
kd-tree d.	0.00114	0.0187	0.268	4.31	52
octree	0.00079	0.0212	0.363	7.08	92
cell	0.00091	0.0135	0.173	3.03	36
sparse cell	0.00101	0.0146	0.201	3.24	39
cell b. s.	0.00088	0.0109	0.140	1.74	27
cell f. s.	0.00068	0.0082	0.109	1.15	16
cell f. s. k.	0.00050	0.0054	0.067	0.77	11

The total execution time for the orthogonal range queries for the random points problem.

# records	100	1,000	10,000	100,000	1,000,000
seq. scan	412	4,012	40,012	o.t.	o.t.
projection	1,236	12,036	120,036	1,200,036	o.t.
pt-in-box	2,884	28,084	280,084	2,800,084	o.t.
kd-tree	1,088	9,168	121,968	1,055,408	9,242,928
octree	4,628	46,280	398,488	3,425,956	30,199,580
cell	724	5,500	52,000	527,776	5,245,876
sparse cell	928	6,400	57,600	578,112	5,696,368
cell b. s.	652	4,508	41,708	407,852	4,035,452
cell f. s.	1,124	8,708	82,508	811,724	8,053,124
cell f. s. k.	3,848	33,608	326,108	3,229,148	32,132,648

The memory usage of the data structures for the random points problem.

Log-log plots of the execution times versus the number of reported records and the memory usage versus the number of records in the file for each of the orthogonal range query methods on the random points problems. The execution time is shown in microseconds per returned record.

The memory usage is shown in bytes per record.

The figures above show the execution times and memory usage for the various methods. First consider the tree methods. The kd-tree methods have lower execution times than the octree method. This result differs from the chair problems, because now the records are not regularly spaced. Where records are close to each other, the octree does more subdivision. Because of the small query size, it is advantageous to not do domain checking in the kd-tree algorithm. Finally, we note that the kd-tree methods use about a third of the memory of the octree.

Next consider the cell methods. The dense and sparse cell arrays have the highest execution times, but they have fairly low memory requirements. Since the records are distributed throughout the domain, there are few empty cells, and there is nothing to be gained by using a sparse array over a dense array. However, the penalty for using the sparse array (in terms of increased cell access time and increased memory usage) is small. The cell array with binary searching outperforms the dense and sparse cell arrays. It has lower execution times and uses less memory. The execution times of the cell array with forward searching are lower still. However, storing the sorted queries increases the memory usage. Finally, by storing the keys one can obtain the best execution times at the price of higher memory usage.

Among the tree methods, the kd-tree without domain checking offered the best performance. For cell methods, the cell array with binary searching and the cell array with forward searching have low execution times. These two cell methods outperform the kd-tree method. The cell array with binary searching does the queries in about half the time of the kd-tree while using less than the half the memory. The cell array with forward searching has significantly lower execution times than the binary search method, but uses about twice the memory.

Generated on Thu Jun 30 02:14:58 2016 for Computational Geometry Package by 1.6.3