Large-Scale Test Data Set for Location Problems
Matej Cebecauer^{a} and Lubos Buzna^{b,c}
^{a}Department of Transport Science, KTH Royal Institute of Technology, Teknikringen 10, SE‑100 44 Stockholm, Sweden
^{b}Department of Mathematical Methods and Operations Research, University of Zilina, Univerzitna 8215/1, SK‑010 26 Zilina, Slovakia
^{c}ERA Chair for Intelligent Transport Systems, University of Zilina, Univerzitna 8215/1, SK‑010 26 Zilina, Slovakia
Contact email: matejc@kth.se
Abstract
Designers of location algorithms share test data sets (benchmarks) so that the performance of newly developed algorithms can be compared. In previous decades, the availability of locational data was limited. Big data has revolutionised the amount and detail of information available about human activities and the environment. Integrating big data into location analysis is expected to increase the resolution and precision of input data. Consequently, the size of the problems to be solved will grow significantly, increasing the demand for algorithms able to solve them. The availability of realistic large-scale test data sets, with more than 100 000 demand points, is very limited. The presented data set covers the entire area of Slovakia and consists of the graph of the road network and almost 700 000 connected demand points. To estimate the size of the demand, the population of 5.5 million inhabitants is allocated to the demand point locations using a residential population grid. The resolution of demand point locations is 100 metres. With this article, the test data is made publicly available to enable other researchers to investigate their algorithms. A second area of use is the design of methods to eliminate the aggregation errors that are usually present in location problems of this size. The data set is related to two research articles: A Versatile Adaptive Aggregation Framework for Spatially Large Discrete Location-Allocation Problem [1] and Effects of demand estimates on the evaluation and optimality of service centre locations [2].
Specifications Table
Subject area | applied mathematics, operations research, discrete optimization
More specific subject area | location analysis, geographic information systems
Type of data | graph of the road network, weighted demand points derived from GIS data and residential population grid
How data was acquired | The data set was created by combining publicly available data sets such as OpenStreetMap and a residential population grid.
Data format | CSV text files, shapefiles
Data source location | Slovakia (Longitude 17.001 - 22.110, Latitude 47.732 - 49.586)
Data accessibility | The data is published together with the article. Moreover, the data is published on the professional web page of one of the co-authors:
Value of the Data
The data set can be used as a benchmark to design and experiment with new location algorithms intended to solve large-scale locational and spatial problems.
The data set is applicable to the design and study of new aggregation methods that minimise the impact of aggregation errors on the outcome of optimisation.
The data set can be used to derive a large number of medium- and small-size benchmarks by selecting specific geographic areas.
The data set enables the visualisation of the results of optimisation algorithms in GIS.
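Deriving a smaller benchmark by selecting a geographic area can be sketched as a bounding-box filter over the nodes, followed by dropping every edge that leaves the selected area. The sketch below is illustrative only: the `Node` record, the `select_area` function, the edge tuple layout `(u, v, length)` and the toy coordinates are assumptions made for this example and do not describe the actual layout of the published files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node record; field names are assumptions."""
    node_id: int
    lon: float
    lat: float
    demand: float  # 0.0 for nodes that are not demand points


def select_area(nodes, edges, lon_min, lon_max, lat_min, lat_max):
    """Keep nodes inside the bounding box and edges with both ends inside it."""
    kept = {
        n.node_id: n
        for n in nodes
        if lon_min <= n.lon <= lon_max and lat_min <= n.lat <= lat_max
    }
    kept_edges = [(u, v, w) for (u, v, w) in edges if u in kept and v in kept]
    return list(kept.values()), kept_edges


# Toy data (invented for illustration, not taken from the data set).
nodes = [
    Node(1, 18.70, 49.20, 120.0),
    Node(2, 18.75, 49.22, 0.0),
    Node(3, 21.25, 48.72, 300.0),
    Node(4, 18.74, 49.21, 45.0),
]
edges = [(1, 2, 4.1), (2, 4, 1.3), (2, 3, 210.0)]

sub_nodes, sub_edges = select_area(nodes, edges, 18.6, 18.9, 49.1, 49.3)
sub_demand = sum(n.demand for n in sub_nodes)
```

Cutting the network along a bounding box may disconnect parts of the selected subgraph, so a derived benchmark would typically need a connectivity check before use.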
1 Data
The central component of the Slovakia benchmark is a graph of 1 956 067 georeferenced nodes and 2 080 694 edges that represent the road sections covering the entire area of Slovakia. Of these nodes, 663 203 represent the potential population demand, derived from the residential population density. In the literature, these points are commonly referred to as demand points (DPs). Potential demand is placed in populated areas approximately every 100 metres and connected to the road network (see Figure 1 for an illustration).
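To illustrate how such a graph with weighted DPs might be used, the following minimal sketch builds an adjacency list, computes shortest-path distances from one candidate node with Dijkstra's algorithm, and sums the demand-weighted distances to the DPs. It assumes undirected edges with lengths in metres and a mapping from DP node identifiers to demand; these assumptions, the function names and the toy values are hypothetical and are not taken from the data set.

```python
import heapq
from collections import defaultdict


def build_adjacency(edges):
    """Adjacency list from (u, v, length) tuples; edges assumed undirected."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    return adj


def shortest_distances(adj, source):
    """Dijkstra's algorithm: shortest-path distance from source to each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, length in adj.get(u, ()):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Toy road graph (invented): node 2 is a junction, nodes 1, 3 and 4 are DPs.
edges = [(1, 2, 100.0), (2, 3, 100.0), (2, 4, 250.0)]
demand = {1: 12.0, 3: 7.0, 4: 30.0}  # inhabitants assigned to each DP

adj = build_adjacency(edges)
dist = shortest_distances(adj, source=2)

# Demand-weighted distance of serving all DPs from the candidate node 2.
cost = sum(w * dist[dp] for dp, w in demand.items())
```

At the scale of the full benchmark, a plain Python dictionary representation like this one is likely to be slow and memory-hungry, so it should be read as a description of the data structure rather than as a recommended implementation.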