We've developed a new graphical loading tool for OS MasterMap data, focussing on usability and performance, that makes it easy to load national Ordnance Survey MasterMap datasets in a matter of hours.
This blog post summarises some simple benchmarks we've carried out.
If you are interested in using this tool but are not familiar with PostgreSQL/PostGIS, you can sign up to one of our support packages and we will get you up and running within a couple of hours!
National load times were as follows:
- MasterMap Topography (National) 20 hrs 21 mins [1]
- MasterMap ITN (National) < 6 mins
Installing PostgreSQL, PostGIS and QGIS took less than 10 minutes.
[1] This was the most time-consuming test, and it filled the SSD on the first attempt. Importing to a tablespace on the main HDD completed after 20.3 hours, but the import of tile 1592959-TR0585-5c3268.gz was reported as having failed with an error. Until this issue is resolved, that tile needs to be loaded and de-duplicated manually (e.g. using ogr2ogr to import it and a SQL query to de-duplicate it) to complete the dataset. De-duplication removes the duplicate features introduced by the chunking / supply process.
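As a rough illustration of the manual workaround mentioned in the note, the sketch below only builds the ogr2ogr command line and a PostgreSQL de-duplication statement; it does not run either. The database name, table name, source layer name and the `fid` key column are all assumptions made for the example and are not taken from OS Translator II, so check them against your own schema before using anything like this.

```python
def build_ogr2ogr_command(gz_path, dbname, table, layer):
    """Return an ogr2ogr argument list that appends one layer of a
    gzipped GML tile to an existing PostGIS table.

    dbname, table and layer are hypothetical values supplied by the caller.
    """
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",            # write to a PostgreSQL/PostGIS database
        "-append",                     # add to the existing table
        "-nln", table,                 # name of the destination table
        "PG:dbname={}".format(dbname),
        "/vsigzip/{}".format(gz_path),  # let GDAL read the .gz file directly
        layer,                         # source layer to copy
    ]


def build_dedup_sql(table, key="fid"):
    """Return a PostgreSQL statement that keeps one row per key value.

    Assumes 'key' identifies a feature; rows with the same key are treated
    as duplicates and all but the one with the lowest ctid are deleted.
    Identifiers are not quoted or escaped, so pass trusted names only.
    """
    return (
        "DELETE FROM {t} a USING {t} b "
        "WHERE a.{k} = b.{k} AND a.ctid > b.ctid;"
    ).format(t=table, k=key)


if __name__ == "__main__":
    cmd = build_ogr2ogr_command(
        "1592959-TR0585-5c3268.gz", "osmm", "topographicarea", "TopographicArea"
    )
    print(" ".join(cmd))
    print(build_dedup_sql("topographicarea"))
```

The command list could be passed to `subprocess.check_call` and the SQL run through psql or any PostgreSQL client. Running the de-duplication inside a transaction on a copy of the table first is a sensible precaution.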
Comparison With Other Open Source Tools
We were curious how OS Translator II load times compare with other open source loading methods, so we ran some basic tests using the "SU" tile of the MasterMap Topography and ITN datasets and compared the results with those of the popular Loader scripts. The results looked like this:
Please note that OS Translator II had an unfair advantage in these tests, as it automatically uses multiple CPU cores whereas Loader presently does not.
Hardware and Software
We used the following hardware and software configuration:
- CPU Intel Core i7 4790K (Haswell) @ 4GHz
- Memory 32GB PC3-12800
- Disk(s) Samsung 840 EVO 250GB SSD and Seagate Barracuda ST2000DM001 2TB HDD [2]
- OS Microsoft Windows 7 Professional (64-bit)
- PostgreSQL 9.4.1 (x64)
- PostGIS 2.1.5 (x64)
- QGIS 2.6.1 (Brighton)
- OS Translator II 1.0
- Python 2.7.9 (win32)
- lxml 3.2.3 (win32)
- Loader Master (067a511313, 20th February 2014)
[2] The operating system and source gml.gz files were located on the SSD, and the default PostgreSQL tablespace was stored on the secondary 2TB HDD.
The following changes were made to the default PostgreSQL configuration:
- shared_buffers 512MB
- work_mem 16MB
- maintenance_work_mem 128MB / 1024MB [3]
- checkpoint_segments 6
- random_page_cost 2.0
- fsync off
[3] maintenance_work_mem was set to 1024MB for the national load of the MasterMap Topography layer only.
Turning fsync off is dangerous and can lead to data loss in the event of an unexpected power outage. Always switch fsync back on after loading and never use this option on a database containing critical data.
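For reference, the settings listed above can be written out as postgresql.conf lines. The following is a minimal sketch that renders them as text, with a switch that turns fsync back on for the post-load configuration; the function and variable names are hypothetical, and the values are simply those quoted in this post for PostgreSQL 9.4.1.

```python
# Load-time overrides as listed in this post (PostgreSQL 9.4.1).
# Note: checkpoint_segments was removed in PostgreSQL 9.5 (replaced by
# max_wal_size), so this list applies to 9.4 and earlier as written.
LOAD_SETTINGS = [
    ("shared_buffers", "512MB"),
    ("work_mem", "16MB"),
    ("maintenance_work_mem", "128MB"),  # 1024MB for the national Topography load
    ("checkpoint_segments", "6"),
    ("random_page_cost", "2.0"),
    ("fsync", "off"),                   # dangerous: only while bulk loading
]


def render_conf(settings, after_load=False):
    """Render (name, value) pairs as postgresql.conf lines.

    With after_load=True, fsync is forced back to 'on' so the rendered
    fragment is safe to use once the import has finished.
    """
    lines = []
    for name, value in settings:
        if after_load and name == "fsync":
            value = "on"
        lines.append("{} = {}".format(name, value))
    return "\n".join(lines)


if __name__ == "__main__":
    print("# while loading")
    print(render_conf(LOAD_SETTINGS))
    print("# after loading")
    print(render_conf(LOAD_SETTINGS, after_load=True))
```

Keeping the load-time and post-load fragments side by side makes it harder to forget to re-enable fsync once the import is complete.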