Thursday 11 August 2011

Creating off-line OpenStreetMap tiles along a route

Generation of mapnik tiles for a route

Gordonjcp was wanting to have a set of offline tiles for his 'phone along a route he is taking tomorrow. The ability to create a limited set of tiles would get round some of the problems associated with tile scraping, limited storage space, and time taken to create the tiles.

However, there aren't any obvious apps which do this. It occurred to me that it might be possible to limit this on the SQL side with little or no change to the standard generate_tiles.py script which is distributed with the OpenStreetMap mapnik renderer.

The idea was so simple it was worth trying out. I already had a PostGIS database containing (fairly) recent data for Great Britain, so all I had to do was import a route into the same database and create views which limited the OSM data to areas adjacent to the route. This is a very simple-minded hack, but it seems to work reasonably well and it may be possible to improve it by adding a little code to the existing python script.

For a route I chose a GPX track I uploaded recently. There are plenty of other ways of generating a similar track: notably by using an on-line routing engine and saving the route as a GPX (for instance Open MapQuest, OpenRouteService or Cloudmade). To get the GPX track into PostGIS I used Quantum GIS to save the track as a shapefile and the SPIT extension to upload this to PostGIS. In PostGIS I added two extra geometry columns to this one row table, and updated these with the track itself and the track surrounded by a 1 km buffer:

UPDATE route_track
SET geom_goog = ST_SETSRID(ST_TRANSFORM(geom_wgs,900913),900913)
, geom_goog_buffer = ST_SETSRID(ST_BUFFER(
ST_TRANSFORM(geom_wgs,900913),1000),900913);
For each of the 4 main planet_osm_* tables I created a view as follows:
CREATE OR REPLACE VIEW planet_osm_lineX
AS
SELECT a.*
FROM planet_osm_line a
, route_track b
WHERE ST_INTERSECTS(a.way,b.geom_goog_buff)
I ran SELECT populate_geometry_columns() to ensure the geometry columns were accessible to PostGIS operations. It would be possible to rename the planet_osm_* tables and create the views instead of them. This would avoid the need to create an extra copy of the mapnik XML file, which is what I did next.

I made a copy of the mapnik.xml file and replaced all examples of planet_osm_* with the same string with an "X" appended.

All I had to do then was add a suitable bounding box to the python script and set local variables for the map tile directory & the mapnik map file. Really the bbox should be determined from the database, but the objective was to do something quick and dirty. For my example I used a bounding box which was 2 degrees wide and 1 degree high.

Tile generation runs pretty smoothly: as one might expect lots of empty tiles are created, but the query overhead is quite reasonable. As the empty tiles are only 103 bytes each they don't take up too much space either.

There are lots of fairly simple optimisations which could be added:
  • Find the bbox of the route & only generate tiles within that bbox.
  • Identify empty tiles en masse and avoid running the mapnik queries against the database for these tiles.
  • Replace empty tiles with links.




Monday 8 August 2011

Augmenting Residential Landuse : building density

Building Density on Residential polygons, Nottingham

Andy Allan (gravitystorm) asked a very pertinent question after my talk at SotMEU: Why had I not used building outlines to automatically identify different classes of residential landuse?

Briefly, my answer was: "I had tried, but the results weren't satisfactory". This is a much longer answer than I could give at Vienna, and fills in some detail about the technical approach I used. I hope that readers may have some good suggestions to improve this approach.

It is clear that Urban Atlas data is a good model for studying landuse mapping through replication/simulation in OSM. After working with this data for six months I realise that one of the main reasons for this is that UA's main classification is simpler than the current OSM consensus for landuse mapping. Thus most UA landuse classes have a 1:M mapping to OSM tags. Other landuse and landcover schemes, such as CORINE, EUNIS, have many more classes and in thus in many cases the classes have N:1 or N:M mappings to OSM tags. Most of these relate to agriculture, woodland or natural vegetation: for the most part where OSM tag usage is only fitfully consistent, or the same tag has multiple meanings.

There are, however, 5 UA classes (Continuous and Discontinuous Urban Cover — 11100, 11210,11220, 11230 and 11240) which map on to a single tag in OSM : landuse=residential. The main criterion for separating this classes in UA is the degree of surface sealing, with continuous urban cover meaning that over 80% of the land surface is effectively sealed (buildings, asphalt, concrete etc.). As a significant proportion of surface sealing will be buildings it seems plausible to use the proportion of residential landuse covered by buildings as a proxy for surface sealing.

Buildings from OSM & OS OpenData
Buildings generated from OS OpenData using mapseg (orange) compared with those mapped in OSM using Bing imagery (light green) for OS grid square SK5038.

Firstly, I needed a decent source for building outlines. My test data area, Nottingham, does have a large number of mapped buildings (47000+), but is nothing like complete. Partially mapped building sets are no use at all for this purpose. I therefore made use of Mapseg a set of python routines written by Tim Sheeman-Chase to extract building outlines from the Ordnance Survey OpenData StreetView (OSSV) tiles. I had originally created this data to make address mapping easier, but was dis-satisfied with artefacts (spiky buildings, failed orthogonalisation, etc.) so I had only used a tiny proportion of the data directly in OSM. My original data set covered the whole of the county of Nottingham and adjacent areas and included over 600000 building outlines: processing and manipulating this was somewhat tedious as each OSSV tile or sub-tile generated a single OSM XML file and I therefore had around 200 files with duplicate node and way identifiers. In the end I loaded each file into PostGIS and merged the data there. I then clipped it to the bounds of the target test area to end up with a data set of 189, 420 buildings. Some geometries needed to be cleaned to remove self-intersections.

Buildings were then clipped to match the residential landuse parcels, and then assigned to individual parcels. Landuse parcels which had originally been created as multipolygons on a 1 km grid were decomposed to individual polygons (using ST_DUMP) before this step. Polygons meeting on grid boundaries were not re-assembled. At this point it is simple to sum the building area for each polygon and determine the ratio of the total polygon area covered by buildings. The results are graphed below with the data divided into 20 buckets.

For comparison here is the relative distribution of the EEA UA data for the Nottingham area:

UA Code Area (ha) % Total Residential
11100 137 1.0%
11210 7132 52.0%
11220 5298 38.7%
11230 846 6.2%
11240 18 0.1%
11300 272 2.0%

The vast majority of residential landuse falls in two classes with surface sealing in the range 30 to 80%, with more than half of the landuse with greater than 50% surface sealing. A cursory comparison between the two sets of figures suggests that, at best building cover may only account for half factors used to assign surface sealing in Urban Atlas. Fortunately because I created a datasets of differences and similarities of data from UA and OSM it's possible to compare individual landuse polygons: a bit more work as these may be slightly different shapes, and therefore the whole clipping of buildings and area calculation must be repeated.

Comparison of UA Codes with OSM building density

The box-plot above amply confirms the initial impression. No doubt if and when I learn more about R, a proper statistical comparison could be done. Building density from OSM does not explain enough of the UA surface sealing value to be reliable for classification. At least for now. There is enough in the plot to hint that enrichment of the data with other variables may facilitate separation, but that's a different project again.

So that's the (very) long answer to Andy's question.

What does it mean for OSM:
  • Using buildings to derive sub-classes of residential landuse is not generally useful.
  • The requirement for completeness in building outlines on its own means that such an approach would at the best of times only work in certain places (good external sources of building data, excellent imagery). It may also require fanatical devotion to mapping buildings.
  • The basic classes of landuse we are interested can probably be derived from local knowledge, ground survey, aerial imagery etc. To make this more generally useful this requires some guidelines and consensus on tagging.
  • For now we could use tags similar to the Urban Atlas codes for urban areas in Europe and North America. Significant extension of these values might be needed to adequately cope with other areas in the world. (I'll be blogging in depth on this subject soon).
  • Looking at extending this type of data with other OSM derived variables may be interesting. Alexander Zipf mentioned that his group at Heidelberg had used a data mining approach to identify residential areas from OSM data.
If you've got any ideas of how to extend this type of classification drop me an email or comment below.

Tuesday 26 July 2011

Footpaths in Carmarthenshire: What's the point?

IMG_6566a

Late in June I spent nearly a week near Carmarthen. I thought I'd get to know some footpaths local to where I was staying.


Monday 20 June 2011

The Church which turned into a Gurdwara: pre-pub mapping

Church? 4827a

This is a very belated write-up of the May Nottingham pub meet-up.

First of all it was great to meet will_p, our most productive local mapper, and to welcome Paul the Archivist who has mapped actively in his home town Derby, Birmingham and Cardiff. A few of us met at the pub for an hour walk and a bit of light mapping: this is always a good opportunity to compare notes about how different people map and talk about specific issues as they arise (such as how does one represent the surface of the road below: we didn't have a good answer!).

surface=? 4804a

I find it difficult to think of an area comparable to that around the Lincolnshire Poacher which has a similar variety of different sorts of places within a couple of hundred metres: modern social housing, schools, a university campus, retail and commercial areas, pre 20th century housing, large houses, parks, gardens etc. It's also not that well-mapped, something which only really becomes apparent by walking through and checking OSM with what's on the ground. In fact our little mapping walk showed, once again, how much can be missed when one only uses aerial imagery.

We again chased down a few missing road names, but failed to see any signs on one road. Marcus pointed out a few streets along Woodthorpe Road which were blocked and not mapped properly (these are tree-lined so guessing that they connected from aerial imagery was wrong).

This time we headed into the S edge of Mapperley Park, one of the wealthier parts of the city, and a great contrast from St Anns, just the other side of the ridge line. We only followed three or four roads but it was very clear that the original sources of OSM data were pretty poor: this was a mix of NPE maps and aerial imagery. Density of mature trees in Mapperley Park made it difficult to identify the actual road lines from aerial imagery. We walked down Redcliffe Road the upper parts deeply cut into the rock and supported by Bulwell Stone walls . These stone walls area a classic sign locally of a private development between about 1880 and 1960. After 1960 the quarries were running short of this attractive Magnesian Limestone. Many are now used for housing or industry close to Bulwell town centre.

Old Boundary Marker : 4802aThe most interesting site along this road were a couple of cast-iron boundary markers from 1859: these list the mayor and town clerk of the Town of Nottingham (it didn't become a city until some time later). In style they are very similar to the markers used when the City boundaries were extended in the 1930s (see an earlier post). I was intrigued because I didn't know which area was on the other side of the boundary. Fortunately we have an old map at home (Salmon's map) which is virtually contemporaneous with these boundary markers. I took a couple of pictures of the same area that we walked through showing the state of Nottingham in the early 1860s. This clearly shows the boundary, and that Mapperley Park and much of the Forest were in the parish of Basford. Basford was incorporated into Nottingham in the 1880s.

Nottingham boundary 1861 : 4947b

Also of interest is that the road was then called Red Hill Lane. Some road names in St Anns have been preserved although in altered form after re-development, but quite a few on the roads around Bellevue Reservoir seem to have been lost. It's not clear what name the paths round the reservoir have themselves: this old map calls it St. Anns Hill, which now belongs to a road which was then called Blenheim Villas, whereas modern maps (and some from the 1930s) name it as part of Corporation Oaks.

Gurdwara Noticeboard: 4832a

Returning to the pub along Mansfield Road we had a better look at the prominent church shown at the top of the blog. Paul had noted that it seemed to be a Gurdwara, a Sikh Temple, and as we passed it took a 'photo of the notice board. The following day I felt compelled to do a bit of research: How long had it been a Gurdwara? Who built it? what was it's recent history? I was relieved to discover that it had recently been acquired from the Christian Scientists who have moved their place of worship into their fine office building next door. But originally it was built as a Presbyterian church. So it had been a church when first mapped on OSM, but another object lesson that things ain't what they seem and the importance of ground surveys.

OSMers in the snug :4837a

We ended up in the snug at the Poacher for a pie (Paul) and several pints.

Friday 10 June 2011

Hyperlocal Naming

Stanage Edge from Stanage Plantation
Stanage Edge, Derbyshire
The site of hundreds of hyperlocal names used by rock climbers.
Photo courtesy of Earthwatcher BY-NC-ND

We had an interesting a discussion in March on the OpenStreetMap talk-gb mailing list arising out of some controversial naming practices in Loch Lomond and Trossachs National Park ("Giro Bay"). Someone suggested that no-one would or should ever add newly minted names on OSM.

This is an extended version of why I think that names widely recognised in specialist communities should be used when appropriate. I call these names "hyperlocal", because many of them are used for naming features or localities which non-members of the relevant community would not recognise, or would see no need for naming them.

Naturalists. This whole post started because someone expressed surprise that 'non-official' names might be entered in OpenStreetMap (all my examples are from Attenborough Nature Reserve, Nottinghamshire). Naturalists, and particularly those managing and using nature reserves, are creating new names all the time. In part this is because often these places themselves are new (for instance old industrial workings and quarries — "The Delta", "The Bund"), but mainly because much greater precision is required for location information. This is really important if one wants to find the only example of a rare plant, or check on a bird-box location, or photograph a rare fungus, or make sure the right tree gets chopped down. Also one becomes increasingly aware of minor differences which dramatically affect the range of plants and animals over a small area. In principle this could be communicated by using grid references and at a fine scale a GPS. In practice names are more memorable, help to delineate areas which grids don't, and are much easier to communicate. Some names stick, others may be concocted but never enter common usage. Some places have multiple names, and it takes a long time to find a consensus ("Old Car Park", "Fishermans Car Park", "Education Meadow", "Corbett's Meadow"). Many names have multiple intelligible variants ("Butterflies", "Butterfly Triangle", "Butterfly Patch"). In other cases it's possible to see how a name changes. For several years charcoal has been made at Attenborough: the area by the kiln is now nearly always referred to as "by the Kiln", whereas it used to be called "Redwings". In other places locations are known by number alone, for instance 'compartment 73'. I have a map of compartment numbers for Clumber Park, but it's an internal National Trust document, they are used in the national Fungal Records database though.

I have added several of these 'made-up' names which are in common usage around Attenborough Nature Reserve: the Delta, Warbler Dell, Dirty Island Bank, Butterfly Patch (in use since the 1960's), Corbett's Meadow (a recent 'official' coinage, in memory of Keith Corbett who was reserve manager for over 30 years until his death in 2007), which is also known locally as "The Fisherman's Car Park", and "The Old Car Park", and Education Wood (a recent unofficial coinage, around 2005). As all the water bodies were created by gravel working, their names have evolved recently too. I have only added those which are in widespread usage: there are perhaps 50-odd names which were coined in the '60s and '70s, mostly eponymic toponyms, but many never caught on.

Carsington Reservoir is another location where local birders have evolved a significant number of toponyms. Any of the Helm "Where to Watch Birds" regional guides will provide many more examples.

Climbers. Hill and mountain areas are chock full of named features which might never be noticed. This isn't surprising, the cliff shown at the top of this post Stanage Edge has 100s of climbs along its length. The convention is that the first people to successfully climb a route have naming rights, but it's more complicated than that. An even smaller crag further S in Derbyshire, Birchen Edge, has a convention that climbs names should have a nautical allusion (Trafalgar Wall, Camperdown Crawl). I would think in Derbyshire alone there are probably well over 5000 named climbing routes. Many of these will be less than 20 m ascent. On the harder ones each hold will be memorable (I'm not yet aware of these being named yet). On the other hand some places fall into disuse. My father and his friends used to climb at Laddow Rocks because it was accessible by bus from the Manchester area. Much of Laddow is now mossy and green, but everything on it still has a name: several guidebooks were published.

Oddly, in bigger mountains such as the Alps there is not this degree of minutiae in naming: or if it exists it's not published so widely. A number of factors influence this. Climbing in the Alps tends to follow obvious lines because there are a lot more of them, and they're much longer. Sport climbing has used pictorial 'topos' for a long time, so they avoid the same need for names as a narrative descriptive system. There are still plenty of these names though, the Eiger Nordwand has lots, all used in many mountaineering books, and giving a title to one: Hinterstossier Traverse, The White Spider, the Third Ice Field, Traverse of the Gods, Exit Cracks and so on. That is just one face of one mountain, albeit a famous one, and there's enough variety in the type of names to give a feel for how they were created.

Skiers are another group who go into the mountains for recreation. In many ski resorts many of the names are invented and often follow a stereotyped pattern typically with lots of alpine fauna: White Hares, Pika, Marmots, Chamois, Ibex, Ptarmigan. The older European resorts still use lots of names which arose naturally from older uses.

St Anton - Ulmerh¨tte from SchindlergratSt Anton in the Tirol has many ski runs and locations named after alpine meadows. Some of these, such as Gampen (often Gampli) have an interesting linguistic history indicating the presence of speakers of a latin-derived language in the area similar to modern Romansch. Mattun, is another name ultimately derived from latin, referring to a characterstic member of the mountain vegetation. Both of these are fairly common place name elements close by in the Eastern Graubünden. (Place name etymology from the Swiss Alpine Club guides). In addition to the regular runs, St. Anton has a host of well-known, named, off-piste runs. The photo shows the top of a broad but steep gully off the Schindlerspitze: but, I never learnt the names of these gullies. Some of the seriously steep chutes have names which are closely guarded secrets passed on only to those who have skiied them.

Photo courtesy of rickardengstrom CC BY-NC-SA

On the Grands Montets many apparently non-obvious features have names (often different names in different languages): Combe des Amethystes or the Italian Bowl; Canadian Bowl. Again these are not to be found on maps, but are widely used by people who know the area. One or two specialised publications use them too. It's possible to ski a line a few metres from the Italian Bowl and not be aware that its there: even then finding the direct entry which gives an exhilarating steep start requires precise knowledge of the local geography. So once again, these names have arisen from a need for much greater precision than might otherwise be expected.

Ski racing requires a heightened awareness of terrain which again tends to drive increasing precision in naming. One of the oldest open downhill races in the Alps is the Parsenn Derby. When I first visited Davos the piste which formed the lower part of the course had markers giving the names of each turn and schuss. nowadays the course is shorter still, but still features places like the 'S-bends' and the 'Derby Schuss'. It's not for wimps like me:




Motor Racing. Any one who has followed a Grand Prix will know that every part of the track has a name. Of course on a modern track these names will have been invented and assigned by marketeers, but on courses with a long history, like Silverstone, it's easy to guess how the names came about. Early users probably said things like 'I came off at the corner near Stowe', which being cumbersome would get rapidly shortened to Stowe Corner. Of course names were useful to everyone: drivers, instructors, marshals, commentators and spectators. Now they integral to what people expect of a motor racing track. Many of these are captured on OSM, including the circuit featured on Top Gear.


Fishing. I know little about fishing, but it struck me that fishermen must have names for favourite spots, and then I came across the wonderful annotations of Walking Papers by Kirk Lombard. The one I show above is near Fishermans Wharf, San Francisco, with one fisherman's name "Striped Perch Hole". Incidentally, this seems to be a great example of using Walking Papers (and OSM) in an innovative and unexpected way.

Names seem to evolve in each of this situations in similar ways. Often they are created by pragmatic shortening of a description. Memorable events, and the people associated with them, are another rich source of names. Things also get named after people as a token of appreciation, respect or a with to remember them. Thematic naming sometimes works, but often ends up being insufferably twee, and is often resisted or subverted, particularly by wit.

One last one example are local names to the swimming holes on Fairham Brook by Keyworth Meadow NR: see Neil Pinder's article on the parish website. I've asked around for other examples, and someone with extensive experience of forestry noted that workers named landscape features in a very similar way to naturalists I'd be very interested to hear of other examples, particularly outside the interest domains I've covered in this post.

The ability to map these hyper-local toponyms is a very attractive part of OSM. Of course, they need to be researched accurately to ensure they are names which are used rather than 'book-names'.

Wednesday 8 June 2011

POSSUMs

Check the tongue!
Check the tongue!
courtesy of wollombi CC-BY

Not cute animals or vicious pests, but Persistent Open Streetmap Unique MapIds!

I've been intrigued by the idea of creating persistent IDs for geographical objects in OSM for a while. Although things in OSM do get a unique ID there is absolutely no guarantee that a given object will retain a single ID, or that a given ID will not be used for multiple objects over time. A classic example is the pub initially mapped as a node, then with a building outline traces, and finally the tags are moved from the node to the building and the node deleted.

Persistent IDs would make it much easier to link external data sources to OSM data. It may obviate the need to add certain types of data directly to OSM. SteveC pointed out the problem of business names in a comment on another post here.

The basic problem is conceptually not too difficult: I've done similar things with bank customer and account data. In a typical business application the problem is, however, that although individual systems usually have persistent IDs there is no single one which links across systems (imagine say multiple bank accounts, a credit card and an insurance policy). With OSM we don't even have this luxury.

My basic notion would be to choose something like the minimum node ID of an object when it is first created and use that as a reference throughout its OSM life-cycle (including, potentially, several deletion and restoration events). Some kind of collision avoidance is needed for objects which share nodes, and handling of certain border cases of ways sharing all their nodes and some types of relations).

Each time an object is touched in a changeset a process is needed to determine what happens to its persistent ID. In most cases nothing will have affected it, but it may be difficult to ensure that these cases are identified. In general geometry changes are unlikely to have an impact whereas changes in topology will: but, I don't know what one should do in the case of inaccurately mapped objects which change position significantly. Changes in topology may result in logical merges and splits of objects previously regarded as atomic, so I imagine the relationships between POSSUMs will need to be stored. Consider a road segment (A->B) between two junctions which gets edited and an additional junction is added between the two (A->C->B): two new identifiers need to be created for A->C and C->B which are children of the original A->B.

Here'a s list of other characteristics, in no particular order, which I think they should possess:
  • I think POSSUMs should be approached as an enabling technology: there may be many different persistent ID schemes depending on individual use cases. Some people may be interested in the road network, others might be interested in building outlines, or it might be a specific geographical area, and so-on.
  • POSSUMs therefore should be independent of the OSM data infrastructure: of course if they proved very useful then OSMF might want to provide them as a service.
  • POSSUM creation needs to be rule based: starting rules will be based on tags, and some aspects of geometry (e.g., road junctions).
  • They need only cover a subset of data. New subsets should be capable of being added to an existing POSSUM data store (the cage?).
  • It should be possible to add history of elements retrospectively and a pre-existing POSSUM to be used to identify the older versions of an element.
  • POSSUMs would only be created for tagged objects (excluding things like fixme, note, source etc.)
  • POSSUMs and OSM objects would have a many-to-many relationship. Some means of identifying parts of a way (e.g., road segments), and POSSUM roles will be required (e.g., pub as premises or building, pub as business).
  • Relationships between POSSUMs may require some manual maintenance.
  • Applying changes to rules means they also need to be applied retrospectively, thus some POSSUMs will be superseded (just as in the new junction case). This is probably technically the most complex part of the idea.
The primary use case for POSSUMs is for the maintenance of linkages between external data (e.g., public open data) and OSM objects in a way which minimises the amount of external data which needs to be imported into OSM. This in turn should reduce the maintenance overhead of keeping OSM data in sync with external sources.

POSSUMs might also be useful for OSM mappers when frequent uploads to the planet database are impossible. This commonly occurs in third-world countries, but also in crisis mapping and disasters when telecom resources are unavailable. At the moment mapping in this situation can give rise to large numbers of conflicts which are difficult to resolve. Use of persistent IDs might make the conflict resolution process easier.

Aaron Cope has already done some much more sophisticated thinking for the use case of a worldwide building register. Check out his building=yes site using WOEIDs. I am sure there are others thinking about this: let's get the discussion rolling.

I'm grateful to Bob Chell of 1-spatial for a brief discussion of the basic notion at SotM10, Girona. This encouraged me to believe that the idea was not completely barking mad. But any remaining nuttiness is entirely my own.


Saturday 28 May 2011

On the histories of OpenStreetMap data

Peter Körner (MaZderMind) has the recently made available extracts of selected areas from the OpenStreetMap full planet history. These are of various Länder of Germany, but are big enough to be interesting and small enough for repeated processing and analysis.

Handling the history of OSM edits is interesting in its own right, but it also has an immediate practical importance as some edits will not be carried forward with OSM once the OdBL licence change is completed. Tools to process and analyse the complete edit history are needed as part of this change.

My main interest is to visualise past states of the database conveniently. The image below is of Berlin from 31st March 2009. This image was generated by re-extracting data from the history using osmosis (regular snapshot schema tables were replaced by views on history tables) and re-importing with osm2pg2sql. This process is too involved and slow for rapid visualisation of historical views of the data, but does demonstrate that reasonable results can be achieved without elaborate changes to the snapshot schema.

Berlin 2009.12.31

I'm also interested in the technical aspects of handling OSM history: I have worked on temporal database schemas in the past. Some of these were more complex than OSM, as they stored not just history of the system (transaction time), but history of the real-world (valid time). This is also something which intrigues me: how one might store real-world history in something like OSM. In OSM we don't make any attempt to discover when a pub opened or when it closed, we just know that someone added the pub to the system and that later someone removed it. Although it might seem reasonable to assume that the pub closed between the two events, it might be the original edit was by someone who used the pub 20 years ago, and it closed 19 years ago.

My initial approach is simple, some might say naive. This is to basically take a schema very similar to the current API schema and make minor modifications to it. The main change is to add another timestamp column to the main tables, with the time period between the two timestamps representing the period when the given primitive (node, way, relation) was valid. This is partly a convenience for querying the data, but it also has the advantage that we can work with partial histories if some versions are missing. The main reason for doing this, though, is to use timestamps as an additional part of the compound key for way geometries. This avoids having to generate another system key to add to the identifier and version.

Way geometries are the crucial problem in making historical data convenient to render. A given version of a way may have many geometries, depending on how many of the nodes comprising the way have changed position, and, indeed, how frequently. Each time a node changes the geometry of all its parent ways may change: of course a node might just be touched (updated with no change), or might only have its tags modified. In the first instance I ignored these issues, although about 40% of all node changes in the Berlin data do not affect position.

My first step was to find for each way version all the start timestamps for the nodes belonging to that way version and where the node version validity period overlapped that of the way version. As the way version start and end timestamps must also be considered I did this using a UNION:

SELECT
w.way_id,
w.version,
greatest(n.tstamp_from, w.tstamp_from) AS tstamp_from

FROM way_hist w
JOIN way_node_hist wn
ON w.way_id = wn.way_id
AND w.version = wn.version
JOIN node_hist n
ON wn.node_id = n.node_id
AND w.tstamp_from <= n.tstamp_to
AND w.tstamp_to >= n.tstamp_from
UNION
SELECT w.way_id, w.version, w.tstamp_from
FROM way_hist w
It's easy to add a valid end date column to this data using a SQL windowing function.

This gives us the total number of historical records we will need to store to access all distinct geometries of ways in the data set.

Exactly the same approach can be used to extend the way_nodes table with node version, and a validity date range based both on the way version and the node version. Indeed the range of way geometries can be derived from this data, but because we also need to slice up the validity range of an unchanged node for each way geometry it is convenient both conceptually and practically to keep these data separate.

With this data we can now start to generate all the historical way geometries. Unfortunately this is computationally expensive in PostGIS: one of the reasons Osmosis provides options for doing this before loading into the snapshot schema. I used the aggregate ST_MakeLine postgis function, which requires that the nodes be correctly sorted in the input.

With historic tables for nodes, ways, and way geometries created the next step was to create history tables which mimic the tables created by osm2pgsql for mapnik rendering. For nodes this is relatively simple, its just a big query converting relevant hstore data into columns along the lines of tags->'highway' as highway with the addition of my two timestamp validity columns. For ways it is more complex: the z_order and area columns requires population, and some ways need to be treated as polygons not lines. Thanks to asciiphil I was pointed to the relevant routine in the back-end processing of osm2pgsql which handles z_order. This is simple enough to replicate in SQL. For deciding which ways to add to the lines and polygon tables I used the default.style file from osm2pgsql storing this in a table.

My first pass seemed to work OK: I didn't try and turn linestrings into polygon geometries, and I didn't do anything with relations. This was the result:

Failed render of snapshot from Berlin history

Really this is a dismal failure. Firstly running mapnik to generate the image took forever: each query was doing a table scan on the lines table rather than using the GIST index on the geometry. Secondly, it turned out that I had a problem with the generation of geometries in PostGIS: hence the lines all over the place. Node order in ways was not carried over properly into the MakeLinestring function.

Obviously, I'd hoped to get further with this, but I'm going to have to concentrate on getting the geometries right: testing using a correlated query seems to generate sensible results. That being said, small volume queries with aggregation seemed to work properly too. A kludgey solution to mapnik performance would be to just extract data in the map extent bounding box before running the mapnik style, and I may do this before looking at the PostGIS performance issues systematically.

A few other things of potential interest: the Berlin data has around 250k ways, collectively with over 600k versions. I identified around 1.6 million potential geometries for these ways. The actual number of distinct geometries is substantially smaller than this because of the high proportion of node version changes which do not affect position.

No doubt far more sophisticated things will be implemented at the upcoming hack weekend. Personally, from my experience so far, I'd like to see at least some history support in the existing tools.

Sunday 15 May 2011

Access Land

Snowdon from Moel Eilio : 3351a

Snowdon has experienced flurry of edits on OpenStreetMap after the last two long weekends. I happen to be one of the culprits as I've been adding some detail from last July: not least because a family member is planning an attempt on the Welsh 3000ers next month.

The availability of Bing aerial imagery allows a lot more detail to be added, particularly when I can relate it to geotagged photos. I've been able to add:
  • walls and fences alongside or crossing tracks;
  • bits of marshy ground;
  • extend paths which I saw, but didn't walk;
  • and, correct alignment of streams and rivers.
In doing so I've also noticed quite a few things which I recorded but had not transferred to OSM: notably stiles and gates.

Access Land roundel : 2684bWhilst looking at these photos I noticed one of a stile had a roundel indicating the start of Access Land. I've suggested before that it's worth mapping these locations as the complete area covered by access land is only likely to be worked out through gradual accumulation of access points. However, when push comes to shove, I'd not done it myself and couldn't think of an obvious tag for it.

Top of Glyn Rhonwy Acces Land : 9225aI also found a photo of a large notice for the area, Glyn Rhonwy. This contains a lot of old slate quarries including one huge hole. The notice states that the upper part of the area is access land. However the stile we went over when we came out of the old quarry area had a rather threatening warning notice. This appears from the Ordnance Survey mapping to be exactly where the Access Land starts.

Bottom of Glyn Rhonwy Acces Land : 9187aThere was another puzzle on the edge of Llanberis. We wanted to avoid walking straight down the road, so had aimed for a footpath marked on the map leaving at a corner of the road above the tea shop at Penceunant Isaf. Checking the OS map today I see that this is also Access Land. All we found was a firmly locked gate.

All in all the Access Land areas in Snowdownia are not that well marked, and are often huge: making them poor targets for mapping. In most cases they're fairly obvious, or are areas where access has been presumed for many years, so adding them to OSM is not a huge help for the walker. Closer to towns and villages there still seems to be plenty of ambiguity. I'm not a fan of the huge intrusive notices that do exist either. They seem inappropriate in a National Park.

Although I added a lot of detail in Snowdoia, I needed to find somewhere else to look at mapping Access Land.

Chrome Hill : 2633a

The one place I have been in the past couple of years which is access land and where I probably have enough information to map access land is Parkhouse Hill. This striking little hill, one of a pair of fossil coral reefs (the other, Chrome Hill is shown above), lies just inside Derbyshire in the upper reaches of the valley of the Dove. It is also one of the places which was pretty inaccessible until Access Land was introduced by the 2000 CRoW act.

View N from Parkhouse HIll : 2666a

I took a whole series of photo from the ridge of the hill in September 2009, but without good aerial imagery it was not possible to delineate the outline of the accessible area. Only when writing this post did it occur to me that I could now complete this mapping. Accordingly, I've mapped the access land itself as a relation, and added the tag entrance=access_land where I recorded the relevant sign. The tagging is still pretty tentative, but I hope having something concrete will lead to a common approach.

Saturday 14 May 2011

Radburn : exemplar or nightmare

Radburn is a small community in New Jersey.

Until a few weeks ago I'd never heard of it, but at a friends party we got talking about The Meadows area of Nottingham, how it's changed and how it might change again. The council plan to partially revert the area to a gridded layout and reintroduce a linear shopping street. The area as it is now consists of many short cul-de-sacs off orbital distributor roads, with many houses accessed along paths or through communal green space. Radburn, NJ is the archetype for this type of layout.

In The Meadows : 9964a

I have been mapping The Meadows in a desultory way now for a while: it's not the best part of town so I like to keep mapping visits fairly short. The areas which were re-developed in the early 1980's have an amazing complexity of footpaths, and short side roads. Varying height buildings also sometimes contribute to a canyon effect with the GPS, particularly in the parts which still consist of late 19th century terraced housing.

The Meadows were originally the water meadows of the town of Nottingham, famous in the spring and autumn for displays of crocuses (Crocus vernus in Spring and C. nudiflorus in Autumn). These non-native plants are supposed to have been introduced to the Nottingham area by the monks of the Clunaic Lenton Priory. They still survive in various sites in and around the city (for instance the General Cemetery, Waterloo Promenade, University Park). The area started being developed after the Nottingham Enclosure Act of 1845, when certain parts of the area were dedicated as green spaces (e.g., Queen's Walk). The remainder was largely developed as high density terraced housing with some small scale industry. A colliery opened by the River Trent around 1910, followed by a council-owned power station (both now the site of a retail park). This was what led to the story of an ignorant DJ introducing a request with "Crocus Street, The Meadows. What a lovely address!".

By the 1960s this was one of the areas the council was anxious to re-develop in what were then known as slum-clearance programmes. The whole western portion was demolished in the mid-1970s, to be replaced by lower density housing with much more green space and small gardens for most properties. Additionally, private developers also built housing in this area. The remainder was re-developed piece-meal with the larger original houses being kept smaller less salubrious places being replaced. This accorded with changing planning fashions, with 'cellular renewal' replacing big bang clearance projects. Probably this was because by the late '70s it was clear that there were problems with many of the replacement buildings of the 1960s.

What staggers me is that now, thirty years later, the council wants to 'regenerate' the area. Apparently, its problem is all to do with the Radburn layout.


View Larger Map

One of the great things about worldwide on-line maps is that we can look at places as far apart as New Jersey and Nottingham in a similar way. On OpenStreetMap the cartography is the same worldwide too. This really brings home the differences between the real Radburn, and "Radburn", the planners concept.

The Meadows is obviously much more densely populated: it has a highly convoluted network of paths, whereas in the Radburn the purpose of each path is pretty clear just from the map. The green spaces are smaller, and less connected too. However, The Meadows housing is far pleasanter than its predecessors and the green spaces do sustain a decent amount of wildlife.

I also, doubt if Radburn has a deeply obscure house-numbering scheme, with houses on the same road having different streetnames. Some named roads don't have any addresses associated with them at all.

There are plenty of other places in the UK which are tarred with the now tainted "Radburn" name, like Oxmoor, Huntingdon, a suburb of Havant, suburbs of Telford such as Woodside, Dunclug, Ballymena (NI), and Deanbank, Penicuik (Scotland). Many of these bear only a tenuous resemblance to the original: a planning concept having been transmitted by chinese whisper. Some of the problems are glaringly obvious when one tries to map these places: its difficult to determine addresses, there are far too many paths around the back of properties, few shops have survived, the green space is festooned with "NO BALL GAME" notices. BUT, just as these designs were a planning fashion, it looks like denigrating them is one too. It seems too easy to lay the blame for problems in these areas on the original planners and architects, rather than understanding them as aspects of more complex social issues.

In case of The Meadows, the money for its has 'regeneration' disappeared in the great wave of government cuts. Perhaps this will lead to a more incremental approach, rather than the traditional waving of the planners magic wand to fix social problems.

Monday 9 May 2011

Catmose Campus : an Oakham update

IMG_4429a
On the way back from Barnack I diverted from the Oakham ring road to go and check how much of the Catmose Campus was open. I mapped this a few days before its scheduled opening in February, so it was good to have a simple quick mapping objective.

catmose_campus_merge_4422e

It proved that my quick photos taken in the rain were not a very good guide to the actual location of the building (compare with the actual GPS traces in this screen snapshot):

Catmose Campus in Potlatch2


As I walked up to the entrance to the Sports Centre I reflected that at some stage most mapping should become like this. A short stop off to grab the new details of the changed area, and a quick and simple edit session to finish off. We're a long way from that point, but before we get there we'll need better ways to identify areas with changes and to manage checking their status.

All-in-all a more satisfying mapping experience than last time I was in Oakham: the weather was nicer too.

Saturday 7 May 2011

Along the Fosse Way: mapping a new road


The A46 between Widmerpool and Farndon (near Newark) is being reconstructed as a dual carriageway and is due to open next year.

Kev Swindells has done a great job mapping the southern section of these road works to the junction with the A6097 at East Bridgford. Several sections are already in use, and once again OpenStreetMap is more up-to-date than other online maps. The most obvious of these is the big new roundabout West of Bingham (below, also compare with Google Maps).



At our first pub meetup in Nottingham I asked Kev he was going to do any mapping North of East Bridgford, but apparently its beyond his normal bike runs. I'd already taken some audio notes last summer and had marked in a new overbridge near Elston. So on a nice spring evening I decided to see how much information I could collect for the remaining part of the route. I tried to cross the line of the road works on as many minor roads as I could, starting from Newark and working my way back towards East Bridgford.

IMG_3569a

Even with decent waypoints for the crossings it's not straightforward to get the rest of the alignment. Several things do help:
  • Powerlines. Lots of powerlines cross the Trent S of Farndon, and all have been mapped (from surveys, Bing and OS OpenData). From photos I can check how the road alignment compares to pylon positions.
  • Field Boundaries. Most field boundaries visible on Bing are still there: obviously they've gone along the construction zone, but again it's useful information.
  • Hedgerow Trees. The farmland on the E side of the Fosse Way towards the River Devon has little in the way of trees and no patches of woodland. Each large tree which can be located on a photo can be matched to the Bing imagery.
  • Aerial Photography. In a couple of places the Bing imagery is recent enough to be able to see that the ground has been cleared in preparation for construction. This was particularly useful just N of Flintham close to the point when the new road crosses the old road (nothing on the ground yet: presumably this will be the last phase of construction.
This mapping is very preliminary: Kev mapped the Southern section in multiple visits by noting where the roadworks obstructed cycleroutes. Ideally more of the local minor roads and footpaths could be mapped to add to the detail as the road nears completion. This is particularly true near East Bridgford. The precise location of the two carriageways is also not clear: round RAF Syerston it looks as if the cutting is not completely excavated yet.

All Saints Syerston : 3581aAll Saints Hawton : 3535a

I didn't forget to collect other data: the little village of Syerston was missing all its roads. It has a nice little church too. I was able to do a bit of general tidying up of the other minor roads. Another nice church, which was already mapped is at Hawton: this has a very fine Easter Sepulchre in the chancel. Both churches are dedicated to All Saints.

Friday 29 April 2011

Leafy St Ann's : pre-pub mapping

Mini-mapping party: 3092a

Despite it being just after Easter and the date being announced with only a weeks notice, there were 6 of us at the second OSM pub meet-up. This time we met at the Lincolnshire Poacher on Mansfield Road: definitely a pub worth of the real_ale=yes tag.

With lighter evenings it seemed worth doing a bit of mapping before having a drink: also a good opportunity to show people the ropes. I chose the Poacher, not just because its a nice pub with good beer, but because there are quite a few places in the immediate vicinity which could do with more map survey work. These include the Arboretum, Church Cemetery, also known as the Rock Cemetery and residential areas either side of Mansfield Road.

I'd done a bit of preparation for each of these, but once I realised that there'd be enough of us I persuaded people to visit St Ann's. Although this inner city area has a poor reputation I knew a decent route, and it offered a much more varied insight into the issues associated with collecting data for OSM.


The main mapping objective in this area was to fix roads with missing or, possibly, incorrect names. I had made a print out (see above, and here registration required) from ITO OSM Analysis which provided the basis of a reasonable hours mapping. We walked a loop down behind Huntingdon Street, along St Ann's Well Road and then along the tree-lined walks of Robin Hood Chase, Corporation Oaks, and Elm Walk back to the pub.

IMG_3090aSt. Ann's is predominantly 1970s and early 1980s social housing: the area having been re-developed as part of a slum clearance programme. It was planned with a 'Radburn' layout : lots of cul-de-sacs and interconnecting walkways. Addressing and street names are really confusing as we immediately found out. We walked through a twitchell from Huntingdon Road and found houses with addresses on Sherwin Walk, but we were looking for Sherwin Close. On the opposite side of the road were houses on Welbeck Walk, with Welbeck Close already mapped on OSM. This seems to be typical throughout the area: house addresses are slightly different from the streetname. In addition there are groups of houses which have a separate address, but no linked street: Rodel Court was one I noticed as we crossed Abbotsford Drive.

Open Space, Instow Rise : 3123aOnce we turned onto St Ann's Well Road everything was much greener. The road itself is tree-lined, with lots of grass between the road and housing. We popped up one road (Instow Rise) to pick off a few street-names and came across another big green open space. Back on the main road I noticed a pair of Mistle Thrushes flitting about.

Right of Way notice : 3135aThe real surprise, for those not in the know, was Robin Hood Chase: this is part of a series of green spaces preserved when the Nottingham common land was enclosed. This included the Clay Field an open field dating from mediaeval times, which covered the area which is now St Ann's. Although most of the land was very rapidly developed, a good proportion was retained. In recent years this wise decision has been marked by the Nottingham Enclosure Walk on the last Sunday in June, which takes a rambling route through the parks and gardens created as part of the Nottingham Enclosure Act. At the top end of the Chase there was an old cast iron notice stating that the council had not dedicated it as a highway. I presume this means that access rights are permissive, but more research is required. (It's worth noting, perhaps, that Robin Hood Chase does not appear on other online maps).

IMG_8406a

I forgot to take any photos so I've borrowed one from Geograph, and grabbed one from the 2009 Enclosure Walk. These give a reasonable idea of its role as a green lung.

SK5741 : Looking up Robin Hood Chase by Alan Murray-Rust



IMG_3125aWe managed to add or correct around 10 street names, but we noticed a lot of other things. Here's a non exhaustive list of the various other objects which got mapped from this session: traffic tables, speed limits (20 mph zone), access restrictions, traffic calming measures (cushions, humps, bumps, chicanes), pedestrian crossings, cycle parking, an ATM, shops, restaurants & fast-food outlets, an old cinema (see picture), churches, a few house numbers, post boxes, telephone kiosks, bus stops, a green roof, and a dead pub. This diversity is good for an introduction as most of these only require a node adding to OSM, so its possible to focus on tagging. Cycle Parking was the immediate attention grabber as there was no space at the back of the pub for a second bike: not only was the first cycle stand we came across mapped, it was used too.

Hand Car Wash (dead pub) : 3108aThere were also a few things which didn't get mapped for sundry reasons: solar-powered parking meters, hand car-wash in the former pub car-park, a SureStart children's centre, and the East Midlands Anglo Chinese Society. In the main this is because I don't know how to tag them: but hand car washes in derelict filling stations and pub car parks are becoming such a conspicuous feature of the contemporary urban landscape they probably deserve notice.

In the pub we talked quite a bit about OpenStreetBugs: David and Marcus both seemed keen on the idea of being able to mark-up things requiring mapping as an entry-level way of contributing. Kev showed how he was using OSB to mark cycle parking. David commented that he'd want to be able to filter by a category such as 'cycle infrastructure'. We briefly talked about KeepRight as well.

This led on to David's observation that he felt that quite a lot of existing cycleways were overly generously tagged. Kev agreed and remarked that he'd recently removed a lot of cycle tags on roads in the city centre: partly in response to change signage, but also because the original mapping was less than useful for cyclists. This is definitely a theme to follow: getting some in-depth feedback on what is mapped from potential users is not particularly common with OSM. I'd like to see what we can do in this direction with the Pedals membership.

This naturally led onto the Nottingham Hackspace. Last month Marcus suggested the Hackspace would be a good place for an OSM event. I went along the following open evening to see the Hackspace and meet its members and have been thinking about what we can do together ever since. But early in April the Hackspace's landlord lost its tenancy, and I've been following the saga of finding a new space. This was finally resolved only on Monday: the new space is huge and located just to the east of Sneinton Market. I only hope there isn't a plan to move every month requiring a re-map of the space and its surrounding area.

Thanks to everyone for an interesting evening. Once again I find I'm learning lots by talking with people less familiar with the intricacies of OSM.

Lastly, I've set next month's date for 24th May back at the Lincolnshire Poacher.

Thursday 28 April 2011

Is a research paper on OSM a derived work?

In a long an insightful blog post on Google MapMaker's US launch Mike Dobson mentions a paper by Girres and Touya. According to his blog this paper uses a similar approach to that of Muki Hakaly to analyse French OSM data. I'd like to read this paper, and consider its implications : after all I've contributed to the data that it has used. However it lies behind a pay-wall here: Girres, J.F. and, Touya, G. (2010). Quality Assessment of the French OpenStreetMap Dataset, Transactions in GIS 12(4), 435-459.

I didn't find an electronic offprints at the researchers lab site: IGN COGIT. If anyone knows where I can get a copy let me know.

I would regard it as a normal courtesy of scientific publication to give contributors access to paper based on their data. Certainly the derived data which underpin the work should be made available as Hakaly does, but it's the overview and conclusions which are of most interest. However, I'd like to pose the question: Should this work have been published under CC-BY-SA?

Sunday 3 April 2011

Old Ballparks and other Stadia

Forbes Field, 1966: 203a

Recently it was noticed that the baseline at Yankees Stadium had been added as a oneway footway on OpenStreetMap. This triggered a discussion about the minutae of the game on #OSM. Now, much as I love baseball, I'm pretty hazy on the fine points of the rules, and, indeed, terminology. This is pretty much because all of my knowledge was garnered as a third-grader a very long time ago. I only ever saw two MLB games live (box scores here and here), both at the long vanished Forbes Field.

This prompted another rumination about the geography of vanished bits of the social fabric (which seems to be a developing theme on this blog).

In Britain and the States a huge number of sports stadia were built at the start of the 20th century. When I was growing up these places had a suitable romantic patina from age and great sporting exploits: a romance which belied their often smelly reality (John Bishop is very good on this). By the 1960s even with a modest amount of investment they were becoming rapidly superseded by social change. Many were difficult to maintain, were no longer suitably located (particularly when fans started coming by car) or had more profitable uses.

Cheaper air transport drastically affected Major League Baseball: it was no longer necessary that all the MLB franchises be within an overnight train journey of each other (discussed and mapped in "Spatial organization; the geographer's view of the world" (Abler, Adams and Gould (1971) pp. 377-9). So in the late '50s the Boston Braves moved to Milwaukee, the Brooklyn Dodgers to Los Angeles and the New York Giants to San Francisco. Although franchises had moved a bit before that time, it ushered in an era where owners could extort subsidies from local tax payers by threatening to move the club. Yet in many ways the more modern stadia were a false start. The multi-purpose stadia built in this period ('cookie-cutters') have now being torn down and replaced by separate parks for baseball and gridiron. That great novelty, an indoor ballpark, the Houston Astrodome is now a mouldering, crumbling safety hazard. Perversely more recent ball parks have tried to capture the feel of the early 20th century Jewel box parks.

In Britain change came later, and in response to several fatal disasters which highlighted the poor safety arrangements of many of the older grounds. As a teenager I watched games from cinder-covered terraces, or seated in wooden fire traps like that of the Bradford fire. Notts County's 1910 Meadow Lane ground even had a stand which was older than the ground, having been transported across the Trent from the county cricket ground.

These are the English Football League grounds which have vanished since I watched a match there : Filbert Street (Leicester), the Baseball Ground (Derby), Saltergate (Chesterfield), The Old Showground (Scunthorpe), Fellows Park (Walsall). I'm not particularly nostalgic for any of these, as a kid it was often difficult to get a decent view, the loos were laughable, and as we learnt, safety was poor. Filbert Street was a disaster waiting to happen: we 'watched' a game in such a crush that my father was so fearful of our safety that he hoisted us to the top of a wall away from the crowd. Variously these are supermarkets, housing developments or still brownfield.

However, the memory of great sporting drama at such vanished locations lives on in the memories of thousands of people. Over the past few years, The Observer has run a fascinating series of articles on the fate of a whole range of stadia after they ceased use for sport. These often managed to illustrate the intricate relationship between sport, economics and (urban) social fabric. Unfortunately these pieces don't seem to be available on-line. There are plenty of websites devoted to the subject: e.g., Old Grounds, and Aerial Lost Stadiums. Wikipedia's articles on the subject are usually detailed and thorough.

Stadia like many other historical sites are problematic to map on OpenStreetMap. When some fabric remains, then it's usually possible to tag this in such a way so as to indicate its original purpose. Adding single nodes for the former site (as the imported GNIS node does for Forbes Field) is probably acceptable, without challenging OSM's absence of a model for historical mapping. Culturally, the sites of old stadia might elicit wider interest, than, say, sites of battlefields.

Thursday 24 March 2011

Why do lamp-posts have asset numbers?

IMG_2042aA few weeks ago there was an involved, and sometimes, heated, discussion on the main OSM mailing list about imports. Many of the comments were interesting and useful, but one particular strand has attracted my attention. A slightly grotesque paraphrase of these various messages might be "OSM is a poorly managed Computer Science project, with inadequate tools, particularly for version control."

Leaving aside that OSM is neither a project, nor managed: I'd like to focus on what seems to be a surprising mis-perception about OSM as a database.

Firstly, databases don't have to be digital or stored electronically. Phone books, card indexes, and many reference books are easily recognisable as databases. Therefore a primitive database operation is only valid if it can be applied also to non-digital media.

Secondly, and this directly follows, information stored in a database does not have to have a unique identifier (a primary key). There's nothing stopping a telephone number appearing several times in a phone book, or complete pages being duplicated.

Once data is moved onto digital media, it really helps to assign unique identifiers: data can be restructured to be stored more efficiently or be easier to change; it's easier to spot duplication or bad data. This is exactly what OSM does, nodes, ways and relations are identified internally by system generated keys.

And it's what my local council does. They maintain an asset register of street furniture (bollards, traffic signals, parking signs, parking meters) and within this application assign identifiers. These days all this information is geocoded and available in the council's GIS. This information is obviously useful for financial planning, maintenance and other activities. BUT, they've gone a step further and each asset now has it's system assigned number marked on it. WHY? Because, "replace the bulb in lamp 35621" is a lot more specific than "replace the bulb in the lamp outside 25 Main Street". There may be lamp standards opposite each other at that location, or there might be two Main Street's or there may be no number on 25 Main Street, or the house might have been demolished. However, this number DOES NOT uniquely identify the lamp-post: it might do when combined with its location data, location data of the organisation which has assigned the number and information about the status of the system used to generate the number.

What does this have to do with OSM (apart from the wonderful possibility of collecting lamp-post numbers). Well OSM is like my local council, except that our local patch is a bit bigger, and we cannot go stencilling numbers on anything we map. So we have no means to tie an OSM object to its corresponding thing in the real world.

A corollary to this is that we cannot confidently tie OSM objects to geolocated objects in other databases. There are far too many variables to even inspire confidence in fuzzy matches: when was the OSM data mapped? what sources were used? how accurately was it mapped? when was the external data mapped? how current is it? how complete is it? does it have unique identifiers? are the identifiers persistent?

So, we have a host of problems in matching data from an OSM dataset and an equivalent external dataset. These problems relate to location accuracy, temporal accuracy, matching identifiers, and accuracy of associated data.

A good example of these problems is shown by OS Locator Musical Chairs and ITO's OSM Analysis which compare OSM street name data for Great Britain with the OpenData Locator dataset from the Ordnance Survey. This is a nice clear domain with the OS Locator data being from a known source and date and from a highly reputable national mapping agency. In some areas we have enough separately sourced data in OSM to have a handle on how accurately we can match these datasets. In most areas in England about 0.5-1.0% of Locator records cannot be matched to OSM. (I am not aware of reverse statistics, but in a recent survey aimed at hunting down some 20-odd of the mis-matches I found 5 street names used for addresses which are not present in the OS dataset). Even different datasets from OSGB have enough inconsistency to prevent complete matching. And these cases are relatively easy ones.

There are also problems relating to the purpose of external datasets: cadastral data might not reflect the building outlines we would draw naturally (e.g., French & Spanish cadasters); hydrography data might be segmented for water-flow measurement (e.g., NHD); vector data might be optimised for rendering (OS OpenData VectorMapDistrict); road data might not need to be very accurate (TIGER). The imported data should be restructured to reflect what is important for OSM, not maintained in aspic for some putative update.

So those advocating data imports or having 'development forks' of OSM need to answer : how on earth can you easily relate objects between two different data sets, or even the same data set at different times. Alternatively, we could all add some stencils to our mapping toolkits, but even then we'd have to leave our armchairs.

Postscript: the council are busy replacing all the local lamp posts, wonder what number they'll put on the new ones.