Cartographic generalization: prioritizing features with QRank (Feb 3, 2024)


Cartographic generalization is how a mapmaker picks what to show a user on a map when the map just isn't big enough to show it all.

Consider a map of Washington state. If we were to show every node tagged place=*, even without labels, it's a big mess, with many overlapping places:

This post will explore how to do simple generalization using tilemaker and Wikidata's QRank data.

The map schema and styles

All the maps on this page use the same style, and (mostly!) the same simple schema:


The schema declares two layers:

  1. boundaries, which contains state boundaries as lines
  2. places, which contains places (e.g., villages, towns, cities) as points


The style:

  1. draws the state boundary
  2. labels as many places as it can

Attempt 1: Simplest possible thing (cities)

Hey, maybe we don't need to do any generalization. Let's write a Lua profile that emits the state boundary and all the place nodes.

Maybe it just works, looks great, and we can knock off for the day. The Lua looks like this:

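-- Only nodes carrying a place=* tag are passed to node_function.
node_keys = {'place'}

function node_function()
	local name = Find('name')
	if name == '' then return end

	-- 'places' is the point layer declared in the schema.
	Layer('places', false)
	Attribute('name', name)
end

function way_function()
end

-- Accept state-level boundary relations (admin_level=4) so that
-- relation_function gets called for them.
function relation_scan_function()
	if Find('boundary') == 'administrative' and Find('admin_level') == '4' then
		Accept()
	end
end

function relation_function()
	if Find('boundary') == 'administrative' and Find('admin_level') == '4' then
		-- 'boundaries' is the line layer declared in the schema.
		Layer('boundaries', false)
	end
end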

As you'd expect, the map does not look great... but, well, see for yourself:

Attempt 2: Limit the number of features (cities-limit)

One obvious problem with the previous map: there are way too many places labelled at low zooms.

Adjust the places layer to have feature_limit and feature_limit_below settings:

"places": {
    "zindex": 2,
    "minzoom": 6,
    "maxzoom": 10,
    "feature_limit": 5,
    "feature_limit_below": 10
}

This tells tilemaker to put at most 5 places in each tile at zooms below 10. If the user wants to see more, they can zoom in. It's a little better:

Attempt 3: Use OSM tags to prioritize features

Now that the map isn't cluttered, we can actually read all the labels.

And we notice that many places we'd expect to see are missing:

It'd be reasonable if some are missing... but in fact, they're all missing. What gives?

tilemaker is simply taking 5 arbitrary items for its feature limit. We need to tell tilemaker how to rank the features.
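To make "arbitrary" versus "ranked" concrete, here's a rough sketch in plain Lua of what a ranked feature limit does conceptually: sort the candidates by rank, then keep the top N. This is not tilemaker's actual implementation. top_n_by_rank is a hypothetical helper, and the ranks in the usage example are made up for illustration.

```lua
-- Conceptual sketch only: NOT tilemaker's real code.
-- Returns the n highest-ranked features without modifying the input.
function top_n_by_rank(features, n)
  local sorted = {}
  for i, f in ipairs(features) do sorted[i] = f end

  table.sort(sorted, function(a, b)
    if a.rank ~= b.rank then return a.rank > b.rank end
    return a.name < b.name -- tie-break by name so the result is deterministic
  end)

  local kept = {}
  for i = 1, math.min(n, #sorted) do kept[i] = sorted[i] end
  return kept
end

-- Usage: illustrative ranks, not real data.
local places = {
  { name = 'Index',   rank = 2 },
  { name = 'Seattle', rank = 5 },
  { name = 'Forks',   rank = 4 },
}
local kept = top_n_by_rank(places, 2)
for _, p in ipairs(kept) do print(p.name, p.rank) end
```

Without a rank, every feature ties, and which ones survive the limit depends on whatever order they happen to be in.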

Let's rework our node_function to use the place=* tag -- items with a higher ZOrder value will be preferred by tilemaker:

function node_function()
	local name = Find('name')
	if name == '' then return end

	Layer('places', false)
	Attribute('name', name)

	-- Larger settlements get a higher rank: hamlet < village < town < city.
	local rank = 1
	local place = Find('place')
	if place == 'hamlet' then rank = 2 end
	if place == 'village' then rank = 3 end
	if place == 'town' then rank = 4 end
	if place == 'city' then rank = 5 end
	ZOrder(rank)
end

This gives us this map:

Looking better! The big cities we care about are present -- Seattle, Tacoma and Spokane.

There are still some issues:

Maybe we could add some heuristics to resolve these concerns:

But as we start to explore that, it feels like we'll always have to tweak rules. Maybe some place=villages are noteworthy enough to appear at low zooms, for example.
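For a sense of where that road leads, here's a minimal sketch of a table-driven ranking with a hand-maintained exceptions list. PLACE_RANK, NOTEWORTHY and place_rank are hypothetical names, and the entry in NOTEWORTHY is a placeholder rather than a real recommendation.

```lua
-- Hypothetical base ranks by place=* value; higher is more important.
PLACE_RANK = { hamlet = 2, village = 3, town = 4, city = 5 }

-- Hypothetical hand-maintained exceptions, keyed by name.
-- 'Some Village' is a placeholder, not a real place.
NOTEWORTHY = { ['Some Village'] = 5 }

-- An exception wins over the base rank; unknown place values get rank 1.
function place_rank(place, name)
  return NOTEWORTHY[name] or PLACE_RANK[place] or 1
end

print(place_rank('village', 'Some Village'))
print(place_rank('village', 'Anywhere Else'))
```

Every exception is one more line in a table that somebody has to keep up to date.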

Worse, what happens when we think about other features we might like to show on the map? All of our rules are place specific. They won't generalize to mountain peaks, bodies of water, etc.

Attempt 4: Use QRank to prioritize features

Luckily, there's another option. Many OSM items are linked to a Wikidata item. Many of those items are linked to Wikipedia pages.

Could we drive feature selection by how much "mindshare" the feature has, as measured by visits to Wikipedia?

Yes! It turns out to be straightforward. Wikimedia hosts the Wikidata QRank project. It's a CSV with scores for many Wikidata items, based on pageviews. I host a QRank SQLite db and a qrank Lua module that provide for easy integration into tilemaker.
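To show the idea without the SQLite db, here's a small sketch in plain Lua that reads QRank-style CSV text into a lookup table. It assumes a header line of Entity,QRank followed by rows like Q42,12345. parse_qrank_csv and qrank_for are hypothetical helpers, not the API of the qrank module, and the scores in the usage example are made up.

```lua
-- Assumed layout: header "Entity,QRank", then one "Q<digits>,<score>" row per item.
-- Lines that don't match (including the header) are skipped.
function parse_qrank_csv(text)
  local scores = {}
  for line in text:gmatch('[^\r\n]+') do
    local id, score = line:match('^(Q%d+),(%d+)$')
    if id then scores[id] = tonumber(score) end
  end
  return scores
end

-- Score for an OSM object's wikidata=* value; 0 if it has no link or no score.
function qrank_for(scores, wikidata_tag)
  if wikidata_tag == nil or wikidata_tag == '' then return 0 end
  return scores[wikidata_tag] or 0
end

-- Usage: made-up scores, not real QRank data.
local scores = parse_qrank_csv('Entity,QRank\nQ1,100\nQ2,5\n')
print(qrank_for(scores, 'Q1'), qrank_for(scores, ''))
```

The real file has a lot of rows, so for a full extract a database lookup is likely a better fit than an in-memory table.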

Let's adjust our node_function further:

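function node_function()
	-- Skip the node that represents the state itself.
	if Find('place') == 'state' then return end
	local name = Find('name')
	if name == '' then return end

	Layer('places', false)
	Attribute('name', name)
	-- ZOrder is now set from the node's QRank score (looked up through the
	-- qrank module) rather than from its place=* value.
end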

Now our map looks like:

Hmmmm. On the one hand, Forks and Olympia now appear. That's excellent.

On the other hand, Index, Aberdeen, and Cheney now also appear. Unfortunately, this seems to be a downside of QRank: it confuses things with similar names. The Wikipedia page for Index is a disambiguation page, with one of the options being the page for Index, WA. I suspect QRank is summing up the popularity of all of the entries on the disambiguation page, giving Index, WA an unfair advantage over other cities. I imagine Aberdeen is getting a boost from its much more famous Scottish peer, and Cheney a boost from a former US politician.

Perhaps someone will come along and fix QRank? Until then, it seems like we'll still need some hand-tuned heuristics.
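One shape such a hand-tuned heuristic could take is a small penalty table for items whose score looks inflated. This is only a sketch under that assumption: QRANK_PENALTY and adjusted_qrank are hypothetical names, the key is a placeholder rather than a real Wikidata ID, and the factor is arbitrary.

```lua
-- Hypothetical, hand-maintained penalties keyed by Wikidata ID.
-- 'Q-PLACEHOLDER' is not a real item; 0.25 is an arbitrary factor.
QRANK_PENALTY = { ['Q-PLACEHOLDER'] = 0.25 }

-- Scale down the score of penalized items; leave every other score untouched.
function adjusted_qrank(wikidata_id, score)
  local factor = QRANK_PENALTY[wikidata_id] or 1
  return math.floor(score * factor)
end

print(adjusted_qrank('Q-PLACEHOLDER', 1000))
print(adjusted_qrank('Q-SOMETHING-ELSE', 1000))
```

It has the same maintenance cost as any other rule table: someone has to notice the bad result on the map before the entry gets added.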


Generalization is necessary to avoid a cluttered map. Ranking and limiting the number of features shown is one possible generalization technique. The ranking can either be based on a rules system, or based on an external signal of importance.

In both cases, you'll often need a human in the loop who actually looks at the result and makes some judgment calls about whether it's good enough.

The code for this post is available on GitHub at hikeratlas/qrank-demo.