Posts Tagged ‘data processing problems’

29
Jun
10

The Vanishing Kingdom

Yesterday evening, I was having a conversation with one of my roommates about Beaver Island, which lies in the north of Lake Michigan. It’s a sizable chunk of land with some interesting history. It was, at one point in the 19th century, home to a kingdom inhabited by a breakaway Latter-Day Saints sect, until the US government facilitated the assassination of its eccentric ruler and the ejection of the Mormon settlers. While mentioning the island to my roommate, I pulled up Google Maps in order to show him where it is. Except it wasn’t there. An entire archipelago, in fact, was missing from the map. Compare the satellite photo to the map and note the difference:

Screenshots from Google Maps, 6/29/10

Perhaps more amusing is the fact that, when you zoom in sufficiently, the road network for Beaver Island (which has a population of about 650, according to Census estimates) still appears.

Screenshot from Google Maps, 6/29/10

Now, I don’t know how the sausage is made over at Google, but I’m guessing it’s a mostly automated process, given the magnitude of their undertaking. And this is what happens when you let computers keep running with insufficient oversight. This is not exactly a tiny island — it’s 55 square miles, and given how large of a scale Google lets you zoom in to, it’s not something that should be left off. Whatever algorithm they’ve used to generalize their data, it’s in need of tweaking. It’s leaving some smaller islands, but eliminating larger ones. Note the smaller Manitou Islands in the south of the first images above, marked as Sleeping Bear Dunes National Lakeshore. Despite being uninhabited and smaller than Beaver Island, they made it on the map. One of them is rather terribly distorted, however — the polygon is way too simplified for the scale.

It’s been said over and over again, but it’s still worth hearing: be careful when using Google Maps and its cousins. There are very few human hands in their creation, and not enough of the scrutiny required to prevent gaffes of this magnitude. Of course, you should be careful when using any map; once humans start making the data and design decisions rather than computers, major geographical errors may become infrequent, but more insidious problems crop up, as we discussed a few months ago.

This is also where learning lots of random geographic facts can be handy. It’s easier to catch the omission of Beaver Island if you know ahead of time that it exists. This is how I justify spending way too much time on Sporcle taking geography quizzes — it will hopefully make me less likely to make an error like the above.

The lessons from today’s map are obvious, but it’s always good to be reminded from time to time of the importance of careful editing. And the end result is a bit amusing here.

One Nice Thing: At least they’ve got a form on the page which I can use to report this error.

Tomorrow marks one year of blogging here on Cartastrophe. I really wasn’t sure that this experiment was going to last more than a few months, but your comments and emails and support have kept things lively. I appreciate your coming along for the ride. To all who have sent submissions: thank you. I don’t use all of them, but I appreciate everyone keeping an eye out and thinking of me, and hope you will keep doing so. This blog has been great for my own growth as a designer, and I hope that you have gained something from it, as well.

Finally, it comes to my attention that there’s another blog out there in a similar vein to my own. If you’d like a double dose of map critique, have a look at Misguided Maps.

31
Aug
09

In Need of a Dancing Banana

(Editor’s note – Continuing our series of getting cartographers to publicly criticize themselves, we next feature Mr. Andy Woodruff, proprietor of Cartogrammar and an alumnus of the famed University of Wisconsin Cartography Lab. If you’re interested in following the brave example of Mr. Woodruff, and Mr. Reynolds before him, and showing the world some of your own cartastrophes, please write me at cartastrophic@gmail.com. — DH)

Animated maps can be a delightfully cartastrophic realm, rife with dizzying excessive motion, poorly or over-designed interfaces, annoying sound effects, and (I really, really hope) perhaps a few dancing bananas. Daniel will perhaps be wise to steer clear of them here unless he is willing to give up the rest of his life to unearthing all the bad animated cartography on the internet. With this post I will only lead this blog to test the waters gingerly.

An animated map of Wisconsin farmland

This is a map I completed for a class called Animated and Web-based Maps, instructed by Professor Mark Harrower, the king of cartography at the University of Wisconsin-Madison, and, I should add, a known expert in map animation. It’s animated, sure, and you can click the image to load and play it, but that’s not entirely necessary for what follows. Very briefly, a bit of background on the map: it was made for a lab assignment in which students were provided with a set of county-level agricultural data for the state of Wisconsin for the years 1970 to 1999 and instructed to make an animated choropleth map of a variable of their choosing.

Typically, my first reaction to viewing this map is to vomit at the sight of the colors. Unlike many of the maps that make their way onto this site, mind you, I actually used an appropriate color scheme: a diverging scheme using official Cindy Brewer ColorBrewer specs (and it’s even colorblind-friendly!). But this red-blue scheme combined with the interface color pair of green and more disgusting green… well, I hope you’re reading this on an empty stomach. Otherwise I probably owe you a new keyboard. Oh, and if your eyes aren’t completely filled with blood yet, you just might discern in the background the ghost of a photo I took in Wisconsin’s Driftless Area.

But chalk all that up to bad taste. On to the actual cartographic crimes.

First, the white color in the classification scheme, labeled “0% or No Data.” Hold on a second, 2006 self, there’s a big difference between those two! Zero is a legitimate data value that fits in the classification scheme. “No Data” is a different animal entirely. Ideally one would avoid an incomplete data set in the first place, but sometimes that’s what you’ve got (like when that’s the data provided for the assignment). In those cases, areas without data can’t be indistinguishable from areas with data, or else the map reader can never really know what’s going on. Look at the screenshot: which counties have no data, and which have a value of zero? Impossible to know. In reality perhaps some of those counties should be off-the-charts red, but you’d never know which ones. The counties without data need to be shaded with a color that is not in the map’s color scheme, probably some kind of gray. Worse, in the image above there is in fact only one county with no data, but guess what the tooltip says when you mouse over it. That’s right, “0%.”
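A minimal sketch of the fix, in Python rather than whatever the original animation was built with: missing values are kept as None, so neither the fill color nor the tooltip can confuse them with a genuine zero. The function names and hex colors are hypothetical, chosen for illustration and not taken from the original map, and the scheme is collapsed to increase/decrease instead of the full set of classes.

```python
from typing import Optional

# Hypothetical colors. The gray deliberately sits outside the red-blue scheme.
NO_DATA_COLOR = "#bdbdbd"
ZERO_COLOR = "#f7f7f7"       # neutral midpoint of a diverging scheme
DECREASE_COLOR = "#2166ac"   # which hue means "decrease" is an arbitrary choice here
INCREASE_COLOR = "#b2182b"


def fill_color(change: Optional[float]) -> str:
    """Return a fill color, keeping 'no data' distinct from a value of zero."""
    if change is None:
        return NO_DATA_COLOR
    if change == 0:
        return ZERO_COLOR
    return INCREASE_COLOR if change > 0 else DECREASE_COLOR


def tooltip_text(change: Optional[float]) -> str:
    """Return the hover label; a missing value never reads as '0%'."""
    if change is None:
        return "No data"
    return f"{change:g}%"
```

With this arrangement the one county without data would get the gray fill and a “No data” tooltip, while a county with a real zero keeps the neutral color and reads “0%”.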

Next, two points about the choice of variable to map, beginning with the description I wrote near the legend:

“This map shows the change in the percent of land in farms over the preceding year in each Wisconsin county from 1970 to 1999. This is a percent change of a percent– for example, a change from 50% land in farms to 48% would be shown on the map as a -4% change ( 48 is 96% of 50), not a 2% change.”

It’s as though I deliberately chose the most complicated variable possible, probably in an effort to confuse the TA into giving me a good grade. Though I clearly realized the potential confusion (hence the descriptive example), what I didn’t even think about until now is that the exact same thing could have been mapped without any of the “percent of a percent” nonsense. The percent of land in farms for a given year is the total land area in farms divided by the total land area of the county. The percent change that I mapped is the percent for the next year minus that percent for the first year, divided by the percent for the first year. But ignoring, say, erosion on the lakeshores (and I’m sure this data set did ignore it), the total land area doesn’t change from year to year. So total land area magically cancels out of the whole equation, and it would be mathematically equivalent, and a lot clearer to the reader, to simply show the percent change in agricultural land area. I haven’t taken a math class since high school, and maybe it’s starting to show.
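Written out, with symbols introduced here purely for illustration: let A_t be the land area in farms in year t, T the county’s total land area (assumed constant from year to year), and p_t = A_t / T the percent of land in farms. The sign convention follows the example in the legend text, where a drop from 50% to 48% appears as -4%.

```latex
\[
\frac{p_{t+1} - p_t}{p_t}
  = \frac{\dfrac{A_{t+1}}{T} - \dfrac{A_t}{T}}{\dfrac{A_t}{T}}
  = \frac{A_{t+1} - A_t}{A_t},
\qquad \mathrm{e.g.} \quad \frac{48 - 50}{50} = -0.04 = -4\%.
\]
```

Since T appears in every term, it drops out, and what remains is simply the percent change in farmland area.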

It's mathemagical!

Beyond that, out of all the options this choice seems like a particularly strange thing to map in an animation. The whole purpose of an animated map is to show change over time. But the data are already showing change, so now we’re dealing with change in change. Watch the animation; do you get anything out of it? I sure don’t. Yes, you can see that some years are calm and some are not, but it is very difficult to get a sense of the overall trend of what’s going on with farmland in Wisconsin. It would have been a lot clearer to just map the percent of land in agriculture and watch how that changed over thirty years. Now, this point is debatable because animating change maps is not unheard of. I’ve been told that some important minds and beards have investigated such animation. It can be useful for highlighting or discovering areas of instability. For general-purpose maps such as this one, however, mapping a change variable is best left to single, static instances. If I had animated just the percent of land in farms, the same trends could have been discerned through the animation, and the user would also actually learn something about the amount of agricultural land. A more appropriate use for the change map might have been a single map showing the change over the thirty-year span. In fact, the subtitle here, “Change in percent land in farms 1970-1999” could be realized by that single map.

For a final quarrel, I would argue that the counties on this map should have been labeled. There is ample space, and whereas you might get away with not labeling states in a US map, few people know Wisconsin’s 72 counties by heart. Instead of labels on the map, I forced the user to move the mouse cursor over a county to see what its name is. The rule by which I now try to abide is: don’t lean on interactivity to solve all cartographic challenges. Interactivity as a means to reveal data is a good way to add lots of additional information to a map, but it can also make it easy to be lazy. Laziness is for the map reader, not the map maker. If the information is useful and can be accommodated without relying on interaction, then do it. The specific data value you see when hovering over a county is a good use of interactivity for extra information; the county name is not. It’d be a lot less work to visually scan persistent labels that are sitting there on the map than it is to mouse over counties to see their names one at a time.

And an extra special bonus typographic nitpick: I misused hyphens in place of both an en and an em dash in the subtitle and description, respectively.

One Nice Thing: The animated and interactive features of the map are nearly—but not quite—unbreakable. I won’t mention the one bug I did find recently.

20
Aug
09

Where Does David Wilkins Live?

Remember David Wilkins, former US ambassador to Canada? Well, if you do a Google search on him, this map from whitepages.com comes up near the top, showing the distribution of telephone directory listings matching his name:

Since they apparently generate these automatically for most any name, I thought of doing my own. But, I figured that I would take another opportunity to increase the fame and internet profile of Mr. Wilkins. Can’t pass that up.

The colors are certainly less than ideal – as with so many of the maps seen here, there’s a mismatch between an orderable data set (number of listings) and an un-orderable symbology (the colors chosen to represent those numbers). Though, I suppose one can see a weak progression in the colors, depending on your perspective. But it’s still far from a good match to the data. Running from a light to a dark blue would be perfect. It would also be more friendly to people with color vision impairments.

It would also be nice if I didn’t have to assume that white means zero listings, since it could also reasonably mean “no data available.” Troubling is the fact that some of the small states are filled in with white on the main map, but on the inset, where they are enlarged, they are given a color. The inset needs to be consistent with the main map – else it makes it harder to understand that the inset is, in fact, a zoomed-in version of the main map.

A sacrifice made with a classed choropleth map like this is that you lose some precision in getting the numbers off of it. Look at the states in light blue – they all have anywhere from 1 to 11 listings for “David Wilkins.” Grouping states like this is perfectly reasonable, to help reduce the number of colors used on the map and make it easier for someone to pick out one distinct color and match it to the legend. Some ambiguity is necessary as part of this process. But, look at Texas – the only state colored in dark red. It apparently has anywhere from 43 to 53 listings. It’s the only state in its class – why is the exact number not specified?

The classification scheme in general is a bit odd. There are a few big goals you want to try and go for when deciding how to group your states. One is to minimize intra-class differences – that is, keep the class sizes small. You don’t want a class that goes 1 to 11 listings, and one that goes 12 to 500 listings. The second one is way too broad. Another is to try and make each class roughly the same size, which this map has a problem with. There’s one state in the dark red class, two in the orange class, and twenty-five in the light blue class. A third goal for class breaks is to try and have class breaks that are relatively even in number – as an astute reader points out below, the class breaks change in size just a bit, though they’re roughly pretty even, so I think they hold up pretty well. There are a few other goals, but I’ll leave it at that. As you might expect, it’s hard to fulfill all the goals at once, but the severity of the difference between 1 red state and 25 light blue ones is still pretty bad. The two lowest classes cover most of the country, and the two upper classes cover only three states. It makes those three states stand out, but more than they should. There’s not a large, unusual, and worth-pointing-out difference between the upper and lower end states, to my mind.
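To make the tension between those goals concrete, here is a rough Python sketch comparing two common ways of choosing class breaks: equal-interval (every class spans the same range of values) and quantile (every class holds about the same number of states). The listing counts are hypothetical, invented to be skewed like a name-count data set, and are not the numbers behind the whitepages.com map; nothing here is meant to suggest how that map was actually classed.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """Split the data range into k classes of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Pick breaks so each class holds roughly the same number of values."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // k] for i in range(1, k)]


def class_sizes(values, breaks):
    """Count values per class; a value equal to a break goes in the upper class."""
    sizes = [0] * (len(breaks) + 1)
    for v in values:
        sizes[bisect_right(breaks, v)] += 1
    return sizes


# Hypothetical, deliberately skewed listing counts for 16 imaginary states.
listings = [1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 20, 30, 50]

print(class_sizes(listings, equal_interval_breaks(listings, 4)))  # [12, 2, 1, 1]
print(class_sizes(listings, quantile_breaks(listings, 4)))        # [4, 4, 4, 4]
```

The equal-interval version piles most of the states into the lowest class and leaves the top classes nearly empty, much like the map above. The quantile version evens out the counts, but only by letting the top class stretch from 15 to 50 listings, which is exactly the kind of overly broad class the first goal warns against.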

These data should probably be normalized, as well. Consider Texas again: a lot of people named David Wilkins live there. This is probably because a lot of people live there in the first place – it’s one of the most populous states. More populated places will probably have more people named David Wilkins. Likewise, you can’t find anyone named David Wilkins in places like Wyoming or South Dakota, because approximately no one lives in those states. The pattern shown by this map is highly correlated to the population distribution of the United States. It does not show whether or not people from Texas are more likely than people from Wisconsin to be named David Wilkins. Instead of making a map of how many telephone listings there are in each state for David Wilkins, the author(s) should plot how many listings there are for David Wilkins per million inhabitants of the state. Then you would find out that Delaware has 8.1 listings for David Wilkins per million inhabitants, vs. only 2.2 for Texas. The name is also particularly popular in South Carolina, which the Ambassador calls home.
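The normalization itself is a one-line calculation. The sketch below uses two made-up states with round, hypothetical numbers (not the actual listing counts or populations of Texas or Delaware) to show how the ranking can flip once population is taken into account.

```python
def listings_per_million(listings: int, population: int) -> float:
    """Normalize a raw listing count by population, per million inhabitants."""
    return listings * 1_000_000 / population


# Hypothetical (listings, population) pairs for two imaginary states.
states = {
    "Big State": (50, 25_000_000),
    "Small State": (8, 1_000_000),
}

rates = {
    name: listings_per_million(count, pop)
    for name, (count, pop) in states.items()
}

most_listings = max(states, key=lambda name: states[name][0])
highest_rate = max(rates, key=rates.get)

print(most_listings, highest_rate)  # Big State Small State
print(rates)                        # {'Big State': 2.0, 'Small State': 8.0}
```

The big state wins on raw count simply because more people live there; the small state wins on rate, which is the comparison that actually says something about the name.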

I find it a bit odd that they have region names listed for New England and the Mid Atlantic, but not the rest of the country. Also, I was under the impression that Maine was part of New England.

One Nice Thing: Those inset maps to the right sure are handy.

With that, I will leave off today’s effort to make this blog the #1 item on a Google search for David Wilkins.



