24 January 2011

No Swearing in Utah

I’ve got a map on the cover of the latest issue of Cartographic Perspectives, and some colleagues of mine have been so kind as to spread it around Twitter and Facebook and all those other popular social media which I’ve never gotten into. It’s been a while since I’ve subjected my own work to this blog, so I thought I’d take advantage of its temporary boost in popularity in a small corner of the Internet to do so again.

Click to download PDF (~12MB)

This time, though, I’d like to try an experiment. If you would be so kind, gentle readers, I would like to turn this critique over to you. This afternoon I am feeling unrealistically optimistic about the number of readers who might be willing to provide comments. If you’re so inclined, click the link above to download a PDF, and then let me know what you think. Here at Cartastrophe, my goal is to enlighten myself (and, hopefully, others) through critique and analysis; anything you can add to the discussion is always welcome.

Among other things, I am particularly interested to hear thoughts on the GIS work (described in the lower left corner); I am no expert in spatial analysis, and I feel I was somewhat arbitrary in my methods. Basically, I generated a raster surface in which each pixel gave the average number of profanities for the nearest 500 tweets that could be located. This should account for variation in population density around the US. Perhaps you have a better suggestion for how to go about it. Comments, be they negative or positive, on non-GIS things are welcome, as well.
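
For readers who want to poke at the method, here is a minimal sketch of that surface generation, assuming the tweets are already projected to planar x/y coordinates and flagged for profanity; the names (tweets_xy, is_profane, the grid arrays) are hypothetical stand-ins, not the actual workflow used for the map:

```python
import numpy as np
from scipy.spatial import cKDTree

def profanity_surface(tweets_xy, is_profane, xs, ys, k=500):
    """Average the profanity flag of the k nearest tweets at each
    cell center, so every pixel rests on the same sample size."""
    tree = cKDTree(tweets_xy)               # spatial index over tweet points
    gx, gy = np.meshgrid(xs, ys)            # xs, ys: 1-D cell-center coords
    centers = np.column_stack([gx.ravel(), gy.ravel()])
    _, idx = tree.query(centers, k=k)       # indices of k nearest tweets
    surface = is_profane[idx].mean(axis=1)  # profanities per tweet, 0..1
    return surface.reshape(gy.shape)
```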

And I encourage everyone to have a look at the new issue of Cartographic Perspectives: http://www.nacis.org/CP/CP66/CP66.pdf. Especially if you want to hear me go on at length about reviving the historical technique of waterlining.


112 Responses to “No Swearing in Utah”


  1. Andy
     24th January, 2011 at 3:16 pm

    I think I speak for all your readers when I say that we want to see you—in a departure from your usual polite, professional demeanor—list the profanities you looked for. Was it the traditional George Carlin set, or something more 21st century Urban Dictionary-inspired?

    (By the way: awesome map!)

    • 2 Daniel Huffman
      24th January, 2011 at 4:46 pm

      Andy,

      Thanks for the kind words. I made use of six main ones that came to mind: fuck, shit, bitch, hell, damn, ass

      I was running these through Excel (since I had the Tweet data in a spreadsheet), and I had it simply search for those text strings within the message. In most cases, that meant it was a search along the lines of *fuck*, meaning it would catch “fucker,” “fuckwit,” and other words probably bandied about the UW Cartography Lab in its saltier days. For ass and hell, I kept wildcards out of those words, since there was a chance of catching something like “assume” or “shell” if I did not.

      I tried a few other words here and there, some of Carlin derivation, and found that I wasn’t seeing too many hits. Those six seemed to be the bulk of the traffic, so I kept it simple. I also checked to see if I was catching false positives — “mishit,” for example — and didn’t see too many. I didn’t have any actual hard numerical criteria as to how many is too many, since I was making this for my own amusement.

      Don’t tell my parents about any of this.
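
      A rough sketch of that matching rule, translated out of Excel into Python for clarity (the helper name here is hypothetical): substring matches for most words, exact-word matches for "ass" and "hell":

      ```python
      import re

      SUBSTRING = ["fuck", "shit", "bitch", "damn"]      # matched as *word*
      EXACT = [r"\bass\b", r"\bhell\b"]                  # word boundaries, no wildcards

      def is_profane(tweet: str) -> bool:
          text = tweet.lower()
          if any(w in text for w in SUBSTRING):          # catches "fucker", "fuckwit"
              return True
          return any(re.search(p, text) for p in EXACT)  # skips "assume", "shell"
      ```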

  2. 24th January, 2011 at 4:43 pm

    Hi Daniel.

    Thanks for preparing this map for the special digital issue of Cartographic Perspectives–I think it is really intriguing. I definitely would like to hear you answer Andy’s question, and perhaps also speak to any regional variations in swear usage.

    Cartographically, I am interested in your decision to make the majority of the page darker in tone. Traditional design tenets argue for a 1:1:1 split in the space allotted to white/light, medium, and black/dark tones. I am not arguing against your decision per se (I think it works on my ultra bright monitor, I am unsure if it would hold up in print), but I am curious if you could speak to some of the advantages/disadvantages of the approach, and why you ultimately made the design choice that you did.

    Keep making great maps!
    Rob

    • 7 Daniel Huffman
      24th January, 2011 at 5:01 pm

      Hey Rob,

      With a data set like this, you could spend years making maps about language. Regional swearing would be a good one, or even any other regional dialect elements. You’d get a few tweets from people who are just visiting the location they tweet from, but I am operating on the assumption that most of the tweets are from people who reside in the area.

      I’ve been interested in working in darker tones over the past year or two. That’s partially Tanya Buckingham’s influence — she pushed me to make my European buildings map darker. It began as very much a choice of personal aesthetic. I think I had in mind the old computers of my younger days (oh so long ago, now that I’m nearly the ancient age of 30), with the green text on a black background. Green doesn’t work so great in print, but you can do a nice job with red. I think it carries a sort of technological connotation, that we’re seeing a sort of ethereal digital realm apart from the physicality of the earth, the magical twitterverse. It also makes me think of a heads-up display, a bit. That sort of realm of things. It seems fitting to the subject.

      It does limit the print some, though. You’ll need a pretty good room light source to see the distinctions in color well. I brightened the most recent draft on that account. It’s not meant for any specific application, merely my amusement, so I’m not too concerned about that limitation, but it is a challenge of working dark.

      Someone has probably done a study on this, but I believe it’s somewhat easier to see things coming out from a dark background than a light one. In my Instrumental Analysis class during my days as a student of chemistry, we talked about different types of spectroscopy (UV vs. fluorescence), and the example I was given then, which still makes a good deal of sense, is: Imagine a full football stadium at night. Everyone turns on a flashlight. Then, 20 people turn them off. It’s hard to see 20 seats darken against thousands of lights. Now imagine everyone turns their lights off and 20 people turn them on. It’s easy to see those 20 lights against the darkness. Same thing in print, I suspect. A black line on a white background is an absence of light amongst a huge light field. A light line on a black background is a few lights in the darkness.

      I am not aware of the arguments in favor of the 1:1:1 split, but perhaps I should familiarize myself with them. I can see a potential disadvantage to my system in that everything must be brighter than the black. That lends a sort of implied order to things — brighter is more, which makes sense here but would not always. With a medium background, you have spaces in the visual hierarchy which are both darker and lighter, which may avoid that. Not sure how that would really play out in reality — I suspect it’s something one can easily design around, but it might be present.

      Okay, I’ve gone on enough.

      • 8 Tina
        12th February, 2011 at 9:44 am

        I had the same question about the gradients and dark/light balance, in part because the darker parts along the U.S. border blended too well with Canada and Mexico. Then I realized that I turn my laptop monitor down to half when I’m running off battery, so I jacked the brightness back up and then I adjusted the angle of my screen so that it tilted more towards me (a bit awkward for a laptop, but perhaps not an issue if this is in print), and lo! the 49th parallel re-emerged as distinct.

        I’d also be interested to know if the seeming correlation between urban areas and the high(er) levels of swearing is meaningful – is there something causal, or is it just correlative? The East Coast seems to have been collectively pissed off about something in April and May, but Oklahoma City Tweets seem to be much less profane. (In contrast, I’d swear pretty much constantly if I lived there.)

        Overall, though, I like its elegance. To me, this is easy enough to read that it’s almost soothing, despite being red rather than a traditional “cool” color.

  3. 9 Greg
    24th January, 2011 at 7:02 pm

    Nice job on the map and the idea behind it. I also enjoyed the waterlining article.

    I would like to see more landmarks shown on the map, especially for the extreme peaks and valleys. More labeled cities in those extreme cases would be nice. Murfreesboro, TN appears to be a good example of a peak. Murfreesboro is a relatively small city with a large state university, MTSU, and a relatively large number of students, which could explain more swearing. I wouldn’t mind more editorial labeling with anything that could cause more swearing, like schools, prisons, etc.

    The lines in the less-populous western states don’t make as much sense as the ones in the eastern states. I would expect any large city (like Phoenix) to appear as a peak or valley, or some kind of focus point. The lack of focus on large western cities appears to reflect a lack of data for those areas surrounded by very low population densities. Could you easily collect data for a year? Facebook posts would probably give a more accurate measure. I understand that you probably can’t collect those easily.

    • 10 Daniel Huffman
      25th January, 2011 at 2:52 pm

      Hi Greg,

      Thanks for the kind words. I think you have a fair point about more labels to help bring some of the story into context — Murfreesboro being a great example.

      I had to collect these data in realtime, so a year would be a long wait =). But there’s no particular reason why I couldn’t, other than the wait and the demands on computer time.

      As to the cities of the west, the lines there are a bit more generalized, you are correct. The lines are based on a local search for the nearest 500 tweets made. Out west, tweets are fewer and farther between, and so what that means is that the algorithm has to search sometimes about 100 miles to find enough data to work with. We’re speaking in much broader strokes on that side of the map.

  4. 11 kg
    24th January, 2011 at 10:31 pm

    Very slick design. I wonder if it would work printed. I like the color scheme and the textured contour lines at 5 and 10. They brought my eye back to the key, which helped me comprehend the data better.

    It is a novel and interesting way to portray the data. You have made an interesting map that shows the geographic importance of this variable. The trends tell us a lot. There is a wholesome track running from Utah all the way to Wisconsin until it hits Sodom and Gomorrah (Milwaukee and Chicago, respectively).

    While I am open to the idea, I am not convinced that a contour map is appropriate for this data set. My guess is that the data would be more discrete than this map would lead us to believe. Look at northern California and Portland-Seattle. Are the pottymouths mostly in the suburbs or is that an effect of proximity and how you analyzed the data?

    While it might be more boring and certainly less innovative, I would be interested to see a choropleth using the same raster data. H-e-double-hockey-sticks, you could even use the same color scheme!

    What about “fuk,” “dam,” and other such variants? I would love to see a model of the distribution of the use of “*douche*” across the US.

    *nitpicky question: Your city labels follow the arc of the horizontal state borders. I think it works well, but I’d love to hear you discuss why you chose to do that.

    • 12 Daniel Huffman
      25th January, 2011 at 2:56 pm

      The print’s a bit troublesome. You need some good direct lighting to see it well, but if you can meet that requirement it looks fine.

      I think to draw suburb/city distinctions I’d need to collect more data in order to be able to sustain a larger map scale. I had to have the computer search for the nearest 500 tweets per raster cell, and so it doesn’t distinguish between suburb and city if there’s not enough data to keep them separate. If there’s a lot of data around, you could get data on 500 city tweets and 500 suburb tweets just fine and create a detailed surface. I think a choropleth could work, too, but only if we’ve got enough points in each enumeration unit. Probably need to collect for a few more months.

      The labels curve along the (invisible) graticule. It’s a convention I was taught once but haven’t used in a good while. I think it integrates them better into the map — the geography is subject to the curving influences of the conic projection, and the labels are, too, rather than standing apart. It makes it feel more like they’re written upon the surface of the earth. I think it’s more harmonious.

  5. 13 amac
    25th January, 2011 at 12:09 am

    What cell size was the raster you used? Off the top of my head it seems that if there are more than 500 tweets from within the area of the cell, you are assuming that the distribution of profanity within the area of a cell is random, and that by sampling the center of the cell (or whatever coordinates you used to determine proximity) you were getting a representative sample for the cell. It also seems like you might be smoothing out the data, particularly in tweet-sparse areas (Death Valley Wilderness Area?). It seems likely that there are some areas that have fewer than 500 tweets from within the area of the cell, and you would be including data from a much broader range, in effect filling a depression.

    I don’t know enough about the data to suggest another method that could actually have been applied, as I don’t work with this kind of data and don’t use Twitter, but ideally I would calculate all of the (# of tweets with swears)/(# of tweets) — or (# of swears)/(# of tweets), depending — that came from within the area of the cell. I’m guessing that the location information is not precise enough to do this, but perhaps most people aren’t as wonky about broadcasting precise location information as I am. Another possibility, to deal with the smoothing effect in tweet-sparse areas, would be to set a maximum search radius so that values from too far away (however you want to define that) are not included. Basically, I think the fact that the area you are calculating density of swears over varies based on population is problematic. I think the area should be consistent, and then normalize the number of swears by the number of tweets.
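
    A sketch of that fixed-area alternative, assuming projected coordinates and a per-tweet profanity flag (all names hypothetical); cells with no tweets come out as NaN, i.e. NoData:

    ```python
    import numpy as np

    def binned_ratio(x, y, profane, x_edges, y_edges):
        """(# profane tweets) / (# tweets) per fixed-area cell."""
        total, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
        swears, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges],
                                      weights=profane.astype(float))
        with np.errstate(invalid="ignore", divide="ignore"):
            return swears / total      # NaN marks empty (NoData) cells
    ```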

    • 14 Daniel Huffman
      25th January, 2011 at 3:07 pm

      I believe my cells were on the order of 1 or 2km — it’s been a while and I don’t have my notes handy. It was, in any case, pretty small given the size of the area covered. I am most certainly smoothing out the data, as most cells would not have had 500 tweets. Instead, each cell is filled with data from the 500 tweets nearest to the center, regardless of whether or not they’re in the cell. The plus side to this is that every spot on the map is based on an equal number of tweets — out in the sparsely populated West, I can get a population-independent assessment of profanity frequency even though few people live there. I just have to search 50-100 miles from that cell location in order to get enough neighbors to calculate a frequency. So, the upside is that the surface isn’t just a population density map, with more tweets in locations with more people. The downside is that each part of the surface is based on a differently-sized geographic area. There’s a lot more detail possible in New York City, where you can get 500 tweets in a square mile, than there is in Utah, where we have to pull in some data from surrounding areas.

      I think you are quite right in that this is problematic, though I’m not sure the alternative is any better. I don’t have enough data to support my analysis in the sparse areas. I could leave them off the map, certainly, but that’s a bit limiting (and, this was made primarily for amusement, so it’s not being used for any important decision-making). I think a maximum search radius might help alleviate this some, though it’s hard to decide what’s an appropriate radius (and to pick one that still gives me enough tweets to calculate a robust percentage of profanities).
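
      A sketch of that capped-radius variant, under the same hypothetical inputs as the earlier sketch; cells that cannot gather enough tweets within the radius become NoData instead of borrowing from far away:

      ```python
      import numpy as np
      from scipy.spatial import cKDTree

      def capped_surface(tweets_xy, is_profane, centers, k=500,
                         max_dist=50_000, min_n=100):
          tree = cKDTree(tweets_xy)
          # Neighbors beyond max_dist come back with distance = inf
          dist, idx = tree.query(centers, k=k, distance_upper_bound=max_dist)
          out = np.full(len(centers), np.nan)   # NaN = NoData by default
          for i in range(len(centers)):
              valid = np.isfinite(dist[i])
              if valid.sum() >= min_n:          # enough tweets within radius?
                  out[i] = is_profane[idx[i][valid]].mean()
          return out
      ```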

      Your point is well taken about how the tweets are sampled within a cell, and I admit to not thinking in great detail about that. That means, in theory, some tweets might not be sampled at all. My cell sizes, I hope, are small enough that this shouldn’t make a really significant difference, but if I were to re-do it, I think I would have to reconsider how I go about things, to make sure that if a cell has enough data, it samples all of the tweets no matter how many, and then starts searching only if there aren’t 500 (or whatever threshold I set).

      Also, you are right in that I was smoothing the data — beyond the GIS smoothing, I did a bit of manual smoothing in Illustrator. I didn’t want the lines looking too detailed and precise, since they are based on only a sample of tweets, and only on my opinion of profanity, and on one questionable type of spatial analysis. I am hoping people will take it more generally.

      Thanks for the thoughts! It makes me want to dig out the data and re-do things =).

      • 15 amac
        25th January, 2011 at 6:34 pm

        That was about the cell size I had assumed. I think the advantage of calculating a profanity:tweet ratio for each cell is that the data becomes population independent. I guess that is basically what you were doing, except the # of tweets is constant, but I don’t think it needs to be; you could carry the math through and just map the ratio instead of the number of swears. In some areas that would probably mean you have no data, but that in and of itself is a valuable piece of information.

        Another possibility (that is really not practical for this kind of project) might be to do a stratified random sampling instead of a straight random sampling of the original data. That way you’d get more data points to work with in sparse areas, but wouldn’t have to deal with the whole dataset. The amount of time to process such a large dataset into the appropriate geographic bins could be a big headache; I’ve never tried it with that much data. Ok, I’m going off the rails here.
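
        For what it’s worth, here is one way that stratified sampling might look, assuming a hypothetical pandas DataFrame of tweets with projected x/y columns:

        ```python
        import pandas as pd

        def stratified_sample(df, cell_size=100_000, n_per_cell=200, seed=0):
            # Coarse geographic bin ID from projected coordinates
            bins = (df["x"] // cell_size).astype(int).astype(str) + "_" + \
                   (df["y"] // cell_size).astype(int).astype(str)
            # Keep at most n_per_cell tweets per bin; sparse bins keep all
            return (df.groupby(bins, group_keys=False)
                      .apply(lambda g: g.sample(min(len(g), n_per_cell),
                                                random_state=seed)))
        ```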

  6. 16 Rachel
    25th January, 2011 at 2:09 pm

    Awesome map.

    If you ever feel inclined to do another one, I would really love to see a geographic distribution of just the gendered profanities. This kind of analysis had never really occurred to me before, but it’s really interesting!

    • 17 Daniel Huffman
      25th January, 2011 at 3:08 pm

      An excellent idea.

      Really, it would be pretty neat to have access to Twitter’s whole database and just generate these all day. There are so many lexical analyses you can do!

  7. 18 Laura M
    25th January, 2011 at 2:39 pm

    I first saw this map on the cover of Cartographic Perspectives, and found it very visually appealing. I had a chuckle when I looked inside and discovered what it was a map of. :)

  8. 19 Jen
    26th January, 2011 at 7:42 am

    I’d love to see how this compares to non-digital communications of folks not in the technorati. Only way I can think to do that is labor-intensive: security cam films, perhaps all from the same type of establishment. Maybe someone else will suggest something better.

    • 20 Daniel Huffman
      26th January, 2011 at 10:59 am

      Hi Jen,

      That’s probably the biggest limitation of this sort of data set. I, myself, am not really one of the technorati, and so my propensity for swearing would not help light up Madison on this map =). The Internet has become a really great way to get lots of data on people without doing intensive surveys, but it’s always good to remember that we’re only looking at a subset of people — those with the time and money for luxury.

  9. 21 mirele
    26th January, 2011 at 11:52 am

    I think this map’s inaccurate because it doesn’t take into account the propensity of people in Utah (especially teenage males) to use swearing euphemisms (aka “minced oaths”) in conversation. These would include words such as “flip” and “fetch” for the “f” word. It’s actually a joke in Utah and other Mormon communities. No, people don’t swear, but they do. :)

  10. 22 byl046
    26th January, 2011 at 3:39 pm

    I have been to Utah many times and there is indeed swearing, although not much in Provo. The reason there is no swearing in eastern Utah is because nobody actually lives there. Seriously.

  11. 23 John V
    26th January, 2011 at 4:25 pm

    Do you have data for AK and HI?

    It’s not your fault that it’s hard to include us, but it rankles EVERY SINGLE TIME.

    Thanks!
    -John

    • 24 Daniel Huffman
      26th January, 2011 at 4:37 pm

      Hi John,

      I do indeed have data for the Forgotten Two, somewhere in the raw files of the project. This map was originally for my own amusement and as a sort of proof of concept, and so I didn’t go to the extra effort to include them, unfortunately. If I had known this map would gain some popularity, I would have made sure to do so. I am sympathetic to your plight.

  12. 25 Kenny Umenthum
    26th January, 2011 at 4:40 pm

    Hello,

    I love seeing data like this. I was recently intrigued by a public radio segment about a man who studied dialects based on geo-tweeting – much like one commenter mentioned. My problem with this data is that it appears too similar to population density maps; the east coast, west coast, and sun belt seem to be very active, as one would expect. I would like to see a per capita function added into your data analysis, if this is at all possible. If you have already done this, then my fellow mid-westerners are more proper than I would have expected!

    Thanks for the great data!

    Kenny Umenthum

    • 26 Daniel Huffman
      26th January, 2011 at 4:48 pm

      Hi Kenny,

      Thanks for the kind words. The map accounts for population density already — or, rather, for density of tweets. This is a map of # of profanities divided by # of tweets. If you look through some of the comments above, or read the lower left corner of the PDF map, I go into a bit of technical detail about how I did this.

      Good to hear that other people are doing these kinds of analyses. I think there’s a lot of potential for people interested in studying dialects.

      • 27 Kenny Umenthum
        26th January, 2011 at 9:37 pm

        Oh, good! Disregard my comment then. Sorry I didn’t do my research before opening my mouth.
        I failed to mention, the study I heard about analyzed slang terms that were localized – in tweets at least – to certain regions, or even certain cities. Perhaps you could expand your research using a similar venture to discover regional slang substitutes, like the euphemisms ‘mirele’ suggested about the teenage boys in Utah.

        Happy tweet-hunting!

  13. 28 Misplaced Texan
    26th January, 2011 at 5:04 pm

    Austin, Texas looks oddly polite.

  14. 26th January, 2011 at 8:21 pm

    It’s been a long time since I’ve seen that much rubylith.

    I’ll make the Tufte criticism: too much ink, not enough data. The bandwidth could be better used to free the data.

    Base layer: scatter dots of the observations, where are the tweets?
    Intensity layer: 0 2 4 8 16 per 100, 5 colors, how foul?
    Isoline layer: lines, surround what can be surrounded in the area in which there is data

    It’s a big country, mostly vacant; some huge percentage live within 100 miles of the two borders and the coast. To have data points everywhere, cell sizes would have to be too large to show any detail at all within the inhabited area. Better, I propose, to leave the empty quarters blank than to over-interpolate.

    Notwithstanding my preferences, it’s a stunningly well realized implementation of the design selected.
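
    A toy sketch of that three-layer scheme with synthetic stand-in data, just to make the proposal concrete (none of this comes from the original map; the arrays and class breaks follow the commenter’s suggestion):

    ```python
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import BoundaryNorm

    x, y = np.random.rand(2, 2000) * 100       # stand-in tweet locations
    rate = np.random.rand(50, 50) * 20         # stand-in swears per 100 tweets
    breaks = [0, 2, 4, 8, 16, 20]              # the proposed class breaks

    fig, ax = plt.subplots()
    ax.scatter(x, y, s=1, c="k", alpha=0.3)    # base layer: where the tweets are
    cmap = plt.get_cmap("Reds", len(breaks) - 1)
    norm = BoundaryNorm(breaks, cmap.N)
    ax.imshow(rate, extent=(0, 100, 0, 100), origin="lower",
              cmap=cmap, norm=norm, alpha=0.6) # intensity layer: how foul
    ax.contour(rate, levels=breaks[1:-1], extent=(0, 100, 0, 100),
               colors="k", linewidths=0.5)     # isoline layer
    plt.show()
    ```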

  15. Bryson
     26th January, 2011 at 9:52 pm

    Love the map! I actually write a blog about social media in Utah. I love what you’ve done here and just wrote a blog post about it. I definitely gave you the credit. Thanks for sharing this awesome project. http://www.thehungryhive.com/2011/01/no-swearing-in-utah-at-least-on-twitter/

    • Bryson
      26th January, 2011 at 9:56 pm

      And believe it or not, I think this map is quite accurate. While some may say that Utah doesn’t have the population density to provide an accurate sample, Utah is actually the fastest growing market for Twitter and Salt Lake is listed as one of the top 10 most social media savvy cities in the country. So yes, we do use social media. But for some reason we don’t let the expletives fly (hell if I know why) http://www.thehungryhive.com/about-2/

      • 32 Daniel Huffman
        26th January, 2011 at 10:01 pm

        Hi Bryson,

        Thanks for the kind words. I’ve heard a lot about Utah of late, and its propensity for substitute profanities like “fetch.” I used to work with a graduate of Utah State here at UW-Madison, and he had to explain what such words meant to us more foul-mouthed midwesterners =).

  16. 33 Steve D
    27th January, 2011 at 7:03 am

    I’ve looked at numerous sites presenting this map and not a single one tells me HOW MANY DATA POINTS THERE ARE. If you have a million tweets, that might be significant. 5,000, meh. This has the look of a sparse data set. I’d also suggest that if you can’t get enough tweets within, say, 10 miles of a point that you color it “insufficient data.” That way we can tell if SE Utah is very polite or just empty (more likely the latter). Also, with 16 million colors available, why only tones of dark red? It makes the map drab and hard to read.

    • 34 Daniel Huffman
      27th January, 2011 at 9:20 am

      Hi Steve,

      There are about 1.5 million tweets used in generating this surface — I think a couple of sites mention this fact, but not most of them. I think it’s also mentioned on the map itself if you read the details in the corner which describe the analysis done. As to the color choice, I liked the red. To me personally, it’s a very attractive aesthetic that reminds me a little bit of the old green-on-black computer monitors. Except, bright green is a bit harder to work with in print, so I switched over to red. Legibility was less of a consideration, as I did not design the map for an audience. If individual areas are hard to read, but the larger patterns of brightness stand out, that may be advantageous, as it keeps people from focusing too much on details.

    • 35 Daniel Huffman
      27th January, 2011 at 9:45 am

      To append my earlier reply: I just now remembered another thought I had in mind when designing this (it’s been a few months). I wanted the differences between classes to be relatively small, and there to be a sparsity of labels, specifically to ensure that you couldn’t easily go to your house and say, “oh, there’s 5 swears/100 tweets over here” and then compare to your friend’s city and say “oh, it’s 6 over here.” Since this is based on smoothing a bunch of sample data, if I did the analysis differently, or just collected in different months, those numbers are likely to shift a fair bit, so I didn’t want people too invested in thinking about them. Just the overall patterns, which are likely more robust.

      Thanks for the comments!

  17. 36 A mom
    27th January, 2011 at 9:14 am

    Gee! Thanks for printing the swear words. One would think The Atlantic reader sophisticated enough to know and a writer savvy enough to know better.

    • 27th January, 2011 at 3:17 pm

      Knowing the swear words is important when assessing the methodology used in creating the data set. The OP didn’t post them initially; they were requested. Since I read the list, I now have to consider that, since he included “hell” in the list, tweets by fire-and-brimstone preachers are now going to show up as profanities.

      Awesome.

      What’s The Atlantic?

  18. Emily
     27th January, 2011 at 10:04 am

    Is this adjusted for general population and internet access? I think this map may strongly correlate with the population that has access to Twitter, rather than indicating who’s swearing most.

    • 39 Daniel Huffman
      27th January, 2011 at 11:13 am

      Hi Emily,

      It is a map of profanities divided by tweets, generally. So, a dark area means fewer profanities per tweet, not fewer tweets overall. That being said, there’s still an issue of population going on here, in that I can only gather data from people who a) use Twitter, b) have a smartphone that can code their location. So, it’s perhaps not reflective of the general populace. I try to be clear about that on the part of the map that describes how the data were processed and how I adjusted for population.

  19. 40 Tyler Stone
    27th January, 2011 at 11:27 am

    I suspect the bright spot by Denver is from canceled flights and missed connections at DEN. :)

  20. 41 Joe
    27th January, 2011 at 11:46 am

    Did you limit the universe of tweets to only those in English? If not, e.g. more densely Hispanic regions would be diluted relative to others.

    • 42 Daniel Huffman
      27th January, 2011 at 11:49 am

      Hi Joe,

      A fine question. I did indeed. I have no idea what might be considered profanity in Spanish, so it’s limited in that way. It shouldn’t cause a dilution, per se, though, since the map still shows how likely someone is to be profane among the local English-tweeting population. If we threw in the Spanish as well, and they were cleaner in their language, then those regions would see their profanity rate go down. Hard to say what effects it would have. It would be great if someone wanted to collect some data and parse through that.

  21. 43 steve
    27th January, 2011 at 12:36 pm

    It’s a neat map, I like it.
    However, I think there’s enough selection bias that its use is only superficial.

    As you mention above, you can only select data from sources that a) use twitter, b) have a smartphone that codes locations.

    1.5 million tweets sounds like a big number, but in the twitter universe alone (a vanishingly small subset of the real universe, to start) that’s about a day and a half.

    Your map is clever, and it nicely fits into the widely-held stereotypes of profane coasts and quiet, non-swearing midwesterners, so it’s gotten a lot of play. But I hope people don’t try to draw too many conclusions about reality therefrom, as the results of this are lost in the static when you start to realize that with the twitter crowd you’re already selecting massively for 18-30s over all other age demos, you’re selecting for the wealthy, the technologically literate, and (frankly speaking, as one who doesn’t “get” twitter) the bored narcissists who think people care that they’re standing in line at Starbucks right now.

    I work in the trucking industry and while almost none of them care to tweet, I can assure you that middle america certainly is familiar with profanity.

    • 44 Daniel Huffman
      27th January, 2011 at 12:43 pm

      Hi Steve,

      Thanks for the kind words. I think you are exactly right about everything needing to be taken in context of the limitations of the data. I didn’t really intend for this map to get picked up by various blogs — I just expected it to be seen by a hundred carto-geeks on this blog and in Cartographic Perspectives, so I didn’t put up the sort of long disclaimer and explanation that I might if I were intending wide distribution. That’s my fault for not being careful and clear. I’ve tried to compensate a little for the weakness of the data in the design — I smoothed out the isolines to avoid a sense of precision, and the colors are not terribly distinct, in order to allow people to tell only general trends, which are probably more robust. Hopefully that helps at least a little (though it mostly seems to cause people to dislike those imprecise aspects about it, perhaps because they expect maps to be more specific and to give them truth?).

      I can only hope that most of the people who’ve seen this have thought about it as critically as you have.

      • 27th January, 2011 at 4:23 pm

        This line of thinking is kind of interesting because it demonstrates a need to be clear not only about what you are but also what you are not showing. This map is pretty plainly labeled as “profanity on Twitter,” but everybody’s first (and many people’s only) reaction is to extrapolate to profanity in all parts of life by all people. And of course it’s not that; it’s a map of profanities in a certain medium by a certain demographic group. But here you’ve ended up having to worry about not being explicit about what the map isn’t.

        Since not everyone will consider the limitations like Steve has, perhaps you’d best add a “WARNING: SUPERFICIAL” stamp over the map and any future potentially popular maps based on high volumes of everyday personal data from sources like Twitter, Facebook, Flickr, etc. (I am completely in favor of superficial maps, for the record!)

  22. 27th January, 2011 at 2:52 pm

    Love the map. The use of negative colour is something I’ve always been fascinated with but could never use since most of my maps end up being printed. Now that I’m creating more web apps you have encouraged me to go for it.

    I can’t help but think about the hot spots that show up. I live in San Diego, and the first thing I noticed is the large hot spot in southeastern California. Most of the area covered by this hot spot is desert, so I’m thinking that your find-nearest algorithm had to travel a long way to get to the 500-tweet threshold. Big deal, right? This is true of many rural areas of the country, and Utah looks a model of decorum. So why would there be a hot spot in Southern California’s desert? The biggest thing out there is 29 Palms, pop. 30,000. But there is also a large Marine base. Now, I hate to cast aspersions on the Marines, but I know a few of them and they are quite fluent in profanities. So without seeing the source data, and using a wildly unfair generalization of a fine institution, I came up with an idea.

    In areas where you have a low data point density, hot spots are disproportionately magnified, because large numbers of the surrounding cells rely on this hot spot to reach the threshold.

    If this is the case, then limiting the search distance and just recording NoData would be a better approach. You could then use some variation of the moving window technique, with maybe cubic convolution, to interpolate data in NoData cells.

    Just an idea and it all falls apart if there is no one profane cluster of data points in the region.

    Thanks for creating such a visually appealing and thought-provoking map.

    Drew
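
    One plausible reading of Drew’s suggestion, sketched with a NaN-aware moving-window mean standing in for true cubic convolution (the surface array is hypothetical, with NaN marking the NoData cells):

    ```python
    import numpy as np
    from scipy.ndimage import uniform_filter

    def fill_nodata(surface, size=5):
        filled = np.nan_to_num(surface)              # NaN -> 0 for averaging
        weight = np.isfinite(surface).astype(float)  # 1 where data exists
        num = uniform_filter(filled, size=size)
        den = uniform_filter(weight, size=size)
        with np.errstate(invalid="ignore", divide="ignore"):
            window_mean = num / den                  # local mean of valid cells
        window_mean[~np.isfinite(window_mean)] = np.nan
        # Keep observed values; fill only the NoData cells
        return np.where(np.isfinite(surface), surface, window_mean)
    ```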

    • 47 Daniel Huffman
      27th January, 2011 at 3:47 pm

      Drew,

      Thanks for the kind words. I don’t have the numbers to hand, but I think I did in some limited cases have to travel pretty far (~100mi) to get enough data points. You may be right in that it’s simply better to leave things off the map in this case. Interesting to know about the Marine base — not sure what the policy is in the USMC about tweeting, though.

      I was terribly embarrassed to post such words upfront on the main page of the blog, but relented early on, since it is important to know. I’d be interested to know if there are, in fact, fire and brimstone sermons carried on Twitter.

  23. 48 mike b.
    27th January, 2011 at 7:05 pm

    It would be interesting to see how close to inverted a similar map that used common LDS substitute swear words would be, at least for Utah and the surrounding states. (ie, Freakin’, Fudge, Shoot, Darn, Heck, Crap, Cheese n’ Rice, etc.) http://wesclark.com/ubn/swearing.html for more details.

  24. 49 Kay
    27th January, 2011 at 10:35 pm

    Hmmm….I think I’ll move to Maine! :)

  25. 50 Joe Bloggs
    28th January, 2011 at 2:01 am

    What piques my curiosity is the hot spot at the far west fringe of the OK panhandle, which has CO, NM, KS, and TX all within a relatively short distance.

    Looking at Google Maps, I think it must be some side effect of low population density. Comanche National Grasslands and Boise City are all I see nearby.

    But perhaps, as noted upstream, there’s a military base within, sight unseen.

    Other aspects I would wonder about would be the availability of 3G/4G/digital coverage; it’s been 5 or 6 years since I last drove through the OK panhandle, but I would guess that most of the coverage was analog then (give or take I70, I35, and good-sized towns).

  26. 51 MT
    29th January, 2011 at 10:46 am

    Are these absolute numbers or are they scaled by population density?

    • 52 Daniel Huffman
      29th January, 2011 at 10:53 am

      Hi MT,

      They’re not absolute numbers. Instead, they’re scaled by tweet density, so what this map shows is # profanities per tweet. I go into my methodology above.

  27. 53 Sherry Young
    31st January, 2011 at 5:30 pm

    Ha Ha — I live on the Western Slope in Colorado and I see the “Utah effect” clearly.

    Grand Junction through to Gypsum off I70 are heavily LDS. They have the lock on all the Boy Scout Troops etc. Just check out the towns that are mapped out the same way that Salt Lake City is — the temple is always in the center.

  28. 31st January, 2011 at 11:03 pm

    I loved the article and the map. Great job.
    My kids know if they talk trash words that the soap is coming!

  29. 55 Sally
    3rd February, 2011 at 9:01 pm

    How is this not just a map of population density? Per capita would be better or some kind of normalized statistic. Right now this looks like it’s heavily biased by population density.

    “This should account for variation in population density around the US.”
    What do you mean by “this”? How does your method remove population density bias?

    I tried reading through the existing posts before asking this. I’m sorry if I missed you discussing this already.

  30. 14th February, 2011 at 6:59 am

    A very interesting and innovative project. My only thought is the choice of colors makes it look more like red-colored mud. Perhaps a more dramatic color range would make it easier to comprehend?

    Still, good work, and good thinking. The next step might be to expand the data set with more than just tweets. Could there be a profanity bias among tweet users trying to make an impression in 140 characters?

    I wonder if there’s a government grant in this for you? The same techniques could be used to monitor for seditious words and find which areas or even people need to be more closely watched. They could enlist Google for that project. Google already knows too much about us, so this would be easy for them.

  31. 58 Digitus Impudicus
    17th February, 2011 at 2:55 pm

    I find the map almost unreadable on my screen. It varies from dark red to darker red. I can barely make out any of the city names. The variations in color are hard to distinguish. Was that the intent?

    • 59 Daniel Huffman
      17th February, 2011 at 3:03 pm

      Yeah, it’s not the best on most output devices — the colors were chosen more because I wanted to play around with those tones, without much intent for mass distribution, which is what accidentally happened. There was a bit of intent to get the colors to blend together, on account of the fact that I didn’t want people to take it too seriously. I wrote more about that on my other blog, somethingaboutmaps.wordpress.com

  32. 60 Eileen
    18th February, 2011 at 10:41 am

    Daniel,
    I have to thank boingboing.net for directing me toward both of your blogs. I’m thoroughly enjoying your work, especially the brilliant subway-style watershed maps. The Profane Mountains, Polite Plains map is great. My one critique is the red gradation. I like the red, the red is fine. It’s the use of the dark, ominous red in polite regions and a lighter red in profane regions. Even after looking at the legend I kept having to correct myself from believing the deep dark areas, like a dark back alley, were profane. When I started to explain the map to my boyfriend, he chimed in, “So the dark areas are profane?” Dark colors, especially red, have a negative, even ominous connotation. I would suggest inverting the red gradation to convey the information more intuitively.
    Thanks for the awesome work!

  33. 62 The Grinch
    25th May, 2011 at 9:28 pm

    Just looking at it quick, it looks like there might be some correlation with high manufacturing density? A bunch of guys working in close quarters might develop the ‘sailor’s mouth’ that would become locally acceptable even after the heavy manufacturing base starts to leave.

  34. 26th August, 2011 at 1:11 pm

    Hello! I’ve been reading your web site for a while now and finally got the bravery to go ahead and give you a shout out from Atascocita Tx! Just wanted to tell you keep up the good work!

  35. 64 exosus
    27th October, 2011 at 2:05 pm

    I like the map, very interesting info, but the download link seems to be broken so I can’t really get a good look at it :(

    • 65 Daniel Huffman
      28th October, 2011 at 10:56 am

      Hmm. I’ve had some intermittent problems with that link, too. I will see about replacing it with something more stable in a few days.


  36. 66 epoyjun
    2nd December, 2011 at 6:41 am

    I really love reading your comments, guys. I learn a lot from you.

    Thank you so much

    Regards,

    Epoyjun

