Tweeting our Happiness and Health

Picture taken from PLoS One. 2013; 8(5): e64417

Average word happiness for geotagged tweets in all US states collected during calendar year 2011. The happiest 5 states, in order, are: Hawaii, Maine, Nevada, Utah and Vermont. The saddest 5 states, in order, are: Louisiana, Mississippi, Maryland, Delaware and Georgia.

I previously wrote about the novel use of Twitter to predict geographic happiness, citing a study that used a geotagged data set of over 80 million words generated in 2011 on Twitter to estimate the happiest and saddest states (shown in red and blue, respectively, on the map). The authors describe their methodology as follows: “To measure sentiment (hereafter happiness) in these areas from the corpus of words collected, we use the Language Assessment by Mechanical Turk (LabMT) word list…assembled by combining the 5,000 most frequently occurring words in each of four text sources: Google Books (English), music lyrics, the New York Times and Twitter. A total of roughly 10,000 of these individual words have been scored by users of Amazon’s Mechanical Turk service on a scale of 1 (sad) to 9 (happy), resulting in a measure of average happiness for each given word … For example, ‘rainbow’ is one of the happiest words in the list with a score of [value not recovered], while ‘earthquake’ is one of the saddest, with [value not recovered]. Neutral words like ‘the’ or ‘thereof’ tend to score in the middle of the scale, with [value not recovered] and 5 respectively.”
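To make the idea concrete, here is a minimal sketch of how an average happiness score for a body of text could be computed from a scored word list. It assumes a simple frequency-weighted mean over the words that appear in the list, which is one plausible reading of the description above and not necessarily the authors' exact procedure. The scores in `TOY_SCORES`, the function name `average_happiness`, and the two sample "states" are all made up for illustration; they are not actual LabMT values or real tweets.

```python
import re
from collections import Counter

# Illustrative placeholder scores on the 1 (sad) to 9 (happy) scale.
# These are NOT the actual LabMT values.
TOY_SCORES = {"rainbow": 8.0, "earthquake": 2.0, "the": 5.0}


def average_happiness(text, scores):
    """Frequency-weighted mean score of the words in `text` found in `scores`.

    Words missing from the word list are ignored. Returns None when no
    scored word occurs in the text.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total


# Hypothetical pooled text per region (stand-ins for a year of tweets).
tweets_by_state = {
    "State A": "the rainbow after the storm",
    "State B": "the earthquake shook the town",
}

for state, text in tweets_by_state.items():
    print(state, average_happiness(text, TOY_SCORES))
```

With these toy numbers, the text mentioning a rainbow averages higher than the one mentioning an earthquake, which is the kind of region-level contrast the map summarizes.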

Recently I saw an even more exciting application of this concept: the use of psychological data on Twitter to estimate heart disease risk in various geographical areas. As the study authors note, important psychological risk factors for cardiovascular disease, such as hostility and stress, are very difficult to measure at a community level. Typical measurements involve phone surveys and/or household visits to gather psychological data, but these are costly and imprecise. So the authors turned to Twitter, using data from 1,347 U.S. counties with published heart disease and mortality statistics and at least 50,000 available tweeted words. The study area included more than 85% of the U.S. population. Analyses indeed showed that “language patterns reflecting negative social relationships, disengagement, and negative emotions—especially anger—emerged as risk factors; positive emotions and psychological engagement emerged as protective factors.”

Map of counties in the northeastern United States showing age-adjusted mortality from atherosclerotic heart disease (AHD) as reported by the Centers for Disease Control and Prevention (CDC; left) and as estimated through the Twitter-language-only prediction model (right).

More importantly, “a cross-sectional regression model based only on Twitter language predicted AHD [atherosclerotic heart disease] mortality significantly better than did a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking, diabetes, hypertension, and obesity.” The map above compares AHD mortality rates as reported by the CDC with those predicted from Twitter language alone. The authors comment that the typical Twitter user is younger than, and is not, the typical person at risk for or dying of heart disease, so it is unclear why Twitter language should track heart disease mortality. They surmise that “the tweets of younger adults may disclose characteristics of their community, reflecting a shared economic, physical, and psychological environment…the language of Twitter may be a window into the aggregated and powerful effects of the community context.”

It’s fascinating, isn’t it, that those 140-character emotive blasts, which we tweet without consideration or conscious debate, could collectively represent our community-wide psychological health and heart disease risk?
