
Hate in the Machine: Anti-Black and Anti-Muslim Social Media Posts as Predictors of Offline Racially and Religiously Aggravated Crime


Matthew L Williams, Pete Burnap, Amir Javed, Han Liu, Sefa Ozalp, Hate in the Machine: Anti-Black and Anti-Muslim Social Media Posts as Predictors of Offline Racially and Religiously Aggravated Crime, The British Journal of Criminology , Volume 60, Issue 1, January 2020, Pages 93–117, https://doi.org/10.1093/bjc/azz049


National governments now recognize online hate speech as a pernicious social problem. In the wake of political votes and terror attacks, hate incidents online and offline are known to peak in tandem. This article examines whether an association exists between both forms of hate, independent of ‘trigger’ events. Using Computational Criminology that draws on data science methods, we link police crime, census and Twitter data to establish a temporal and spatial association between online hate speech that targets race and religion, and offline racially and religiously aggravated crimes in London over an eight-month period. The findings renew our understanding of hate crime as a process, rather than as a discrete event, for the digital age.

Hate crimes have risen up the hierarchy of individual and social harms, following the revelation of record high police figures and policy responses from national and devolved governments. The highest number of hate crimes in history was recorded by the police in England and Wales in 2017/18. The 94,098 hate offences represented a 17 per cent increase on the previous year and a 123 per cent increase on 2012/13. Although the Crime Survey for England and Wales has recorded a consistent decrease in total hate crime victimization (combining race, religion, sexual orientation, disability and transgender), estimations for race and religion-based hate crimes in isolation show an increase from a 112,000 annual average (April 13–March 15) to a 117,000 annual average (April 15–March 17) (ONS, 2017). This increase does not take into account the likely rise in hate victimization in the aftermath of the 2017 terror attacks in London and Manchester. Despite improvements in hate crime reporting and recording, the consensus is that a significant ‘dark figure’ remains. A policy and practice need remains to improve intelligence about hate crimes, and in particular to better understand the role community tensions and events play in patterns of perpetration. The HMICFRS (2018) inspection on police responses to hate crimes evidenced that forces remain largely ill-prepared to handle the dramatic increases in racially and religiously aggravated offences following events like the United Kingdom-European Union (UK-EU) referendum vote in 2016 and the terror attacks in 2017. Part of the issue is a significant reduction in Police Community Support Officers throughout England, and in particular London (Greig-Midlane (2014) indicates a reduction of around 50 per cent since 2010).
Fewer officers in neighbourhoods gathering information and intelligence on community relations reduces the capacity of forces to pre-empt and mitigate spates of inter-group violence, harassment and criminal damage.

Technology has been heralded as part of the solution, transforming analogue police practices into a set of complementary digital processes that are scalable and deliverable in near real time (Williams et al., 2013; Chan and Bennett Moses, 2017; Williams et al., 2017a). In tandem with offline hate crime, online hate speech posted on social media has become a pernicious social problem (Williams et al., 2019). Thirty years on from the Home Office (1989) publication ‘The Response to Racial Attacks and Harassment’, which saw race hate on the streets become a priority for six central Whitehall departments, the police, Crown Prosecution Service (CPS) and courts (Bowling, 1993), the government is now making similar moves to tackle online hate speech. The Home Secretary established the National Online Hate Crime Hub in 2016, a Home Affairs Select Committee launched an inquiry into hate crime, including online victimization, in 2017, and the prime minister commissioned a Law Commission review to address the inadequacies in legislation relating to online hate. Social media giants, such as Facebook and Twitter, have been questioned by national governments and the European Union over policies that provided safe harbour to hate speech perpetrators. Previous research shows hate crimes offline and hate speech online are strongly correlated with events of significance, such as terror attacks, political votes and court cases (Hanes and Machin, 2014; Williams and Burnap, 2016). It is therefore reasonable to assume that online and offline hate in the immediate wake of such events are highly correlated. What remains unclear is whether a more general pattern of correlation exists independent of ‘trigger’ events. To test this hypothesis, we collected Twitter and police recorded hate crime data over an eight-month period in London and built a series of statistical models to identify whether a significant association exists. At the time of writing, no published work has shown such an association. Our models establish a general temporal and spatial association between online hate speech targeting race and religion and offline racially and religiously aggravated crimes, independent of ‘trigger’ events. Our results have the potential to renew our understanding of hate crime as a process, rather than a discrete event (Bowling, 1993), for the digital age.

Since its inception, the Internet has facilitated the propagation of extreme narratives, often manifesting as hate speech targeting minority groups (Williams, 2006; Perry and Olsson, 2009; Burnap and Williams, 2015, 2016; Williams and Burnap, 2016; Williams et al., 2019). Home Office (2018) data show that 1,605 hate crimes were flagged as online offences in 2017/18, representing 2 per cent of all hate offences and a 40 per cent increase on the previous year. Online race hate crime makes up the majority of all online hate offences (52 per cent), followed by sexual orientation (20 per cent), disability (13 per cent), religion (12 per cent) and transgender online hate crime (4 per cent). Crown Prosecution Service data show that in 2017/18 there were 435 prosecutions related to online hate, a 13 per cent increase on the previous year (CPS, 2018). These figures are a significant underestimate. 1 HMICFRS (2018) found that despite the Home Office introducing a requirement for police forces to flag cyber-enabled hate crime offences, uptake on this practice has been patchy and inconsistent, resulting in unreliable data on prevalence.

Hawdon et al. (2017), using representative samples covering 15- to 30-year-olds in the United States, United Kingdom, Germany and Finland, found that on average 43 per cent of respondents had encountered hate material online (53 per cent for the United States and 39 per cent for the United Kingdom). Most hate material was encountered on social media, such as Twitter and Facebook. Ofcom (2018b), also using a representative UK sample, found that nearly half of UK Internet users reported seeing hateful content online in the past year, with 16- to 34-year-olds most likely to report seeing this content (59 per cent for 16–24s and 62 per cent for 25–34s). Ofcom also found that 45 per cent of 12- to 15-year-olds in 2017 reported encountering hateful content online, an increase on the 2016 figure of 34 per cent (Ofcom, 2018a; 2018c).

Administrative and survey data only capture a snapshot of the online hate phenomenon. Data science methods pioneered within Computational Criminology (see Williams and Burnap, 2016; Williams et al., 2017a) facilitate a real-time view of hate speech perpetration in action, arguably generating a more complete picture. 2 In 2016 and 2017, the Brexit vote and a string of terror attacks were followed by significant and unprecedented increases in online hate speech (see Figures 1 and 2). Although the production of hate speech increased dramatically in the wake of all these events, statistical models showed it was least likely to be retweeted in volume and to survive for long periods of time, supporting a ‘half-life’ hypothesis. Where hate speech was retweeted, it emanated from a core group of like-minded individuals who seek out each other’s messages (Williams and Burnap, 2016). Hate speech produced around the Brexit vote in particular was found to be largely driven by a small number of Twitter accounts. Around 50 per cent of anti-Muslim hate speech was produced by only 6 per cent of users, many of whom were classified as politically anti-Islam (Demos, 2017).

Fig. 1 UK anti-black and anti-Muslim hate speech on Twitter around the Brexit vote

Fig. 2 Global anti-Muslim hate speech on Twitter during 2017 (gaps relate to breaks in data collection)

The role of popular and politically organized racism in fostering terrestrial climates of intimidation and violence is well documented ( Bowling, 1993 ). The far right, and some popular right-wing politicians, have been pivotal in shifting the ‘Overton window’ of online political discussion further to the extremes ( Lehman, 2014 ), creating spaces where hate speech has become the norm. Early research shows the far right were quick to take to the Internet largely unhindered by law enforcement due to constitutional protections around free speech in the United States. The outcome has been the establishment of extreme spaces that provide a collective virtual identity to previously fragmented hateful individuals. These spaces have helped embolden domestic hate groups in many countries, including the United States, United Kingdom, Germany, the Netherlands, Italy and Sweden ( Perry and Olsson, 2009 ).

In late 2017, social media giants began introducing hate speech policies, bowing under pressure from the German government and the European Commission ( Williams et al. , 2019 ). Up to this point, Facebook, Instagram, YouTube and Twitter were accused of ‘shielding’ far right pages as they generated advertising income due to their high number of followers. The ‘Tommy Robinson’ Facebook page, with 1 million followers, held the same protections as media and government pages, despite having nine violations of the platform’s policy on hate speech, whereas typically only five were tolerated by the content review process ( Hern, 2018 ). The page was eventually removed in March 2019, a year after Twitter removed the account of Stephen Yaxley-Lennon (alias Tommy Robinson) from their platform.

Social media was implicated in the Christchurch, New Zealand extreme right-wing terror attack in March 2019. The terrorist was an avid user of social media, including Facebook and Twitter, but also more subversive platforms, such as 8chan. 8chan was the terrorist’s platform of choice when it came to publicizing his live Facebook video of the attack. His message opened by stating he was moving on from ‘shit-posting’—using social media to spread hatred of minority groups—to taking the dialogue offline, into action. He labelled his message a ‘real life effort post’—the migration of online hate speech to offline hate crime/terrorism (Figure 3). The live Facebook video lasted for 17 minutes, with the first report to the platform being made after the 12th minute. The video was taken down within the hour, but it was too late to stop the widespread sharing. It was re-uploaded more than 2 million times on Facebook, YouTube, Instagram and Twitter, and it remained easily accessible over 24 hours after the attack. Facebook and Twitter, but particularly 8chan, were flooded with praise and support for the attack. Many of these posts were removed, but those on 8chan remain due to its lack of moderation.

Fig. 3 Christchurch extreme right terror attacker’s post on 8chan, broadcasting the live Facebook video

In the days following the terror attack, spikes in hate crimes were recorded across the United Kingdom. In Oxford, swastikas with the words “sub 2 PewDiePie” were graffitied on a school wall. In his video ahead of the massacre, the terrorist had asked viewers to ‘subscribe to PewDiePie’. The social media star, who earned $15.5 million in 2018 from his online activities, has become known for his anti-Semitic comments and endorsements of white supremacist conspiracies (Chokshi, 2019). In his uploaded 74-page manifesto, the terrorist also referenced Darren Osborne, the perpetrator of the Finsbury Park Mosque attack in 2017. Osborne is known to have been influenced by social media communications ahead of his attack. His phone and computers showed that he accessed the Twitter account of Stephen Yaxley-Lennon two days before the attack, having only started following him two weeks prior. The tweet from Yaxley-Lennon read ‘Where was the day of rage after the terrorist attacks. All I saw was lighting candles’. A direct Twitter message was also sent to Osborne by Jayda Fransen of Britain First (Rawlinson, 2018). Other lone-actor extreme right-wing terrorists, including Pavlo Lapshyn and Anders Breivik, are also known to have self-radicalized via the Internet (Peddell et al., 2016).

Far right and popular right-wing activity on social media, unhindered for decades due to free-speech protections, has shaped the perception of many users regarding what language is acceptable online. Further enabled by the disinhibiting and deindividuating effects of Internet communications, and the ineffectiveness of the criminal justice system in keeping up with the pace of technological developments (Williams, 2006), social media abounds with online hate speech. Online controversies, such as Gamergate, the Bank of England Fry/Austen fiasco and the Mark Meechan scandal, among many others, demonstrate how easily users of social media take to antagonistic discourse (Williams et al., 2019). In recent times, these users have been given further licence by the divisive words of popular right-wing politicians wading into controversial debates, in the hopes of gaining support in elections and leadership contests. The offline consequences of this trend are yet to be fully understood, but it is worth reminding ourselves that those who routinely work with hate offenders agree that although not all people who are exposed to hate material go on to commit hate crimes on the streets, all hate crime offenders are likely to have been exposed to hate material at some stage (Peddell et al., 2016).

The study relates to conceptual work that examines the role of social media in political polarization (Sunstein, 2017) and the disruption of ‘hierarchies of credibility’ (Greer and McLaughlin, 2010). In the United States, online sources, including social media, now outpace traditional press outlets for news consumption (Pew Research Center, 2018). The pattern in the United Kingdom is broadly similar, with only TV news (79 per cent) leading over the Internet (64 per cent) for all adults, and the Internet, in particular social media, taking first place for those aged 16–24 (Ofcom, 2018b). In the research on polarization, the general hypothesis tested is that disinformation is amplified in partisan networks of like-minded social media users, where it goes largely unchallenged due to ranking algorithms filtering out any challenging posts. Sunstein (2017) argues that ‘echo chambers’ on social media reflecting increasingly extreme viewpoints are breeding grounds for ‘fake news’, far right and left conspiracy theories and hate speech. However, the evidence on the effect of social media on political polarization is mixed. Boxell et al. (2017) and Dubois and Blank (2017), both using offline survey data, found that social media had a limited effect on polarization among respondents. Conversely, Brady et al. (2017) and Bail et al. (2018), using online and offline data, found strong support for the hypothesis that social media create political echo chambers. Bail et al. found that Republicans, and to a lesser extent Democrats, were likely to become more entrenched in their original views when exposed to opposing views on Twitter, highlighting the resilience of echo chambers to destabilization. Brady et al. found that emotionally charged (e.g. hate) messages about moral issues (e.g. gay marriage) increased diffusion within echo chambers, but not between them, indicating this as a factor in increasing polarization between liberals and conservatives.

A recently exposed factor that is a likely candidate for increasing polarization around events is the growing use of fake accounts and bots to spread divisive messages. Preliminary evidence shows that automated Twitter accounts were active in the UK-EU referendum campaign, and most influential on the leave side (Howard and Kollanyi, 2016). Twitter accounts linked to the Russian Internet Research Agency (IRA) were also active in the Brexit debate following the vote. These accounts also spread fake news and promoted xenophobic messages in the aftermath of the 2017 UK terror attacks (Crest, 2017). Accounts at the extreme end of right-wing echo chambers were routinely targeted by the IRA to gain traction via retweets. Key political and far right figures have also been known to tap into these echo chambers to drum up support for their campaigns. On Twitter, Donald Trump has referred to Mexican immigrants as ‘criminals and rapists’, and retweeted far right activists after Charlottesville as well as Islamophobic tweets from the far-right extremist group Britain First. The leaders of Britain First, and the ex-leader of the English Defence League, all used social media to spread their divisive narrative before they were banned from most platforms between December 2017 and March 2019. These extremist agitators and others like them have used the rhetoric of invasion, threat and otherness in an attempt to increase polarization online, in the hope that it spills into the offline, in the form of votes, financial support and participation in rallies. Research by Hope Not Hate (2019) shows that at the time of the publication of their report, 5 of the 10 far-right social media activists with the biggest online reach in the world were British. The newest recruits to these ideologies (e.g. Generation Identity) are highly technically capable and believe social media to be essential to building a larger following.

Whatever the effect of social media on polarization, and however this may vary by individual-level factors, the role of events, bots and far right agitators, there remains limited experimental research that pertains to the key aim of this article: its impact on the behaviour of the public offline. Preliminary unpublished work suggests a link between online polarizing activity and offline hate crime (Müller and Schwarz, 2018a, 2018b). But what remains under-theorized is why social media has salience in this context that overrides the effect of other sources (TV, newspapers, radio) espousing arguably more mainstream viewpoints. Greer and McLaughlin (2010) have written about the power of social media in the form of citizen journalism, demonstrating how the initially dominant police-driven media narrative of ‘protestor violence’ in the reporting of the G20 demonstration was rapidly disrupted by technology-driven alternative narratives of ‘police violence’. They conclude “the citizen journalist provides a valuable additional source of real-time information that may challenge or confirm the institutional version of events” (2010: 1059). Increasingly, far right activists like Stephen Yaxley-Lennon are adopting citizen journalism as a tactic to polarize opinion. Notably, Yaxley-Lennon live-streamed himself on social media outside Leeds Crown Court during the Huddersfield grooming trials to hundreds of thousands of online viewers. His version of events was imbued with anti-Islam rhetoric, and the stunt almost derailed the trial. Such tactics take advantage of immediacy, manipulation, partisanship and a lack of accountability rarely found in mainstream media. These affordances can provide a veil of authenticity and realism to stories, having the power to reframe their original casting by the ‘official’ establishment narrative, further enabled by dramatic delivery of ‘evidence’ of events as they occur. The ‘hacking’ of the information-communications marketplace enabled by social media disrupts the primacy of conventional media, allowing those who produce subversive “fake news” anti-establishment narratives to rise up the ‘hierarchy of credibility’. The impact of this phenomenon is likely considerable, given that over two-thirds of UK adults, and eight in ten 16- to 24-year-olds, now use the Internet as their main source of news (Ofcom, 2018b).

The hypotheses test whether online hate speech on Twitter, an indicator of right-wing polarization, can improve upon estimations of offline hate crimes that use conventional predictors alone.

H1 : Conventional census regressors associated with hate crime in previous research will emerge as statistically significant.

‘Realistic’ threats are often associated with hate crimes (Stephan and Stephan, 2000 ; Roberts et al., 2013). These relate to resource threats, such as competition over jobs and welfare benefits. Espiritu (2004) shows how US census measures relating to economic context are statistically associated with hate crimes at the state level. In the United Kingdom, Ray et al. (2004) found that a sense of economic threat resulted in unacknowledged shame, which was experienced as rage directed toward the minority group perceived to be responsible for economic hardship. Demographic ecological factors, such as proportion of the population who are black or minority ethnic and age structure, have also been associated with hate crime ( Green, 1998 ; Nandi et al. , 2017 ; Williams and Tregidga, 2014 ; Ray et al. , 2004 ). In addition, educational attainment has been shown to relate to tolerance, even among those explicitly opposed to minority groups ( Bobo and Licari, 1989 ).

H2 : Online hate speech targeting race and religion will be positively associated with police recorded racially and religiously aggravated crimes in London.

Preliminary unpublished work focusing on the United States and Germany has shown that posts from right-wing politicians targeting minority groups, taken as evidence of extreme polarization, are statistically associated with variation in offline hate crimes recorded by the police. Müller and Schwarz (2018a) found an association between Trump’s tweets about Islam-related topics and anti-Muslim hate in US state counties. The same authors also found that anti-refugee posts on the far-right Alternative für Deutschland’s Facebook page predicted offline violent crime against immigrants in Germany (Müller and Schwarz, 2018b). This hypothesis tests for the first time whether these associations are replicated in the United Kingdom’s largest metropolitan area.

H3 : Estimation models including the online hate speech regressor will increase the amount of offline hate crime variance explained in panel-models compared to models that include census variables alone.

Williams et al. (2017a) found that tweets mentioning terms related to the concept of ‘broken windows’ were statistically associated with police recorded crime (hate crime was not included) in London boroughs and improved upon the variance explained compared to census regressors alone. This hypothesis tests whether these results hold for the estimation of hate crimes.

The study adopted methods from Computational Criminology (see Williams et al. , 2017a for an overview). Data were linked from administrative, survey and social media sources to build our statistical models. Police recorded racially and religiously aggravated offences data were obtained from the Metropolitan Police Service for an eight-month period between August 2013 and August 2014. UK census variables from 2011 were derived from the Nomis web portal. London-based tweets were collected over the eight-month period using the Twitter streaming Application Programming Interface via the COSMOS software ( Burnap et al. , 2014 ). All sources were linked by month and Lower Layer Super Output Area (LSOA) in preparation for a longitudinal ecological analysis.
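The linkage step described above can be sketched as a pair of joins: the time-variant sources (police crime counts and tweet counts) join on LSOA and month, while the time-invariant 2011 census measures join on LSOA alone. The column names and values below are invented for illustration; the study's actual schemas are not published.

```python
import pandas as pd

# Hypothetical stand-ins for the three linked sources (names are assumptions).
crimes = pd.DataFrame({
    "lsoa": ["E01000001", "E01000001", "E01000002"],
    "month": ["2013-08", "2013-09", "2013-08"],
    "rr_aggravated_offences": [2, 1, 0],
})
tweets = pd.DataFrame({
    "lsoa": ["E01000001", "E01000001", "E01000002"],
    "month": ["2013-08", "2013-09", "2013-08"],
    "hate_tweets": [14, 9, 3],
})
census = pd.DataFrame({  # 2011 census measures are time-invariant
    "lsoa": ["E01000001", "E01000002"],
    "pct_no_qualifications": [21.4, 18.9],
})

# Time-variant sources join on both keys; census joins on LSOA alone.
panel = (crimes
         .merge(tweets, on=["lsoa", "month"], how="left")
         .merge(census, on="lsoa", how="left"))
print(panel.shape)  # one row per LSOA-month, all regressors attached
```

The result is the LSOA-by-month panel structure the later random- and fixed-effects models require.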

Dependent measures

Police recorded crime.

Police crime data were filtered to ensure that only race hate crimes related to anti-black/west/south Asian offences, and religious hate crimes related to anti-Islam/Muslim offences were included in the measures. In addition to total police recorded racially and religiously aggravated offences ( N = 6,572), data were broken down into three categories: racially and religiously aggravated violence against the person, criminal damage and harassment reflecting Part II of the Crime and Disorder Act 1998.

Independent measures

Social media regressors.

Twitter data were used to derive two measures. Count of Geo-coded Twitter posts —21.7 million posts were located within the 4,720 London LSOAs over the study window as raw counts (Overall: mean 575; s.d. 1,566; min 0; max 75,788; Between: s.d. 1,451; min 0; max 53,345; Within: s.d. 589; min –23,108; max 28,178). Racial and Religious Online Hate Speech —the London geo-coded Twitter corpus was classified as ‘hateful’ or not (Overall: mean 8; s.d. 15.84; min 0; max 522; Between: s.d. 12.57; min 0; max 297; Within: s.d. 9.63; min –120; max 440). Working with computer scientists, a supervised machine learning classifier was built using the Weka tool to distinguish between ‘hateful’ Twitter posts with a focus on race (in this case anti-black/middle-eastern) and religion (in this case anti-Islam/Muslim), and more general non-‘hateful’ posts. A gold standard dataset of human-coded annotations was generated to train the machine classifier based on a sample of 2,000 tweets. In relation to each tweet, human coders were tasked with selecting from a ternary set of classes (‘yes’, ‘no’ and ‘undecided’) in response to the following question: ‘is this text offensive or antagonistic in terms of race, ethnicity or religion?’ Tweets that achieved 75 per cent agreement and above from four human coders were transposed into a machine learning training dataset (undecided tweets were dropped). A Support Vector Machine with Bag of Words feature extraction emerged as the most accurate machine learning model, with a precision of 0.89, a recall of 0.69 and an overall F-measure of 0.771, above the established threshold of 0.70 in the field of information retrieval (van Rijsbergen, 1979). The final hate dataset consisted of 294,361 tweets, representing 1.4 per cent of total geo-coded tweets in the study window (consistent with previous research, see Williams and Burnap, 2016; Williams and Burnap, 2018).
Our measure of online hate speech is not designed to correspond directly to online hate acts deemed criminal under UK law. The threshold for criminal hate speech is high, and legislation is complex (see CPS guidance and Williams et al., 2019). Ours is a measure of online inter-group racial and/or religious tension, akin to offline community tensions that are routinely picked up by neighbourhood policing teams. Not all manifestations of such tension are necessarily criminal, but they may be indicative of pending activity that may be criminal. Examples of hate speech tweets in our sample include: ‘Told you immigration was a mistake. Send the #Muzzies home!’; ‘Integrate or fuck off. No Sharia law. #BurntheQuran’; and ‘Someone fucking knifed on my street! #niggersgohome’. 3
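The classifier pipeline can be sketched as follows, substituting scikit-learn for the Weka tooling the authors actually used. The toy texts and labels are invented for illustration and deliberately contain no slurs; they are not drawn from the study's gold-standard corpus of ~2,000 annotated tweets.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny invented training set standing in for the human-annotated gold standard.
texts = [
    "send them home now",          # antagonistic (label 1)
    "lovely day in london",        # benign (label 0)
    "they should all leave",       # antagonistic (label 1)
    "great match last night",      # benign (label 0)
]
labels = [1, 0, 1, 0]

# Bag-of-words features feeding a linear SVM, mirroring the paper's
# reported best-performing configuration.
clf = make_pipeline(CountVectorizer(), LinearSVC())
clf.fit(texts, labels)

# Classify a new post as 'hateful' (1) or not (0).
print(clf.predict(["send them all home"]))
```

In the study, held-out performance was summarized by precision, recall and their harmonic mean (the F-measure), with 0.70 taken as the acceptability threshold.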

Census regressors

Four measures were derived from 2011 census data based on the literature that estimated hate crime using ecological factors (e.g. Green, 1998 ; Espiritu, 2004 ). These include proportion of population: (1) with no qualifications, (2) aged 16–24, (3) long-term unemployed, and (4) black and minority ethnic (BAME). 4

Methods of estimation

The estimation process began with a single-level model that collapsed the eight months’ worth of police hate crime and Twitter data into one time period. Because of the skewed distribution of the data and the presence of over-dispersion, a negative binomial regression model was selected. These non-panel models provide a baseline against which to compare the second phase of modelling. To incorporate the temporal variability of police recorded crime and Twitter data, the second phase of modelling adopted a random- and fixed-effects regression framework. The first step was to test whether this framework was an improvement upon the non-panel model that did not take time variability into account. The Breusch–Pagan Lagrange multiplier test revealed that random-effects regression was favourable over single-level regression. Random-effects modelling allows for the inclusion of time-variant (police and Twitter data) and time-invariant (census measures) variables. Both types of variable were grouped into the 4,720 LSOA areas that make up London. Using LSOA as the unit of analysis in the models allowed for an ‘ecological’ appraisal of the explanatory power of race and religious hate tweets for estimating police recorded racially and religiously aggravated offences (Sampson, 2012). When the error term of an LSOA is correlated with the variables in the model, selection bias results from time-invariant unobservables, rendering random effects inconsistent. The alternative fixed-effects model, based on within-area variation, removes such sources of bias by controlling for observed and unobserved ecological factors. Therefore, both random- and fixed-effects estimates are produced for all models. 5 A Poisson model was chosen over negative binomial, as the literature suggests the latter does not produce genuine fixed-effects (FE) estimations. 6 In addition, Poisson random-/fixed-effects (RE/FE) estimation with robust standard errors is recognized as the most reliable option in the presence of over-dispersion (Wooldridge, 1999). There were no issues with multicollinearity in the final models.
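The over-dispersion that drives these modelling choices can be checked directly: a Poisson-distributed count has variance roughly equal to its mean, so a variance-to-mean ratio well above 1 signals over-dispersion. A minimal sketch on simulated counts (the real LSOA-level counts are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated skewed monthly counts for 4,720 areas, for illustration only;
# a negative binomial draw mimics the heavy right tail seen in crime counts.
counts = rng.negative_binomial(n=1, p=0.3, size=4720)

mean, var = counts.mean(), counts.var(ddof=1)
dispersion = var / mean  # ~1 under Poisson; >> 1 indicates over-dispersion

print(f"mean={mean:.2f} variance={var:.2f} dispersion={dispersion:.2f}")
```

A ratio well above 1, as here, is what motivates either a negative binomial specification or, in the panel setting, Poisson RE/FE estimates with robust standard errors.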

Figures 4–7 show scatterplots with a fitted line (95% confidence interval in grey) of the three types of racially and religiously aggravated offences (plus combined) by race and religious hate speech on Twitter over the whole eight-month period. The scatterplots indicated a positive relationship between the variables. Two LSOAs emerged as clear outliers (LSOA E01004736 and E01004763; see Figures 8–9) and required further inspection (they are not included in the scatterplots). A jackknife resampling method was used to confirm whether these LSOAs (and others) were influential points. This method fits a negative binomial model in 4,720 iterations while suppressing one observation at a time, allowing the effect of each suppression on the model to be identified; in plain terms, it shows how much each LSOA influences the estimations. Inspection of a scatterplot of dfbeta values (the amount a parameter changes when an observation is suppressed) confirmed the above LSOAs as influential points, along with E01002444 (Hillingdon, in particular Heathrow Airport) and E01004733 (Westminster). All models were built with and without outliers to identify any significant differences. The inclusion of the four outliers did change the magnitude of effects, standard errors, significance levels for some variables and model fit, so they were removed from the final models.

Fig. 4 Hate tweets by R & R aggravated violence against the person

Fig. 5 Hate tweets by R & R aggravated harassment

Fig. 6 Hate tweets by R & R aggravated criminal damage

Fig. 7 Hate tweets by R & R aggravated offences combined

Fig. 8 Outlier LSOA E01004736

Fig. 9 Outlier LSOA E01004763

Table 1 presents results from the negative binomial models for each type of racially and religiously aggravated crime category. These models do not take into account variation over time, so estimates should be read as statistical associations covering the whole eight-month period of data collection, and as a baseline against which to compare the panel models presented later. The majority of the census regressors emerge as significantly predictive of all racially and religiously aggravated crimes, broadly confirming previous hate crime research examining similar factors and partly supporting Hypothesis 1. Partly supporting Green et al. (1998) and Nandi et al. (2017) , the proportion of the population that is BAME emerged as positively associated with all race and religious hate crimes, with the greatest effect emerging for racially or religiously aggravated violence against the person. Partly confirming work by Bobo and Licari (1989) , the models show a positive relationship between the proportion of the population with no qualifications and racially and religiously aggravated violence, criminal damage and total hate crime, but the association only emerged as significant for criminal damage. The proportion of the population aged 16–24 only emerged as significant for criminal damage and total hate crimes, and the relationship was negative, partly contradicting previous work ( Ray et al. , 2004 ; Williams and Tregidga, 2014 ). Like Espiritu (2004) and Ray et al. (2004) , the models show that rates of long-term unemployment were positively associated with all race and religious hate crimes. Although this variable had the greatest effect in the models, we found an inverted U-shaped curvilinear relationship (indicated by the significant quadratic term). Figure 10 graphs the relationship, showing that as the proportion of the long-term unemployed population increases, victimization increases up to a turning point of 3.56 per cent, after which it begins to decrease.
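For a quadratic specification, the turning point falls where the derivative of the linear predictor with respect to unemployment is zero. The coefficients below are hypothetical illustrative values (not the fitted estimates), chosen only so the worked example reproduces the 3.56 per cent turning point reported above:

```python
# Inverted-U quadratic effect: the predicted log-count is b0 + b1*x + b2*x**2
# with b2 < 0, so the peak sits where the derivative b1 + 2*b2*x equals zero.
# b1 and b2 are hypothetical illustrative values, not the study's estimates.
b1, b2 = 0.712, -0.100
turning_point = -b1 / (2 * b2)
print(round(turning_point, 2))  # 3.56 per cent long-term unemployed
```

The same algebra applies whatever the fitted values: a positive linear term with a negative quadratic term always implies a single interior maximum at -b1/(2*b2).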

Table 1. Negative binomial models (full eight-month period, N = 4,270)

Notes: Because of the presence of heteroskedasticity, robust standard errors are presented. * p < 0.05; ** p < 0.01; *** p < 0.001. All models significant at the 0.0000 level.

Fig. 10 Plot of the curvilinear relationship between long-term unemployment and racially and religiously aggravated crime

This finding at first seems counter-intuitive, but a closer inspection of the relationship between the proportion of the population that is long-term unemployed and the proportion that is BAME reveals a possible explanation. LSOAs with very high long-term unemployment and very high BAME populations overlap. Where this overlap is significant, we find relatively low rates of hate crime. For example, LSOA E01001838 in Hackney (in particular the Frampton Park Estate area) has 6.1 per cent long-term unemployment, a 68 per cent BAME population and only 2 hate crimes, and LSOA E01003732 in Redbridge has 5.6 per cent long-term unemployment, a 76 per cent BAME population and only 2 hate crimes. These counts of hate crime are only slightly above the mean for London (mean = 1.39, maximum = 390). We know from robust longitudinal analysis by Nandi et al. (2017) that minority groups living in very high majority white areas are significantly more likely to report experiencing racial harassment. This risk decreases in highly multicultural areas where there is low support for far right groups, such as London. A simple regression (not shown here) in which the BAME population proportion was included as the only regressor does show an inverted U-shaped relationship with all hate crimes, with the risk of victimization decreasing when the proportion far outweighs the white population. However, this curve was smoothed out when other regressors were included in the models. This analysis therefore suggests that LSOAs with high rates of long-term unemployment but lower rates of hate crime are likely to be those with high proportions of BAME residents, some of whom will be long-term unemployed themselves but unlikely to be perpetrating hate crimes against the ingroup.

Supporting Hypothesis 2, all negative binomial models show that online hate speech targeting race and religion is positively associated with all offline racially and religiously aggravated offences, including total hate crimes, in London over the eight-month period. The magnitude of the effect is relatively even across offence categories. When weighing the effect of the Twitter regressors against the census regressors, the unit of change needed in each regressor to affect the outcome must be borne in mind. For example, a percentage-point change in the BAME population proportion of an LSOA is quite different from a change in the count of hate tweets in the same area: the latter is far more likely to vary to a much greater extent and far more rapidly (see later in this section). The associations identified in these non-panel models indicate a strong link between hateful Twitter posts and offline racially and religiously aggravated crimes in London. Yet, it is not possible with these initial models to state the direction of association: we cannot say whether online hate speech precedes rather than follows offline hate crime.

Table 2 presents results from RE/FE Poisson models that incorporate variation over space and time. RE/FE models have been used to indicate causal pathways in previous criminological research; however, we suggest such claims in this article would stretch the data beyond their limits. As we adopt an ecological framework, using LSOAs and not individuals as our unit of analysis, we cannot state with confidence that area-level factors cause the outcome. There are likely sub-LSOA factors that account for causal pathways, but we were unable to observe these in this study design. Nevertheless, the results of the RE/FE models represent a significant improvement over the negative binomial estimations presented earlier and are suitable for subjecting those earlier findings to a more robust test. Indeed, FE models are the most robust test given they are based solely on within-LSOA variation, allowing for the elimination of potential sources of bias by controlling for observed and unobserved ecological characteristics ( Allison, 2009 ). In contrast, RE models only take into account the factors included as regressors. These models therefore allow us to determine whether online hate speech precedes rather than follows offline hate crime.

Table 2. Random- and fixed-effects Poisson regression models

Notes: The table shows results of separate random- and fixed-effects models. The Hausman test can be used to determine whether RE or FE is preferred; however, it has been shown to be inefficient, and we prefer not to rely on it for interpreting our models (see Troeger, 2008 ). Therefore, both RE and FE results should be considered together. Because of the presence of heteroskedasticity, robust standard errors are presented. Adjusted R2 for random-effects models only. * p < 0.05; ** p < 0.01; *** p < 0.001. All models significant at the 0.0000 level.

The RE/FE modelling was conducted in three stages (Models A to C) to address Hypothesis 3—to assess the magnitude of the change in the variance explained in the outcomes when online hate speech is added as a regressor. Model A includes only the census regressors for the RE estimations, and for all hate crime categories, broadly similar patterns of association emerge compared to the non-panel models. The variance explained by the set of census regressors ranges between 2 per cent and 6 per cent. Such low adjusted R-square values are not unusual for time-invariant regressors in panel models ( Allison, 2009 ).

Models B and C were estimated with RE and FE and introduce the Twitter variables of online hate speech and the total count of geo-coded tweets. Model B introduces online hate speech alone, and both RE and FE results show positive significant associations with all hate crime categories. The largest effect in the RE models emerges for harassment (IRR 1.004). For every additional hate tweet, a corresponding 0.4 per cent increase is observed in the incidence rate of the dependent; an increase of 100 hate tweets would therefore correspond to an increase of roughly 49 per cent (1.004^100 ≈ 1.49) in racially or religiously aggravated harassment in a given month within a given LSOA. Given we know hate speech online increases dramatically in the aftermath of trigger events (Williams and Burnap, 2015), an increase of 100 hate tweets in an LSOA is not fanciful. The magnitude of the effect for harassment, compared to the other hate offences, is also expected, given that hate-related public order offences, which include causing public fear, alarm and distress, also increased most dramatically in the aftermath of the ‘trigger’ events alluded to above (accounting for 56 per cent of all hate crimes recorded by police in 2017/18; Home Office, 2018 ). The adjusted R-square statistic for Model B shows large increases in the variance explained in the dependents by the inclusion of online hate speech as a regressor, ranging between 13 per cent and 30 per cent. Interpretation of these large increases should be tempered given that time-variant regressors can exert a significant effect in panel models ( Allison, 2009 ). Nonetheless, the significant associations in both RE and FE models and the improvement in the variance explained provide strong support for Hypotheses 2 and 3.
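As a quick worked check, the standard arithmetic for interpreting an incidence rate ratio compounds multiplicatively (this is generic IRR algebra, not the authors' model code):

```python
# An IRR is the multiplicative change in the incidence rate per one-unit
# increase in a regressor; for a Poisson model, IRR = exp(coefficient).
irr = 1.004                        # reported IRR for hate tweets on harassment
per_tweet_pct = (irr - 1) * 100    # percentage change per additional tweet
hundred_tweets = irr ** 100        # multiplicative effect of 100 extra tweets
print(round(per_tweet_pct, 1), round(hundred_tweets, 2))  # 0.4 1.49
```

Because the effect is multiplicative, large increases in hate tweets imply proportionally much larger changes in the expected offence rate than a naive linear reading of the per-tweet effect would suggest.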

Model C RE and FE estimations control for the total count of geo-coded tweets, thereby removing any variance explained by the hate speech regressor acting as a proxy for population density ( Malleson and Andresen, 2015 ). In all models, the direction and significance of the relationship between online hate speech and hate crimes do not change, but the magnitude of the effect does decrease, indicating the regressor was likely also acting, albeit to a small extent, as a proxy for population density. The FE models also include an interaction between the time-invariant regressor proportion of the population that is BAME and the time-variant regressor online hate speech. The interaction term was significant for all hate crime categories, with the strongest effect emerging for racially and religiously aggravated violence against the person. Figure 11 presents a predicted probability plot combining both variables for the outcome of violent hate crime. In an LSOA with a 70 per cent BAME population and 300 hate tweets posted a month, the incidence rate of racially and religiously aggravated violence is predicted to be between 1.75 and 2. However, the skewed distribution of the sample must be borne in mind when interpreting these predictions. Just over 70 per cent of LSOAs have a BAME population of 50 per cent or less and 150 or fewer hate tweets per month, so the predicted incidence for offences in these areas is between 1 and 1.25 (the lower-left dark blue region of the plot). The plot provides predictions based on the model estimates, meaning that if populations and hate tweets were in future to increase toward the upper ends of their ranges, these are the incidence rates of racially and religiously aggravated violence we would expect to observe in London.
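The mechanics of generating such predictions from a model with an interaction term can be sketched as follows. All four coefficients below are hypothetical illustrative values, chosen only so the sketch reproduces the bands reported in the text, not the fitted estimates:

```python
import math

# Predicted incidence rate from a Poisson model with a BAME-share x hate-tweet
# interaction: rate = exp(b0 + b1*bame + b2*tweets + b3*bame*tweets).
# All coefficients are hypothetical illustrative values.
b0, b1, b2, b3 = -0.2, 0.4, 0.0005, 0.002

def predicted_rate(bame_share, tweets_per_month):
    """Incidence rate for a given BAME population share (0-1) and tweet count."""
    return math.exp(b0 + b1 * bame_share + b2 * tweets_per_month
                    + b3 * bame_share * tweets_per_month)

high = predicted_rate(0.70, 300)   # high-BAME, high-tweet LSOA: ~1.75-2
low = predicted_rate(0.30, 100)    # more typical LSOA: ~1-1.25
```

Evaluating the function over a grid of BAME shares and tweet counts, and shading by the predicted rate, yields a surface of the kind shown in Figure 11.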

Fig. 11 Predicted probability of R & R agg. violence by BAME population proportion and hate tweet count

Our results indicate a consistent positive association between Twitter hate speech targeting race and religion and offline racially and religiously aggravated offences in London. Previously published work indicated an association around events that acted as ‘triggers’ for online and offline hate acts. This study confirms the association holds in both the presence and the absence of such events. The models allowed us to predict the incidence rate of offline offences by the proportion of the population that is BAME and the count of online hate tweets. Taking these and the other factors in the models into account, the incidence rate for nearly three-quarters of LSOAs within London remains below 1.25. Were the number of hate tweets sent per month to increase dramatically in an area with a high BAME population, our predictions suggest much higher incidence rates. This is noteworthy, given what we know about the impact of ‘trigger’ events and hate speech, and indicates that the role of social media in the process of hate victimization is non-trivial.

Although we were not able to directly test the role of online polarization and far right influence on the prevalence of offline hate crimes, we are confident that our focus on online hate speech acted as a ‘signature’ measure of these two phenomena. Through the various mechanisms outlined in the theoretical work presented in this article, it is plausible to conclude that hate speech posted on social media, an indicator of extreme polarization, influences the frequency of offline hate crimes. However, it is unlikely that online hate speech is directly causal of offline hate crime in isolation. It is more likely the case that social media is only part of the formula, and that local level factors, such as the demographic make-up of neighbourhoods (e.g. black and minority ethnic population proportion, unemployment) and other ecological level factors play key roles, as they always have in estimating hate crime ( Green, 1998 ; Espiritu, 2004 ; Ray et al. , 2004 ). What this study contributes is a data and theory-driven understanding of the relative importance of online hate speech in this formula. If we are to explain hate crime as a process and not a discrete act, with victimization ranging from hate speech through to violent victimization, social media must form part of that understanding ( Bowling, 1993 ; Williams and Tregidga, 2014 ).

Our results provide an opportunity to renew Bowling’s (1993) call to see racism as a continuity of violence, threat and intimidation. We concur that hate crimes must be conceptualized as a process set in geographical, social, historical and political context. We would add that ‘technological’ context is now a key part of this conceptualization. The enduring quality of hate victimization, characterized by repeated or continuous insult, threat, or violence now extends into the online arena and can be linked to its offline manifestation. We argue that hate speech on social media extends ‘climates of unsafety’ experienced by minority groups that transcend individual instances of victimization ( Stanko, 1990 ). Online hate for many minorities is part and parcel of everyday life—as Pearson et al. (1989 : 135) state ‘A black person need never have been the actual victim of a racist attack, but will remain acutely aware that she or he belongs to a group that is threatened in this manner’. This is no less true in the digital age. Social media, through various mechanisms such as unfettered use by the far right, polarization, events, and psychological processes such as deindividuation, has been widely infected with a casual low-level intolerance of the racial Other .

Our study informs the ongoing debate on ‘predictive policing’ using big data and algorithms to find patterns at scale and speed, hitherto unrealizable in law enforcement ( Kaufmann et al. , 2019 ). Much of the criminological literature is critical. The process of pattern identification further embeds existing power dynamics and biases, sharpens the focus on the symptoms and not the causes of criminality, and supports pre-emptive governance by new technological sovereigns ( Chan and Bennett Moses, 2017 ). These valid concerns pertain mainly to predictive policing efforts that apply statistical models to data on crime patterns, offender histories, administrative records and demographic area profiles. These models and data formats tend to produce outcomes that reflect existing patterns and biases because of their historical nature. Our work mitigates some of the existing pitfalls in prediction efforts in three ways: (1) The data used in estimating patterns are not produced by the police, meaning they are immune from inherent biases normally present in the official data generation process; (2) social media data are collected in real-time, reducing the error introduced by ‘old’ data that are no longer reflective of the context; and (3) viewing minority groups as likely victims and not offenders, while not addressing the existing purported bias in ongoing predictive policing efforts, demonstrates how new forms of data and technology can be tailored to achieve alternative outcomes. However, the models reported in this article are not without their flaws, and ahead of their inclusion in real-life applications, we would warn that predictions alone do not necessarily lead to good policing on the streets. As in all statistics, there are degrees of error, and models are only a crude approximation of what might be unfolding on the ground. 
In particular, algorithmic classification of hate speech is not perfect, and precision, accuracy and recall decay as language shifts over time and space. Any practical implementation would therefore require a resource-intensive process to ensure algorithms are updated and tested frequently, to avoid unacceptable levels of false positives and negatives.
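The monitoring metrics referred to here are the standard retrieval measures (van Rijsbergen, 1979). A minimal sketch, using hypothetical confusion-matrix counts rather than any real classifier output:

```python
# Precision, recall and F1 for a hate speech classifier, from hypothetical
# confusion-matrix counts: tp = true positives, fp = false positives,
# fn = false negatives.
tp, fp, fn = 80, 20, 40
precision = tp / (tp + fp)   # share of flagged posts that are truly hateful
recall = tp / (tp + fn)      # share of hateful posts that get flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.8 0.67 0.73
```

Tracking these quantities on freshly labelled samples over time is one way to detect the decay in classifier performance as language drifts.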

Finally, we consider the methodological implications of this study to be as significant as those outlined by Bowling (1993) . Examining the contemporary hate victimization dynamic requires methods able to capture time and space variations in both online and offline data. Increasing the sources of data on hate is also important, given continued low rates of reporting. We demonstrated how administrative (police records), survey (census) and new forms of data (Twitter) can be linked to study hate in the digital age. Surveys, interviews and ethnographies should be complemented by these new technological methods of enquiry to enable a more complete examination of the social processes that give rise to contemporary hate crimes. In the digital age, computational criminology, drawing on dynamic data science methods, can be used to study the patterning of online hate speech victimization and associated offline victimization. However, before criminologists and practitioners incorporate social media into their ‘data diets’, awareness of potential forms of bias in these new forms of data is essential. Williams et al. (2017a) identified several sources of bias, including variations in the use of social media (e.g. Twitter being much more popular with younger people). This is particularly pertinent given the recent abandonment of Twitter by many far right users following a clamp-down on hate speech in Europe. A reduction in this type of user may see a corresponding decrease in hate tweets as they flock to more underground platforms, such as 8chan, 4chan, Gab and Voat, which are currently more difficult to incorporate into research and practical applications. The data used in this study were collected before the social media giants introduced strict hate speech policies. Nonetheless, we would expect hate speech to be displaced, and in time data science solutions will allow us to follow the hate wherever it goes.

The government publication of ‘The Response to Racial Attacks and Harassment’ in 1989 marked a sea-change in the way criminal justice agencies, and eventually the public, viewed hate crime in the United Kingdom ( Home Office, 1989 ). In 2019, the government published its Online Harms White Paper, which attempts to achieve the same for online hate ( Cabinet Office, 2019 ). Over the past decade, online hate victims have struggled to convince others that they are undeserving targets of harm sufficiently serious to warrant collective concern, partly because of insufficient empirical credibility, leaving their calls for recognition unheard. This research shows that online hate victimization is part of a wider process of harm that can begin on social media and then migrate to the physical world. Qualitative work shows direct individual-level links between online and offline hate victimization ( Awan and Zempi, 2017 ). Our study extends this to the ecological level at the scale of the UK’s largest metropolitan area. Despite this significant advancement, we were unable to examine sub-LSOA factors, meaning the individual-level mechanisms responsible for the link between online and offline hate incidents remain to be established by more forensic, and possibly qualitative, work. The combination of the data science-driven results of this study and future qualitative work has the potential to address the reduced capacity of the police to gain intelligence on terrestrial community tensions that lead to hate crimes. Such a technological solution may even assist in redressing the bias reportedly present in ‘predictive policing’ efforts, by refocussing the algorithmic lens away from those historically targeted by police and onto those who perpetrate harms against minorities.

This work was supported by the Economic and Social Research Council grant: ‘Centre for Cyberhate Research and Policy: Real-Time Scalable Methods & Infrastructure for Modelling the Spread of Cyberhate on Social Media’ (grant number: ES/P010695/1) and the US Department of Justice National Institute for Justice grant: ‘Understanding Online Hate Speech as a Motivator for Hate Crime’ (grant number: 2016-MU-MU-0009).

Allison , P. D . ( 2009 ), Fixed Effects Regression Models . Sage .


Awan , I. and Zempi , I . ( 2017 ), ‘“I Will Blow Your Face Off”: Virtual and Physical World Anti-Muslim Hate Crime’, British Journal of Criminology , 57 : 362 – 80 .

Bail , C. A. , Argyle , L. P. , Brown , T. W. , Bumpus , J. P. , Chen , H. , Hunzaker , M. B. F. , Lee , J. , Mann , M. , Merhout , F. and Volfovsky , A . ( 2018 ), ‘Exposure to Opposing Views on Social Media Can Increase Political Polarization’ , PNAS , 115 : 9216 – 21 .

Bobo , L. and Licari , F. C . ( 1989 ), ‘Education and Political Tolerance: Testing the Effects of Cognitive Sophistication and Target Group Affect’ , Public Opinion Quarterly , 53 : 285 – 308 .

Bowling , B . ( 1993 ), ‘Racial Harassment and the Process of Victimisation: Conceptual and Methodological Implications for the Local Crime Survey’ , British Journal of Criminology , 33 : 231 – 50 .

Boxell , L. , Gentzkow , M. and Shapiro , J. M . ( 2017 ), ‘Greater Internet Use Is Not Associated with Faster Growth in Political Polarization among US Demographic Groups’ , PNAS , 114 : 10612 – 17 .

Brady , W. J. , Wills , J. A. , Jost , J. T. , Tucker , J. A. and Van Bavel , J. J . ( 2017 ), ‘Emotion Shapes the Diffusion of Moralized Content in Social Networks’ , PNAS , 114 : 7313 – 18 .

Burnap , P. , Rana , O. , Williams , M. , Housley , W. , Edwards , A. , Morgan , J. , Sloan , L. and Conejero , J . ( 2014 ), ‘COSMOS: Towards an Integrated and Scalable Service for Analyzing Social Media on Demand’ , IJPSDS , 30 : 80 – 100 .

Burnap , P. and Williams , M. L . ( 2015 ), ‘Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making’ , Policy & Internet , 7 : 223 – 42 .

———. ( 2016 ), ‘Us and Them: Identifying Cyber Hate on Twitter across Multiple Protected Characteristics’ , EPJ Data Science , 5 : 1 – 15 .

Cabinet Office. ( 2019 ) Internet Safety White Paper . Cabinet Office

Chan , J. and Bennett Moses , L . ( 2017 ), ‘Making Sense of Big Data for Security’ , British Journal of Criminology , 57 : 299 – 319 .

Chokshi , N . ( 2019 ), PewDiePie in Spotlight After New Zealand Shooting . New York Times .

CPS. ( 2018 ), Hate Crime Report 2017–18 . Crown Prosecutions Service .

Crest ( 2017 ), Russian Influence and Interference Measures Following the 2017 UK Terrorist Attacks . Centre for Research and Evidence on Security Threats .

Dubois , E. and Blank , G . ( 2017 ), ‘The Echo Chamber is Over-Stated: The Moderating Effect of Political Interest and Diverse Media’ , Information, Communication & Society , 21 : 729 – 45 .

Demos. ( 2017 ), Anti-Islamic Content on Twitter . Demos

Espiritu , A . ( 2004 ), ‘Racial Diversity and Hate Crime Incidents’ , The Social Science Journal , 41 : 197 – 208 .

Green , D. P. , Strolovitch , D. Z. and Wong , J. S . ( 1998 ), ‘Defended Neighbourhoods, Integration and Racially Motivated Crime’ , American Journal of Sociology , 104 : 372 – 403 .

Greer , C. and McLaughlin , E . ( 2010 ), ‘We Predict a Riot? Public Order Policing, New Media Environments and the Rise of the Citizen Journalist’ , British Journal of Criminology , 50 : 1041 – 059 .

Greig-Midlane , J . ( 2014 ), Changing the Beat? The Impact of Austerity on the Neighbourhood Policing Workforce . Cardiff University .

Hanes , E. and Machin , S . ( 2014 ), ‘Hate Crime in the Wake of Terror Attacks: Evidence from 7/7 and 9/11’ , Journal of Contemporary Criminal Justice , 30 : 247 – 67 .

Hawdon , J. , Oksanen , A. and Räsänen , P . ( 2017 ), ‘Exposure To Online Hate In Four Nations: A Cross-National Consideration’ , Deviant Behavior , 38 : 254 – 66 .

Hern , A . ( 2018 ), Facebook Protects Far-Right Activists Even After Rule Breaches . The Guardian .

HMICFRS. ( 2018 ), Understanding the Difference: The Initial Police Response to Hate Crime . Her Majesty’s Inspectorate of Constabulary and Fire and Rescue Service .

Home Office. ( 1989 ), The Response to Racial Attacks and Harassment: Guidance for the Statutory Agencies, Report of the Inter-Departmental Racial Attacks Group . Home Office .

———. ( 2018 ), Hate Crime, England and Wales 2017/18 . Home Office .

Hope Not Hate. ( 2019 ), State of Hate 2019 . Hope Not Hate .

Howard , P. N. and Kollanyi , B . ( 2016 ), Bots, #StrongerIn, and #Brexit: Computational Propaganda during the UK-EU Referendum . Unpublished Research Note. Oxford University Press.

Kaufmann , M. , Egbert , S. and Leese , M . ( 2019 ), ‘Predictive Policing and the Politics of Patterns’ , British Journal of Criminology , 59 : 674 – 92 .

Lehman , J . ( 2014 ), A Brief Explanation of the Overton Window . Mackinac Center for Public Policy .

Malleson , N. and Andresen , M. A . ( 2015 ), ‘Spatio-temporal Crime Hotspots and The Ambient Population’ , Crime Science , 4 : 1 – 8 .

Müller , K. and Schwarz , C. ( 2018a ), Making America Hate Again? Twitter and Hate Crime Under Trump . Unpublished working paper. University of Warwick.

———. ( 2018b ), Fanning the Flames of Hate: Social Media and Hate Crime . Unpublished working paper. University of Warwick.

Nandi , A. , Luthra , R. , Saggar , S. and Benzeval , M . ( 2017 ), The Prevalence and Persistence of Ethnic and Racial Harassment and Its Impact on Health: A Longitudinal Analysis . University of Essex .

Ofcom. ( 2018a ), Children and Parents: Media Use and Attitudes . Ofcom

———. ( 2018b ), News Consumption in the UK: 2018 . Ofcom .

———. ( 2018c ), Adults’ Media Use and Attitudes Report . Ofcom

ONS. ( 2017 ), CSEW Estimates of Number of Race and Religion Related Hate Crime in England and Wales, 12 Months Averages, Year Ending March 2014 to Year Ending March 2017 . Office for National Statistics .

Pearson , G. , Sampson , A. , Blagg , H. , Stubbs , P. and Smith , D. J . ( 1989 ), ‘Policing Racism’, in R. Morgan and D. J. Smith , eds., Coming to Terms with Policing: Perspectives on Policy . Routledge .

Peddell , D. , Eyre , M. , McManus , M. and Bonworth , J . ( 2016 ), ‘Influences and Vulnerabilities in Radicalised Lone Actor Terrorists: UK Practitioner Perspectives’ , International Journal of Police Science and Management , 18 : 63 – 76 .

Perry , B. and Olsson , P . ( 2009 ), ‘Cyberhate: The Globalisation of Hate’ , Information & Communications Technology Law , 18 : 185 – 99 .

Pew Research Centre. ( 2018 ), Americans Still Prefer Watching to Reading the News . Pew Research Centre .

Rawlinson , K . ( 2018 ), Finsbury Park-accused Trawled for Far-right Groups Online, Court Told . The Guardian .

Ray , L. , Smith , D. and Wastell , L . ( 2004 ), ‘Shame, Rage and Racist Violence’ , British Journal of Criminology , 44 : 350 – 68 .

Roberts, C., Innes, M., Williams, M. L., Tregidga, J. and Gadd, D. (2013), Understanding Who Commits Hate Crimes and Why They Do It [Project Report]. Welsh Government.

van Rijsbergen , C. J . ( 1979 ), Information Retrieval (2nd ed.), Butterworth .

Sampson , R. J . ( 2012 ), Great American City: Chicago and the Enduring Neighborhood Effect . University of Chicago Press .

Stanko , E. A . ( 1990 ), Everyday Violence . Pandora .

Stephan , W. G. and Stephan , C. W . ( 2000 ), An Integrated Threat Theory of Prejudice . Lawrence Erlbaum Associates .

Sunstein , C. R . ( 2017 ), #Republic: Divided Democracy in the Age of Social Media . Princeton University Press .

Troeger, V. E. (2008), ‘Problematic Choices: Testing for Correlated Unit Specific Effects in Panel Data’, Presented at 25th Annual Summer Conference of the Society for Political Methodology, 9–12 July 2008.

Williams, M. L. (2006), Virtually Criminal: Crime, Deviance and Regulation Online . Routledge.

Williams , M. and Burnap , P . ( 2016 ), ‘Cyberhate on Social Media in the Aftermath of Woolwich: A Case Study in Computational Criminology and Big Data’ , British Journal of Criminology , 56 : 211 – 38 .

———. ( 2018 ), Antisemitic Content on Twitter . Community Security Trust .

Williams , M. and Tregidga , J . ( 2014 ), ‘Hate Crime Victimisation in Wales: Psychological and Physical Impacts Across Seven Hate Crime Victim-types’ , British Journal of Criminology , 54 : 946 – 67 .

Williams , M. L. , Burnap , P. and Sloan , L. ( 2017a ), ‘Crime Sensing With Big Data: The Affordances and Limitations of Using Open-source Communications to Estimate Crime Patterns’ , The British Journal of Criminology , 57 : 320 – 40.

———. ( 2017b ), ‘Towards an Ethical Framework for Publishing Twitter Data in Social Research: Taking into Account Users’ Views, Online Context and Algorithmic Estimation’ , Sociology , 51 : 1149 – 68 .

Williams, M. L., Eccles-Williams, H. and Piasecka, I. (2019), Hatred Behind the Screens: A Report on the Rise of Online Hate Speech . Mishcon de Reya.

Williams, M. L., Edwards, A. E., Housley, W., Burnap, P., Rana, O. F., Avis, N. J., Morgan, J. and Sloan, L. (2013), ‘Policing Cyber-Neighbourhoods: Tension Monitoring and Social Media Networks’, Policing and Society , 23: 461–81.

Wooldridge , J. M . ( 1999 ), ‘Distribution-Free Estimation of Some Nonlinear Panel Data Models’ , Journal of Econometrics , 90 : 77 – 97 .

For current CPS guidance on what constitutes an online hate offence see: https://www.cps.gov.uk/legal-guidance/social-media-guidelines-prosecuting-cases-involving-communications-sent-social-media .

Not all hate speech identified reaches the threshold for a criminal offence in England and Wales.

These are not actual tweets from the dataset but are instead constructed illustrations that maintain the original meaning of authentic posts while preserving the anonymity of tweeters (see Williams et al. 2017b for a fuller discussion of ethics of social media research).

Other census measures were excluded due to multicollinearity, including religion.

To determine if RE or FE is preferred, the Hausman test can be used. However, this has been shown to be inefficient, and we prefer not to rely on it for interpreting our models (see Troeger, 2008 ). Therefore, both RE and FE results should be considered together.

See https://www.statalist.org/forums/forum/general-stata-discussion/general/1323497-choosing-between-xtnbreg-fe-bootstrap-and-xtpoisson-fe-cluster-robust .


How online hate turns into real-life violence

Social media sites have become hubs for the proliferation of white-supremacist propaganda.


About US is an initiative by The Washington Post to cover issues of identity in the United States.

White-supremacist groups use social media as a tool to distribute their message, where they can incubate their hate online and allow it to spread. But when their rhetoric reaches certain people, the online messages can turn into real-life violence.

Several incidents in recent years have shown that when online hate goes offline, it can be deadly. White supremacist Wade Michael Page posted in online forums tied to hate before he went on to murder six people at a Sikh temple in Wisconsin in 2012. Prosecutors said Dylann Roof “self-radicalized” online before he murdered nine people at a black church in South Carolina in 2015. Robert Bowers, accused of murdering 11 elderly worshipers at a Pennsylvania synagogue in October, had been active on Gab , a Twitter-like site used by white supremacists.

And just a few weeks ago , a 30-year-old D.C. man who described himself as a white nationalist was arrested on a gun charge after concerned relatives alerted police to his violent outbursts, including saying that the victims at the synagogue “deserved it.” Police say the man was online friends with Bowers.

“I think that the white-supremacist movement has used technology in a way that has been unbelievably effective at radicalizing people,” said Adam Neufeld, vice president of innovation and strategy for the Anti-Defamation League.

“We should not kid ourselves that online hate will stay online,” Neufeld added. “Even if a small percentage of those folks active online go on to commit a hate crime, it’s something well beyond what we’ve seen for America.”


In 2017, white supremacists committed the majority of domestic extremist-related killings in the United States, according to a report from the Anti-Defamation League. They were responsible for 18 of the 34 murders committed by domestic extremists that year.

The influence of the Internet in fostering white-supremacist ideas should not be underestimated, said Shannon Martinez, who helps people leave extremist groups as program director of the Free Radicals Project. The digital world gives white supremacists a safe space to explore extreme ideologies and intensify their hate without consequence, she said. Their rage can grow under the radar until the moment it explodes in the real world.

“There’s a lot of romanticization of violence among the far-right online, and there aren’t consequences to that,” said Martinez, who was a white-power skinhead for about five years. “In the physical world, if you’re standing in front of someone and you say something abhorrent, there’s a chance they’ll punch you. Online, you don’t have that, and you escalate into further physical violence without a threat to yourself.”

How hate spreads

Internet culture often categorizes hate speech as “trolling,” but the severity and viciousness of these comments have evolved into something much more sinister in recent years, said Whitney Phillips, an assistant professor of communications at Syracuse University. Frequently, the targets of these comments are people of color, women and religious minorities, who have spoken out about online harassment and hateful attacks for as long as the social media platforms have existed, calling for tech companies to take action to curb them.

“The more you hide behind ‘trolling,’ the more you can launder white supremacy into the mainstream,” said Phillips, who released a report this year, “ The Oxygen of Amplification ,” that analyzed how hate groups have spread their messages online.

Phillips described how white-supremacist groups first infiltrated niche online communities such as 4chan, where trolling is a tradition. But their posts on 4chan took a more vicious tone after Gamergate, the Internet controversy that began in 2014 with a debate over diversity in video games and snowballed into a full-on culture war. Leaders of the Daily Stormer, a white-supremacist site, became a regular presence on 4chan as the rhetoric got increasingly nasty, Phillips said, and stoked already-present hateful sentiments on the site.

Phillips said it’s unclear how many people were radicalized through 4chan, but the hateful content spread like a virus to more mainstream sites such as Facebook, Twitter and Instagram through shared memes and retweets, where they reach much larger audiences.


Unlike hate movements of the past, extremist groups are able to quickly normalize their messages by delivering a never-ending stream of hateful propaganda to the masses.

“One of the big things that changes online is that it allows people to see others use hateful words, slurs and ideas, and those things become normal,” Neufeld said. “Norms are powerful because they influence people’s behaviors. If you see a stream of slurs, that makes you feel like things are more acceptable.”

While Facebook and Twitter have official policies prohibiting hate speech, some users say that their complaints often go unheard.

“You have policies that seem straightforward, but when you flag [hate speech], it doesn't violate the platform’s policies,” said Adriana Matamoros Fernández, a lecturer at the Queensland University of Technology in Australia who studies the spread of racism on social media platforms.

Facebook considers hate speech to be a “direct attack” on users based on “protected characteristics,” including race, ethnicity, national origin, sexual orientation and gender identity, Facebook representative Ruchika Budhraja said, adding that the company is developing technology that better filters comments reported as hate speech.

Twitter’s official policy also states that it is committed to combating online abuse.

In an email, Twitter spokesman Raki Wane said, “We have a global team that works around the clock to review reports and help enforce our rules consistently.”

Both platforms have taken action to enforce these rules. Writer Milo Yiannopoulos was banned on Twitter in 2016 after he led a racist campaign against “Ghostbusters” actor Leslie Jones. In August, Facebook banned Alex Jones from its platform for violating its hate speech policy . The following month, Twitter also banned him .

But bad actors have slipped through the cracks. Before Cesar Sayoc allegedly sent 13 homemade explosives to prominent Democrats and media figures in October, political analyst Rochelle Ritchie says he targeted her on Twitter. She said she reported Sayoc to the social media site after he sent her a threatening message , telling her to “hug your loved ones real close every time you leave home.” At the time, Twitter told her that the comment did not violate its policy , but after Sayoc was arrested, the social media site said that it was “deeply sorry” and that the original tweet “clearly violated our rules.”

The rules themselves, even when followed, can fall short. Users who are banned for policy violations can easily open a new account, Matamoros Fernández said. And while technologies exist to moderate text-based hate speech, monitoring image-based posts, such as those on Instagram, is trickier. On Facebook, where some groups are private, it’s even more difficult for those who track hate groups to see what is happening.

Tech companies “have been too slow to realize how influential their platforms are in radicalizing people, and they are playing a lot of catch-up,” Neufeld said. “Even if they were willing to do everything possible, it’s an uphill battle. But it’s an uphill battle that we have to win.”

Learning from the past

While hate speech today proliferates online, the methods used by these hate groups are nothing new. The path to radicalization is similar to that used by the Nazis in the early 20th century, said Steven Luckert, a curator at the United States Holocaust Memorial Museum who focuses on Nazi propaganda.

“Skillful propagandists know how to play on people’s emotions,” Luckert said. “You play upon people’s fears that their way of life is going to disappear, and you use this propaganda to disseminate fear. And often, that can be very successful.”


The Nazis did not start their rise to power with the blatantly violent and murderous rhetoric now associated with Nazi Germany. It began with frequent, quieter digs at Jewish people that played on fears of “the other” and ethnic stereotypes. They used radio — what Luckert calls “the Internet of its time” — to spread their dehumanizing messages.

“They created this climate of indifference to the plight of the Jews, and that was a factor of the Holocaust,” Luckert said. “Someone didn’t have to hate Jews, but if they were indifferent, that’s all that was often needed.”

The antidote, Luckert says, is for people to not become immune to hate speech.

“It’s important to not be indifferent or a passive observer,” Luckert said. “People need to stand up against hate and not sit back and do nothing.”

Martinez, of Free Radicals, said that to combat the spread of hate, white Americans need to be more proactive in learning about the history of such ideologies.

She said she recently took her 11-year-old son to see the new lynching memorial in Alabama, which memorializes more than 4,000 victims of racial terror.

She said her son was overwhelmed by what he saw. Security guards who saw the boy attempting to process the display suggested that he ask his mother to get ice cream, a treat to ease the emotional weight of the museum. Martinez refused.

“He’s a white man in America. I’m not going to let him ‘ice cream’ his way out of it,” Martinez said. “We have to shift this idea that we are somehow protecting our children by not talking about racism and violence. We can’t ice cream it away. We have to be forthcoming about our legacy of violence.”



Hate speech regulation on social media: An intractable contemporary challenge

Catherine O’Regan and Stefan Theil of the Bonavero Institute of Human Rights in the Faculty of Law at the University of Oxford investigate initiatives to regulate hate speech online. They highlight the difficulties of finding a widely agreed definition of hate speech and assess the legislative initiatives in four major jurisdictions to inform those engaged in the policy debate concerning the regulation of online speech around the world.

The Internet has allowed people across the world to connect instantaneously and has revolutionised the way we communicate and share information with one another. More than 4 billion people were Internet users in 2018, more than half of the global population.

In many ways, the Internet has had a positive influence on society . For example, it helps us to communicate easily and to share knowledge on all kinds of important topics efficiently: from the treatment of disease to disaster relief. But the Internet has also broadened the potential for harm. Being able to communicate with a mass audience has meant that the way we engage with politics, public affairs and each other has also changed. Hateful messages and incitements to violence are distributed and amplified on social media in ways that were not previously possible.


Through social media platforms (such as Facebook, Twitter, YouTube, Instagram and Snapchat), 3.19 billion users converse and interact with each other by generating and sharing content. The business model of most social media companies is built on drawing attention, and given that offensive speech often attracts attention, it can become more audible on social media than it might in traditional mass media. Given the growing problem of offensive and harmful speech online, many countries are asking the challenging question of whether they should regulate speech online and, if so, how they should legislate to curb these excesses.

Hate speech vs freedom of speech

The regulation of harmful speech in online spaces requires drawing a line between legitimate freedom of speech and hate speech. Freedom of speech is protected in the constitutions of most countries around the world, and in the major international human rights treaties. Of course, we know that despite this widespread protection, many countries do not provide effective protection for freedom of speech. One of the dangers of regulating hate speech online is that it will become a pretext for repressive regimes to further limit the rights of their citizens.


In countries committed to freedom of speech, it is necessary to develop a shared understanding of why freedom of speech is important. O’Regan and Theil suggest three main reasons why we value freedom of speech: because being able to speak our minds is part of what makes us free and autonomous human beings; for democratic reasons, because we need to be able to talk about politics and policy freely in order to decide as equals how to vote and to hold those in power to account; and for truth-related reasons, to enable us to refute false claims.


Just as we need to understand why we value freedom of speech, we also need to understand why we should prohibit hate speech. There are two main reasons for outlawing hate speech: the first and most widely accepted reason is that hate speech is likely to result in actual harm to those who are being targeted (“the incitement to harm” principle): so speech that incites violence against, for instance, people of a particular race, sexual orientation or gender identity is outlawed in most countries, including the USA. Many countries also agree that hate speech that is degrading of groups of people should also be prohibited (“the degrading of groups” principle), because it undermines their status as free and equal members of society. Again, many countries, but notably not the USA, prohibit such forms of hate speech as well. Both freedom of speech and hate speech are concepts that give rise to disagreement, both about their meaning and about how they should be applied.

Publication of information on social media

The age of digital media has allowed online speech and content to be shared anonymously and often without a second thought for the consequences. While the act of publishing online is instantaneous, mechanisms designed to regulate speech are often cumbersome and slow.

Moreover, in traditional forms of media, there is editorial oversight from a person other than the author prior to publishing. Historically, this has often provided an effective restraint on hate speech, a mechanism that plainly does not work on self-published social media platforms.


The speed and sheer amount of content, as well as the lack of editorial oversight, make social media platforms a particular challenge for regulators. Increasingly, policymakers are suggesting that social media platforms should bear the brunt of the regulatory burden: for instance, through obligations to provide effective complaint mechanisms and remove unlawful speech. The risk with this approach is that lawful speech may be removed in error, or that the general environment will inhibit individuals from expressing themselves online.


The four major jurisdictions

The United States differs from the other jurisdictions assessed here in some important respects. The First Amendment of the US Constitution prohibits the restriction of free speech by government and public authorities; there are narrow exceptions for hate speech, understood as speech that is likely to incite imminent violence. The First Amendment, however, does not prevent private actors, like social media platforms, from imposing their own restrictions on speech. Social media platforms are further protected from private litigation because they are not considered publishers of the content posted to their sites under section 230 of the Communications Decency Act 1996.


The United Kingdom imposes a range of criminal prohibitions on hate speech, both online and in print. The Crime and Disorder Act 1998, the Public Order Act 1986, the Malicious Communications Act 1988 and the Communications Act 2003 prohibit speech that is derogatory on grounds of race, ethnic origin, religion and sexual orientation. A recent White Paper contains sweeping proposals to regulate online media by imposing a duty of care upon social media platforms, and establishing a regulator to ensure that the duty of care is observed. The broad range of companies covered and the open-ended list of online harms identified for regulation in the White Paper are a particular concern: they risk overburdening the regulator and leading to highly selective enforcement.

The European Union has adopted the e-Commerce Directive, which prevents monitoring of content on websites before it is published, a provision which shapes the development of regulatory initiatives in Europe. The EU is exploring further options for regulating social media. So far, it has issued a Communication on Tackling Illegal Content Online – Towards Greater Responsibility of Social Media Platforms and has entered into a Code of Conduct on Countering Illegal Content Online with Facebook, Twitter, YouTube, Instagram, Microsoft, Snapchat, Google+ and Dailymotion. Under the Code of Conduct, these companies have agreed to take down illegal content within 24 hours.


The German Network Enforcement Law (NetzDG), in force since 2017, imposes obligations on social media platforms to establish complaints-management mechanisms which must work quickly, transparently and effectively. Where unlawful content (as defined by the German Criminal Code) is identified, it must be removed or blocked within a specified deadline. The specific deadline depends on whether the content is manifestly illegal or simply illegal, and on whether the social media platform cooperates with a recognised body of industry self-regulation. Fines of up to 50 million euros can be issued for systemic failings in the complaints-management system, including consistently missing the required deletion deadlines and ignoring reporting and transparency requirements.

Future directions

Regulating hate speech online is a major policy challenge. Policymakers must ensure that any regulation of social media platforms does not unduly impair freedom of speech. Given the complexity of the problem, close monitoring of new legislative initiatives around the world is necessary to assess whether a good balance has been struck between the protection of freedom of speech and the prohibition of hate speech. For this monitoring to take place, social media companies need to be transparent about the content that they are removing and make their data available to researchers and the wider public for scrutiny.


This feature article was created with the approval of the research team featured. This is a collaborative production, supported by those featured to aid free of charge, global distribution.



Prevalence and Psychological Effects of Hateful Speech in Online College Communities

Koustuv Saha, Eshwar Chandrasekharan and Munmun De Choudhury

Georgia Tech

Background.

Hateful speech bears negative repercussions and is particularly damaging in college communities. Efforts to regulate hateful speech on college campuses pose vexing socio-political problems, and interventions to mitigate its effects require evaluating the pervasiveness of the phenomenon on campuses as well as its impact on students’ psychological state.

Data and Methods.

Given the growing use of social media among college students, we target the above issues by studying the online aspect of hateful speech in a dataset of 6 million Reddit comments shared in 174 college communities. To quantify the prevalence of hateful speech in an online college community, we devise the College Hate Index (CHX). Next, we examine its distribution across the categories of hateful speech: behavior, class, disability, ethnicity, gender, physical appearance, race, religion, and sexual orientation. We then employ a causal-inference framework to study the psychological effects of hateful speech, particularly in the form of individuals’ online stress expression. Finally, we characterize their psychological endurance to hateful speech by analyzing their language: their discriminatory keyword use and their personality traits.

Results.

We find that hateful speech is prevalent in college subreddits, and 25% of them show greater hateful speech than non-college subreddits. We also find that exposure to hate leads to greater stress expression. However, not everybody exposed is equally affected; some show lower psychological endurance than others. Low-endurance individuals are more vulnerable to emotional outbursts, and are more neurotic than those with higher endurance.


Implications.

Our work bears implications for policy-making and intervention efforts to tackle the damaging effects of online hateful speech in colleges. From a technological perspective, our work caters to mental health support provisions on college campuses, and to moderation efforts in online college communities. In addition, given the charged nature of the speech debate, we highlight the ethical implications of our work. Our work lays the foundation for studying the psychological impacts of hateful speech in online communities in general, and situated communities (those with both an offline and an online analog) in particular.


Colleges are places where intellectual debate is considered a key aspect of the educational pursuit, and where viewpoint diversity is venerated. Many colleges in the U.S. have been homes of the free speech movement of the 1960s, which catalyzed positive outcomes ranging from the women’s suffrage movement to civil rights protests [ 72 ]. However, the last few decades have also witnessed several instances where minority groups in colleges have been targeted with verbal altercations, slander, defamation, and hateful speech [ 41 ]. In fact, between 2015 and 2016, there was a 25% rise in the number of reported hate crimes on college campuses [ 11 ].

Because colleges are close-knit, diverse, and geographically situated communities of students, the harmful effects of hateful speech are manifold. In addition to being a precursor to potential hate crimes and violence, hateful speech and its exposure can have profound psychological impacts on a campus’s reputation, climate, and morale, such as heightened stress, anxiety, depression, and desensitization [ 53 , 87 ]. Victimization, direct or indirect, has also been associated with increased rates of alcohol and drug use [ 79 ]—behaviors often considered risky in the formative college years [ 65 ]. Further, hateful speech exposure has negative effects on students’ academic lives and performance, with lowered self-esteem, and poorer task quality and goal clarity, disrupting the very educational and vocational foundations that underscore the college experience [ 17 , 61 ].

Given the pervasive adoption of social media technologies in the college student population [ 69 ] and as students increasingly appropriate these platforms for academic, personal and social life discussions [ 7 ], hateful speech has begun to manifest online [ 19 ]. This adds a new dimension to the existing issues surrounding college speech. It has been found to be a key driver of, and an exacerbating factor behind, harassment, bullying, and other violent incidents targeting vulnerable students, often making people feel unwelcome in both digital and physical spaces [ 48 , 79 ], and even causing psychological and emotional upheavals akin to its offline counterpart [ 63 , 86 ].

Campus administrators and other stakeholders have therefore struggled with mitigating the negative effects of online hateful speech on campuses, while at the same time valuing students’ First Amendment rights [ 9 , 49 ]. An important step towards addressing existing challenges is to first assess the pervasiveness of online hateful speech and the vulnerability, in terms of psychological wellbeing, presented to marginalized communities on college campuses. However, present methods of assessment are heavily limited. Most existing reports are anecdotal accounts covered in popular media outlets [ 51 ], based on discrete events. Moreover, there is no empirical way to comprehensively and proactively quantify and characterize the hateful speech that surfaces online in student communities. In addition, social confounds, such as the stigma of being exposed to hate and its psychological ramifications, often lead to underestimates of the effects of online hate, further tempering the mitigation efforts that aim to help these very marginalized groups.

To bridge these gaps, this paper leverages an extensive dataset of over 6 million comments from the online communities of 174 U.S. colleges on Reddit to examine the online dimension of hateful speech in college communities, addressing two research questions:

RQ1: How prevalent is hateful speech in online college communities, across demographic categories such as gender, religion and race?

RQ2: How does exposure to online hate affect an individual’s expression of their psychological state on social media, particularly stress?

Our work operationalizes hateful speech in online college communities based on the hateful content posted in these subreddits. We devise the College Hate Index (CHX) to quantify the manifestation of hateful speech across various target categories of hate in an online college community. Our findings suggest that, despite several existing moderation policies on college subreddits [ 45 ], hateful speech remains prevalent. Adopting a causal-inference framework, we then find that an individual’s exposure to online hateful speech impacts their online stress expression. In fact, when exposed to hate, these individuals show a wide range of stress levels, which we characterize using a grounded construct of psychological endurance to hate. Individuals with lower endurance tend to show greater emotional vulnerability and neuroticism.
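The paper's actual CHX formulation is not reproduced in this excerpt, so the sketch below is only an illustrative assumption of how such an index could be computed: the rate of lexicon-matched comments in a college community, normalized by the average rate across non-college baseline communities. The lexicon, the substring-matching rule and the function names are hypothetical, not the authors' implementation.

```python
# Illustrative sketch of a CHX-style index (NOT the paper's formula):
# the fraction of a college subreddit's comments that match a hate lexicon,
# normalized by the average fraction in baseline (non-college) subreddits.

def hate_rate(comments, lexicon):
    """Fraction of comments containing at least one lexicon keyword."""
    if not comments:
        return 0.0
    hits = sum(1 for c in comments if any(w in c.lower() for w in lexicon))
    return hits / len(comments)

def college_hate_index(college_comments, baseline_communities, lexicon):
    """Hate rate in the college community relative to the baseline average.

    Values above 1.0 indicate more lexicon-matched speech than the baseline,
    in the spirit of the finding that 25% of college subreddits exceed
    non-college subreddits.
    """
    rates = [hate_rate(c, lexicon) for c in baseline_communities]
    baseline_avg = sum(rates) / len(rates)
    if baseline_avg == 0:
        return 0.0
    return hate_rate(college_comments, lexicon) / baseline_avg
```

In practice the lexicon would come from a curated hate-speech wordlist and the comment sets from per-subreddit dumps; the simple substring match here stands in for the classification step, which in real use would need word-boundary handling and context-aware filtering.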

Although this work does not capture offline hateful speech on college campuses, it advances the body of research in online hateful speech by examining it in a hitherto under-explored community – college campuses, and by surfacing its psychological effects – a hitherto under-explored research direction. We discuss the implications of our work in providing an important empirical dimension to the college speech debate, and for supporting policy-making and wellbeing support and intervention efforts to tackle the psychological effects of online hateful speech in college communities.

Privacy, Ethics, and Disclosure.

Given the sensitive nature of our study, despite working with public de-identified data from Reddit, we do not report any information that associates hateful speech and its psychological effects with specific individuals or college campuses. To describe our approach and to ground our research better, this paper includes paraphrased and partially masked excerpts of hateful comments, for which we suggest caution to readers.


Hateful speech on college campuses.

Despite being described as a form of “words that wound” [ 34 ], hate speech lacks a universally accepted definition. In the specific setting of college campuses, we adopt Kaplin’s definition as a way to operationalize hateful speech in online college communities [ 49 ]:

..verbal and written words, and symbolic acts, that convey a grossly negative assessment of particular persons or groups based on their race, gender, ethnicity, religion, sexual orientation, or disability, which is not limited to a face-to-face confrontation or shouts from a crowd, but may also appear on T-shirts, on posters, on classroom blackboards, on student bulletin boards, in flyers and leaflets, in phone calls, etc.

College campuses harbor many diverse communities of race, religion, ethnicity, and sexual orientation. Although argued to be “safe spaces” [ 82 ], colleges suffer from many problems related to hate speech, some of which have also escalated to hate crime and violence over the years [ 9 ]. The situation is not only alarming, but also controversial, because U.S. colleges have been unable to successfully regulate hateful speech on campuses based on the long ongoing debate over the freedom of expression per the First Amendment [ 49 ], and hate speech legislation, or the “speech debate” [ 57 ]. Therefore, examining hateful speech in colleges remains a subject of interest from the standpoint of legal, political, and social sciences [ 42 ].

To measure the pervasiveness of hateful speech in colleges, stakeholders have adopted a handful of methodologies. Most of these are based on discrete and subjective reports of personal experiences [ 28 , 39 , 71 ], whose recollection can be unpleasant and traumatizing to the victims. A significant limitation of this approach is that it generates ‘optimistic’ estimates: many targets of hateful speech refrain from reporting their experiences for fear of being victimized, and due to social stigma [ 14 , 53 ].

Researchers have studied hateful speech through the crisis reaction model, finding that it shows the same three-phase consequences of feelings (affect), thoughts (cognition), and actions (behavior) as other traumatic events [ 53 ]. Further, victims of hateful speech experience psychological symptoms similar to post-traumatic stress disorder, such as pain, fear, anxiety, nightmares, and intrusive thoughts of intimidation and denigration [ 58 , 87 ]. Some early work also outlined that prejudice, discrimination, intolerance, hatred, and factors hindering a student’s integration into their social and academic environments can lead to stress and lowered self-esteem among minorities on college campuses, even among those who are not the direct victims of specific events [ 17 , 61 , 79 ]. However, assessing the psychological impacts of exposure to hateful speech on college campuses is challenging and has so far been unexplored at scale.

As much of students’ discussion has moved online and many social media platforms provide open forums of conversation to students [ 69 , 75 ], these tools have also paved the way for speech that is usually reserved for the edges of society. In fact, many recent incidents of hateful speech on campuses targeted at marginalized groups are reported to have been initiated online [ 79 ]. Assessing the repercussions of online hateful speech has been challenging, for the same reasons as its offline counterpart. Our work addresses the above noted gaps by utilizing unobtrusively gathered social media data from online college communities to estimate the pervasiveness of online hateful speech, and how it psychologically impacts exposed individuals.

Online Hateful Speech and Its Effects.

Online hateful speech differs from its offline counterpart in various ways, as a consequence of the affordances of online platforms, such as anonymity, mobility, ephemerality, audience size, and ease of access [ 15 ]. Under the veil of (semi-)anonymity, and exploiting the limited accountability that comes with anonymous online activity, perpetrators receive reinforcement from like-minded haters, making hatred seem normal and acceptable [ 12 , 80 ].

However, online and offline hateful speech are often inter-related in their causes and effects. For instance, Timofeeva studied online hate speech and the additional complexities it brings to the constitutional right to free speech, and Olteanu et al. demonstrated that offline events (e.g., extremist violence) causally stimulate online hateful speech on social media platforms like Twitter and Reddit [ 64 , 88 ]. Other work studied the propagation of online hateful speech following terrorist incidents [ 16 ].

Over the past few years, a number of studies have focused on detecting and characterizing hateful speech [ 46 , 81 ], such as distinguishing hateful speech from other offensive language [ 30 ], annotating hateful posts on Twitter based on the critical race theory [ 90 ], and conducting a measurement study of hateful speech on Twitter and Whisper [ 59 ]. Recently, ElSherief et al. studied the distinctive characteristics of hate instigators and targets on social media in terms of their profile self-presentation, activities, and online visibility, and Cheng et al. explored the relationship between one’s mood and antisocial behavior on online communities [ 23 , 38 ]. Other research has also studied moderation of online antisocial behaviors like undesirable posting [ 18 , 24 ] and online abuse [ 13 , 21 , 47 ].

Apart from understanding online hateful language, some (although limited) studies have also examined its effects on the online activities of individuals [ 5 ]. [ 48 ] showed that victims of online abuse leave platforms, [ 86 ] found that victims feel increased prejudice, and [ 19 ] found that the ban of Reddit communities which incited hateful content was effective in reducing the manifestation of hateful content on the platform. Similarly, other work found that exposure to online hate among young social media users is associated with psychological and emotional upheavals and heightened distancing from family members [ 63 ]. Further, [ 91 ] studied how various minority groups are targeted with hate speech through various modes of media (both online and offline) and how they are affected by the exposure to hateful content. Our study advances this critical, yet relatively under-explored line of research by examining how exposure to online hateful speech can psychologically affect exposed users – students, in our particular setting of online college communities.

Social Media and Psychological Wellbeing.

The psychology literature has established that analyzing language helps us understand the psychological states of an individual [ 68 ]. Several studies have shown that social media data can help us infer and understand the psychological and mental health states of individuals and communities [ 27 , 31 , 74 ]. Prior work has also used social media to analyze personality traits and their relationship to wellbeing [ 70 , 83 ]. Social media data has also facilitated psychological assessments in settings where survey-based assessments are difficult, due to the sensitivities of the situations [ 33 , 75 ].

Pertaining to the population of college students, Ellison et al., in their seminal work, found a positive relationship between social media use and the maintenance of social capital [ 37 ], and Manago et al. found that social media helped college students satisfy enduring psychosocial needs [ 55 ]. Given the ubiquity of social media use among youth [ 69 ], and because social media platforms enable them to share and disclose mental health issues [ 35 ], researchers have also leveraged social media as an unobtrusive source of data to infer and understand the mental health and wellbeing of college students [ 54 , 56 ]. Of particular relevance are two recent pieces of work: Bagroy et al., who built a collective mental health index of colleges employing social media (Reddit) data [ 7 ], and Saha and De Choudhury, who used college subreddit data to study the evolution of stress following gun violence on college campuses [ 75 ].

Although these studies provide us with a foundational background, it remains largely understudied how online community dynamics, such as exposure to hateful speech, affect the psychological climate of college campuses. Drawing on the recent success of causal analyses in social media research related to both online hateful speech [ 19 , 64 ] and mental health [ 32 , 76 , 78 ], we focus on a specific online community behavior (hateful speech in online college communities), and examine its psychological impacts on the online expression of stress of community members.

Online College Community Dataset.

Reddit, the source of data in this paper, is one of the most popular social media platforms among the 18–29 age group: 65% of Reddit users are young adults [ 69 ]. We note that this age demography aligns with the typical college student population, making Reddit a suitable choice for our study. Further, Reddit is a social discussion website consisting of diverse communities known as “subreddits” that offer demographic, topical, or interest-specific discussion boards. Many colleges have a dedicated subreddit community, which provides a common forum for students to share and discuss a variety of issues related to their personal, social, and academic life (see e.g., [ 7 , 75 , 78 ]). In fact, college subreddits name themselves after the college communities they represent, and they often customize their pages with college logos and campus images to signal their identity.

These observations, taken together, indicate that college communities on Reddit can be a source of data to study the research questions posed in this paper. Moreover, such a subreddit dataset has been leveraged in a number of prior works on online college communities [ 7 , 75 , 78 ]. Notably, Bagroy et al. showed that this Reddit data adequately represents the rough demographic distribution of the campus population of over 100 U.S. colleges, is sufficiently widely adopted on these college campuses, and can be employed as a reliable data source to infer the broader college communities’ mental wellbeing [ 7 ]. While college students likely use other social media platforms as well, such as Facebook, Twitter, Instagram, and Snapchat, obtaining college-specific data from these sources is challenging because many of these platforms restrict public access to data, and they lack defined community structures, precluding the gathering of sufficiently representative data for specific college campuses. Moreover, these platforms make it difficult to identify college student users and their college-related discussions unless users explicitly self-identify, which can limit both scalability and generalizability. In the following subsection we describe how we identify and collect data from college subreddits.

Data Collection.

We began by compiling a list of 200 major ranked colleges in the U.S. by crawling the U.S. News ( usnews.com ) website. Next, we crawled the SnoopSnoo ( snoopsnoo.com ) website, which groups subreddits into categories, one of which is “Universities and Colleges”. For 174 of these 200 colleges, we found a corresponding subreddit. As of December 2017, these subreddits had 3,010 members on average, and the largest ones were r/UIUC, r/berkeley, r/aggies, r/gatech, r/UTAustin, r/OSU , and r/ucf with 13K to 19K members.

Next, we built our dataset by running nested SQL-like queries on the public archives of the Reddit dataset hosted on Google BigQuery [ 1 ]. Our final dataset for the 174 college subreddits included 5,884,905 comments, posted by 453,781 unique users between August 2008 and November 2017. Within this dataset, 4,144,161 comments were posted by 425,410 unique users who never cross-posted across subreddit communities. In these communities, students seek and share information and opinions on a variety of topics spanning academics, partying, leisure, relationships, emotional support, and other miscellaneous aspects of college life in particular, and youth life in general.


4.1. Operationalizing Hateful Speech

A first step in our work revolves around identifying hateful speech in the comments posted on the college subreddits. We adopt a pattern (keyword) matching approach by using a high-precision lexicon from two research studies on hateful speech and social media [ 30 , 59 ]. This lexicon was curated after multiple iterations of filtering through automated classification, followed by crowdsourced and expert inspection. It consists of 157 phrases that are categorized into: behavior, class, disability, ethnicity, gender, physical, race, religion, sexual orientation , and other .

Motivation and Validity.

Using this lexicon suits our work because we require an aggregative assessment of the prevalence of hateful speech: we do not exclusively focus on detecting individual instances of hateful commentary or the specific victims of hate. A lexicon matching approach casts a wider net over all possible manifestations of online hateful speech, compared to supervised learning based detection techniques, which are more tuned to keep false positives at a minimum when incorporated in automatic moderation.

Additionally, we frame our reasoning behind the choice of this approach with validity theory [ 29 ]. First, since we operationalize hate speech using this validated, crowdsourced, and expert-annotated lexicon, developed and used in prior work, it offers strong face and construct validity . This lexicon was compiled from hateful words reported by users on the web; thus it offers a better mechanism to capture the subjective interpretation of hate speech than bag-of-words based machine learning approaches. From a convergent validity perspective, lexicon approaches have performed as well as sophisticated approaches in hate speech detection [ 30 , 59 ].

This approach is also inclusive: using a rich set of cues covering several forms of online hate, it offers rigor in content validity , as in prior work [ 19 , 64 ] ([ 19 ] used a lexicon of as few as 23 phrases to measure hate speech on Reddit). Content validity is valuable here because, unlike most work, our goal is not to detect whether a post is hateful for moderation, but to get a collective sense of hatefulness in an online community and to support cross-college community comparisons. Finally, we also manually annotated a random sample of 200 college subreddit comments to check the concurrent validity of the approach. Two researchers familiar with the literature on online hateful speech independently rated whether the lexicon-based approach correctly identified these comments as having hateful content. We found a Cohen’s κ of 0.8, suggesting strong agreement between the lexicon-based labels and the manual ratings.
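The agreement check above can be sketched concretely; the following is an illustrative implementation of Cohen's κ for two binary annotators (function and variable names are ours, not the authors'):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two binary (0/1) annotation vectors: observed
    agreement corrected for the agreement expected under chance."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ya, yb = sum(rater_a) / n, sum(rater_b) / n
    # Chance agreement: both say 1, plus both say 0.
    p_expected = ya * yb + (1 - ya) * (1 - yb)
    if p_expected == 1:
        return 1.0  # degenerate case: both raters constant and identical
    return (p_observed - p_expected) / (1 - p_expected)
```

A κ of 1 indicates perfect agreement, 0 indicates agreement at chance level, and values around 0.8 (as reported above) are conventionally read as strong agreement.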

Using the above hate lexicon, for every subreddit in our dataset, we obtain a normalized occurrence of hateful speech, given as the fraction of words that matched the lexicon out of the total number of words in the subreddit’s comments. We obtain both category-specific and category-aggregated measures of hateful speech, per the categories given in the lexicon.
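A minimal sketch of this normalization follows (names are illustrative; the real lexicon also contains multi-word phrases, which this simple whitespace tokenization would miss):

```python
from collections import Counter

def normalized_hate_occurrence(comments, lexicon):
    """comments: list of comment strings from one subreddit;
    lexicon: dict mapping hate category -> set of lowercased keywords.
    Returns (category-aggregated fraction, per-category fractions)."""
    tokens = [tok for c in comments for tok in c.lower().split()]
    total = len(tokens)
    token_counts = Counter(tokens)
    per_category = {
        cat: (sum(token_counts[kw] for kw in kws) / total if total else 0.0)
        for cat, kws in lexicon.items()
    }
    aggregated = sum(per_category.values())
    return aggregated, per_category
```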

4.2. College Hate Index (CHX)

Next, we discuss the computation of CHX using the above normalized measure of hate in comments. We first identify five subreddits which were banned by Reddit primarily due to severe hateful speech usage: r/CoonTown, r/fatpeoplehate, r/KikeTown, r/nazi, r/transf*gs [ 19 , 62 ]. These subreddits glorified hateful speech against certain groups. For example, r/CoonTown, which grew to over 15,000 subscribers, self-described itself as “a noxious, racist corner of Reddit” [ 60 ]. Our motivation to collect this data stems from the conjecture that hateful speech in these banned subreddits serves as an upper bound on the amount of hateful speech in any other subreddit (such as the 174 college subreddits, none of which were banned at the time of writing this paper). Accordingly, CHX is a measure to calibrate and standardize the prevalence of hateful speech in a college subreddit, allowing aggregative analysis as well as cross-subreddit comparison.

Using the same data collection strategy as explained in the Data section, we collect 1,436,766 comments from the five banned subreddits mentioned above. Then, per hate category in our hate lexicon, we compute category-specific and category-aggregated normalized occurrences of hate keywords in the comments of banned subreddits using the method described above. Together with the normalized measures of hate in college subreddits, we define CHX of an online college community to be the ratio of the normalized hate measure (category-specific or category-aggregated) in the college subreddit to the same measure in banned subreddits:

CHX_T(S) = P_T(S) / P_T(B),    (1)

where S is a college subreddit, B denotes the banned subreddits, T indicates the type of hate speech assessment (category-specific or category-aggregated), and P_T(S) and P_T(B) respectively denote the normalized occurrence of hate keywords for T in S and B . For category-aggregated CHX, T includes all hate keywords, and for category-specific CHX, it includes only the category-specific ones.

Based on Equation 1 above, a college subreddit with no hate shows a CHX of 0, whereas if its hateful speech prevalence matches that in the banned subreddits, it shows a CHX of 1. Note that, practically speaking, the normalized occurrence of hate words in a college subreddit can exceed that in the banned subreddits. However, this is unlikely based on our reasoning above; thus, we cap the maximum value of CHX at 1, bounding it in the [0, 1] range.
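The capped ratio can be sketched as follows (assuming the normalized occurrences are precomputed; the function name and the zero-reference guard are our additions):

```python
def college_hate_index(p_college, p_banned):
    """CHX: ratio of the normalized hate occurrence in a college subreddit
    to that in the banned-subreddit reference corpus, capped at 1 so the
    index stays in [0, 1]."""
    if p_banned == 0:
        return 0.0  # no reference signal for this category
    return min(1.0, p_college / p_banned)
```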

4.3. Measuring the Prevalence

We find that hateful speech in college subreddits is non-uniformly distributed across the different categories of hate ( Figure 1a ). A Kruskal-Wallis one-way analysis of variance reveals significant differences in the category-specific occurrences of hate ( H = 1507, p < 0.05). Among the hate categories, Other (mean CHX = 0.9) and behavior (mean CHX = 0.8) show the highest occurrence in college subreddits. While hateful speech targeted at ethnicity, race, and religion has been a major concern for many college campuses [ 49 ], we observe a varied distribution of online hate across these categories. For example, CHX for race ranges between 0.01 and 0.10, for ethnicity between 0 and 0.70, and for religion between 0.01 and 1.00. Hateful speech towards disability ranges between 0 and 0.57, and shows a lower average prevalence (mean CHX = 0.08) than all other categories except race (mean CHX = 0.05). This observation aligns with a prior finding in the offline context that schools and colleges show comparably lower disability-targeted hatefulness than non-disability-targeted hate [ 85 ].

Figure 1. (a) Distribution of category-specific CHX; (b) Histogram of category-aggregated CHX over college subreddits; (c) Kernel density estimation of the hate lexicon’s absolute log-likelihood ratio (LLR) distribution in banned (bn.) against college (clg.) and non-college (alt.) subreddits.

Table 1 reports paraphrased comment excerpts per hate category in the college subreddits. The Other category, which demonstrated the highest prevalence, includes keywords like “indecisive”, “drunk”, and “uneducated”. When we examined a random sample of comments, we found that these words are frequently used by authors to target other members of the community, or even the college community in general, e.g., “They admit gifted students with bright futures but produce uneducated hobos who can’t get a job and rely on State alumni for welfare.”

Table 1. Excerpts of paraphrased snippets per hate category in the college subreddit dataset.

At an aggregate level, we find that hateful speech in college subreddits is indeed prevalent, with category-aggregated CHX ranging between 0.26 and 0.51 (mean = 0.37; stdev. = 0.05) (see Figure 1b ). No college subreddit has a CHX above 0.51, which reveals reasonable civility in these communities, unlike the banned ones. However, the fact that no college subreddit has a CHX below 0.26 indicates the pervasiveness of the phenomenon.

4.4. Comparison with Non-College Subreddits

Having established the prevalence of hateful speech in online college communities, a natural question arises: how does this prevalence compare against hateful speech manifested elsewhere on Reddit? To answer this, we identify 20 subreddits (alt. subreddits hereon) from the landing page of Reddit, which harbor a diversity of interests and are subscribed to by a large number of Reddit users (e.g., r/AskReddit, r/aww, r/movies ). From these, we collect a random sample of 2M comments (100K comments per subreddit), and using the same strategy to measure the prevalence of hateful speech (as CHX), we calculate the hate index in these subreddits at an aggregate level and find it to be 0.40. This shows that, although a majority of the online college communities reveal lower CHX ( Figure 1b ), over 25% of them have greater hateful speech than the average prevalence in non-college subreddits.

We further investigate the above distinction in the prevalence of hateful speech in college subreddits through a log-likelihood ratio distribution. For every word in our hate lexicon, we calculate its standardized occurrence in the banned, alt., and college subreddits. Then, taking the banned subreddits as the common reference, we calculate these keywords’ absolute log-likelihood ratios (LLR) in the college and alt. subreddits. The absolute LLR of a keyword quantifies its likelihood of presence in either of the two datasets: lower values of LLR (closer to 0) suggest a comparable likelihood of occurrence, whereas higher values of LLR (closer to 1) suggest skewness in the occurrence of a lexicon keyword in one of the two datasets (banned subreddits and college or alt. subreddits).
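The keyword-level comparison can be sketched as follows (a simplified reading of the LLR computation; the smoothing constant is our assumption, added to avoid division by zero for keywords absent from one corpus):

```python
import math

def absolute_llr(p_reference, p_other, eps=1e-9):
    """Absolute log-likelihood ratio of a keyword's normalized occurrence in
    the reference corpus (banned subreddits) vs. another corpus (college or
    alt. subreddits). Values near 0 indicate a comparable likelihood of
    occurrence; larger values indicate skew toward one corpus."""
    return abs(math.log((p_reference + eps) / (p_other + eps)))
```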

Figure 1c shows the kernel density estimation of the hate keywords’ absolute LLR distribution in banned subreddits against college and alt. subreddits. An independent-sample t-test confirms the statistical significance of their differences ( t = −54.95, p < 0.05). We find that the mean absolute LLR of the hate lexicon in banned and college subreddits (mean = 0.49) is lower than that in banned and alt. subreddits (mean = 0.78). This suggests that a greater number of hate keywords show a similar likelihood of occurrence in college subreddits as in the banned subreddits.


Recall that our RQ2 asks whether and how the hatefulness in college subreddits affects the psychological state of community members. To operationalize the psychological state of these online communities, we refer to prior literature showing that hateful speech is associated with emotional upheavals and distress [ 58 , 87 ], with stress being one of the most prominent responses in those exposed to hate both directly and indirectly. We approach RQ2 by first quantifying the extent of hate exposure of an individual in the college subreddits, and then measuring the same individual’s online stress. Finally, we employ a causal inference framework, drawing from Rubin’s causal model [ 43 ], to explore the causal link between exposure to hateful speech and stress expression.

5.1. Defining and Quantifying Hate Exposure

Without loss of generality, we define hate exposure for an individual to be the volume of hateful words shared by others that they are exposed to as a result of participation via commentary in a college subreddit. We calculate this exposure per user as an aggregated percentage of hateful words used by others on all the threads the user has participated in. We use the same lexicon of hate keywords as described in the previous section.
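This per-user exposure measure can be sketched as follows (illustrative data structures; the actual pipeline operates over the full comment dataset rather than one user at a time):

```python
def hate_exposure_percent(user, thread_comments, hate_keywords):
    """thread_comments: list of (author, words) tuples covering every comment
    on the threads the user participated in; hate_keywords: set of lexicon
    words. Returns the percentage of *other* users' words matching the lexicon."""
    others_words = [w.lower() for author, words in thread_comments
                    if author != user for w in words]
    if not others_words:
        return 0.0
    hits = sum(1 for w in others_words if w in hate_keywords)
    return 100.0 * hits / len(others_words)
```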

We note that this is a conservative definition of online hate exposure, because individuals can be exposed without commenting on a thread with hateful speech, for instance by simply browsing such a thread. Exposure may also have offline or spill-over effects, such as offline hateful expressions whose effects can get amplified when an individual engages with similar content online. However, our definition yields a high-precision dataset of exposed users, as commentary explicitly signals that individuals have almost certainly consumed some of the hateful content shared by others in a thread.

Further, through this definition of exposure, we choose not to restrict our analysis only to the intended individual targets of hateful speech, but rather to examine the effects of hateful speech within college subreddits more broadly, at a community level. Since college subreddits have an offline analog (the community on campus), our choice of this broader definition of “exposure” is also inspired by prior psychology literature, which revealed that a toxic (or negative) environment can affect individuals in various forms of presence or relatedness [ 67 ].

5.2. Stress Expressed in College Subreddits

Our next objective is to quantify each user’s online stress expression, under the psychologically grounded assumption that stress is a manifestation of their psychological state. For this, we draw on prior work demonstrating that online stress expression can be measured from content shared in college subreddits [ 75 , 77 ].

Specifically, we reproduce a supervised learning based stress detector (classifier) from [ 75 ]. This classifier (a support vector machine model with a linear kernel) employs a supervised learning methodology [ 66 ] on a Reddit dataset comprising 2000 posts shared on a stress disclosure and help seeking subreddit, r/stress (positive ground truth examples, or High Stress), and another 2000 posts obtained from Reddit’s landing page that were not shared in any mental health related subreddit (negative examples, or Low Stress). Using n -grams and sentiment of the posts as features, and based on k -fold ( k = 5) cross-validation, the classifier predicts a binary stress label (High or Low Stress) for each post, with a mean accuracy and mean F1-score of 0.82. This classifier was expert-validated using the Perceived Stress Scale [ 26 ] (expert validation accuracy = 81%) on college subreddit data like ours [ 75 ]. Similar supervised learning approaches have also been recently used in other work to circumvent the challenges of limited ground truth [ 7 , 78 ].
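A bare-bones version of such an n-gram-plus-linear-SVM pipeline looks like the following (a sketch, not the published classifier: the toy training examples are invented, and the sentiment features of the original model are omitted):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_stress_classifier(texts, labels):
    """Unigram+bigram features fed to a linear-kernel SVM, echoing the
    reproduced stress detector (minus its sentiment features)."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    model.fit(texts, labels)
    return model

# Hypothetical toy posts standing in for r/stress vs. landing-page content.
texts = ["i feel so stressed and anxious about exams",
         "everything is overwhelming and i cannot cope",
         "great game last night, what a win",
         "look at this cute picture of my dog"]
labels = [1, 1, 0, 0]  # 1 = High Stress, 0 = Low Stress
clf = train_stress_classifier(texts, labels)
```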

In our case, first, applying this stress classifier, we machine label the 4,144,161 comments in our dataset as high and low stress. Then we aggregate the labeled posts per user for the 425,410 users, to assess their online stress expression. Example comments labeled high stress in our dataset include, “That sounds very challenging for me. I am a CS major”, “College can be very tough at times like this.”, “Got denied, but I had to act, I’m very disappointed” .

5.3. Matching For Causal Inference

Next, we aim to quantify the effects of exposure to hateful speech with regard to the stress expressed by users in the college subreddits. This examination necessitates testing for causality in order to eliminate (or minimize) the confounding factors that may be associated with an individual’s expression of stress. Ideally such a problem is best tackled using Randomized Controlled Trials (RCTs). However, given that our data is observational and an RCT is impractical and unethical in our specific context involving hateful speech exposure and an individual’s psychological state, we adopt a causal inference framework based on statistical matching. This approach aims to simulate a randomized control setting by controlling for observed covariates [ 43 ]. For our problem setting, we “match” pairs of users using the propensity score matching technique [ 43 ], considering covariates that account for online and offline behaviors of users.

5.3.1. Treatment and Control Groups, and Matching Covariates.

We define two comparable cohorts of users who are otherwise similar, but one of which was exposed to hateful speech ( Treatment group) whereas the other was not ( Control group). To obtain statistically matched pairs of Treatment and Control users, we control for a variety of covariates such that the effect (online stress) is examined between comparable groups of users showing similar offline and online behaviors: 1) First, we control for users within the same college subreddits , which accounts for offline behavioral changes attributable to seasonal, academic calendar, or local factors [ 75 ]. 2) Next, we account for user activity on Reddit with covariates drawn from prior work [ 19 , 78 ], which include the number of posts and comments, karma (aggregated score on the user’s posts and comments), and tenure (duration of participation) in the community. 3) Finally, to minimize the confounding effects of latent factors associated with an individual’s stress, we limit our analysis to the period after 2016, and to the 217,109 users who participated in discussion threads both before and after 2016. Note that our choice of 2016 hinges on the notion that it gives us roughly two years of data for our causal analysis, which is half of the typical period of undergraduate education (four years). This enables us to obtain a baseline stress and a baseline hate exposure for every user, derived from the comments posted (shared and encountered) before 2016. These baseline measures allow us to account for the fact that the psychological wellbeing of an individual can be impacted by both intrinsic and extrinsic historical factors.

5.3.2. Matching Approach.

We use the propensity score matching technique [ 43 ] to match 143,075 Treatment users against a pool of 74,034 users who were not exposed to any hate on the college subreddits in the period from January 2016 to November 2017. First, we train a logistic regression classifier that predicts the propensity score ( p ) of each user using the above described covariates as features. Next, for every Treatment ( T i ) user, we find the most similar Control user, conditioning on a maximum caliper distance ( c ) (with α = 0.2), i.e., | T i ( p ) – ¬ T i ( p ) | ≤ c , where c = α * σ pooled (σ pooled is the pooled standard deviation, and α ≤ 0.2 is recommended for “tight matching” [ 6 ]). Thereby, we find a matched Control user for each of the Treatment users.
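The caliper-constrained nearest-neighbour pairing step can be sketched in a greedy form (illustrative; in the paper the propensity scores come from the logistic regression over the covariates, which are assumed precomputed here):

```python
import statistics

def caliper_match(treatment, control, alpha=0.2):
    """treatment, control: dicts of user -> propensity score. Greedily pairs
    each Treatment user with the nearest unused Control user whose score lies
    within the caliper c = alpha * pooled standard deviation."""
    scores = list(treatment.values()) + list(control.values())
    caliper = alpha * statistics.pstdev(scores)
    available = dict(control)
    pairs = {}
    for user, score in treatment.items():
        if not available:
            break
        nearest = min(available, key=lambda u: abs(available[u] - score))
        if abs(available[nearest] - score) <= caliper:
            pairs[user] = nearest
            del available[nearest]  # each Control user matched at most once
    return pairs
```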

5.3.3. Quality of Matching.

To ensure that our matching technique effectively eliminated any imbalance in the covariates, we use the effect size (Cohen’s d ) metric to quantify the standardized differences between the matched Treatment and Control groups across each of the covariates. Lower values of Cohen’s d imply better similarity between the groups, and magnitudes lower than 0.2 indicate “small” differences between the groups [ 25 ]. We find that the Cohen’s d values for our covariates range between 0.001 and 0.197, with a mean magnitude of 0.07, suggesting a good balance in our matching approach (see Figure 2a ). Finally, to eliminate any biases in our findings due to differences in the degree of participation, we also validate whether the matched pairs of users were exposed to a similar quantity of keywords in our period of analysis (post 2016). For the number of keywords they were exposed to, the two cohorts of matched users ( Treatment and Control ) show a Cohen’s d of 0.02, suggesting minimal differences in their exposure to comment threads and their degree of participation in college subreddits.
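The balance metric itself is straightforward (sketch; the covariate vectors below are hypothetical):

```python
import math

def cohens_d(group_a, group_b):
    """Standardized mean difference between two samples using the pooled
    standard deviation; |d| < 0.2 is conventionally a 'small' difference,
    i.e., good covariate balance between matched groups."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / na, sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd
```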

Figure 2. (a) Cohen’s d for evaluating matching balance of covariates of activity features and baseline (B.) stress and hate exposure; (b) Kernel density estimation of user distribution with change in stress expression.

We further assess the similarity in topical interests between the commenting behavior of Treatment and Control pairs of users. Here, a high value of topical similarity would ascertain minimal confounds introduced by topical differences (such as highly stressed users being more interested in hateful threads). We adopt a word-embedding based similarity approach [ 8 , 75 ], where for every user we obtain a word-embedding representation, in a 300-dimensional vector space, of all the subject titles of the discussion threads that they commented on. We choose subject titles because of their prominence on the homepage of a subreddit, and because they likely influence users to consume and subsequently comment on a thread. Next, we compute the vector similarity of the subject titles’ word-vectors for every pair of Treatment and Control users, which essentially quantifies their topical interests. Across all the pairs of Treatment and Control users, we find an average cosine similarity of 0.67 (stdev. = 0.17), indicating that our matched users share similar interests in the posts on which they commented.
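The pairwise similarity step reduces to a cosine between the two users' aggregated title embeddings (sketch with toy low-dimensional vectors; the paper uses 300-dimensional word embeddings):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: undefined similarity treated as 0
    return dot / (norm_u * norm_v)
```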

5.4. Does Hate Exposure Impact Stress Level?

Following statistical matching, we examine the relationship between exposure to hate and the expression of stress in college subreddits. Drawing on the widely adopted “Difference in Differences” technique in causal inference research [ 2 ], we evaluate the effects of hate exposure on stress by measuring the shifts in online stress for the Treatment group and comparing them with the same in the Control group. According to Rubin’s causal framework, such an evaluation averages the effect (online stress expression) caused by the treatment (online hate exposure) on the treated individuals by comparing it with what the same individuals would have shown had they not been treated (approximated by the individual’s matched pair) [ 43 ].
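The difference-in-differences estimate can be sketched as follows (illustrative; in the paper each list would hold per-user stress measures for the matched cohorts):

```python
def diff_in_differences(treat_post, treat_base, ctrl_post, ctrl_base):
    """Average post-baseline change in the Treatment group minus the same
    change in the matched Control group; a positive value indicates extra
    stress associated with the treatment (hate exposure)."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treat_post) - mean(treat_base)) - (mean(ctrl_post) - mean(ctrl_base))
```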

We observe that compared to their baseline stress, the stress level of the Treatment users (mean=139%) is higher than that of the Control users (mean=106%). An effect size measure (Cohen’s d =0.40) and a paired t-test indicate this difference is statistically significant ( t =93.3, p < 0.05). Figure 2b shows the changes in stress level for the Treatment and Control user groups following hate speech exposure in the college subreddits. Given that the two groups are matched on offline and online factors, such pronounced differences in stress between them following online hate exposure suggest that this exposure likely has a causal relationship with users’ online stress expression.
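As an illustration of this comparison, the sketch below reproduces the effect-size and paired t-statistic computations on synthetic stress shifts for matched pairs; the group means mirror the reported 139% vs. 106%, but the spread, sample size, and data are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic per-user stress shifts (relative to baseline) for matched pairs;
# Treatment mean ~1.39 (139%) vs. Control mean ~1.06 (106%), as in the text.
treatment = rng.normal(loc=1.39, scale=0.8, size=5000)
control = rng.normal(loc=1.06, scale=0.8, size=5000)

# Difference-in-differences: Treatment shift minus Control shift.
did = treatment.mean() - control.mean()

# Cohen's d for paired samples: mean of pair differences / their std.
diff = treatment - control
cohens_d = diff.mean() / diff.std(ddof=1)

# Paired t statistic (compare against ~1.96 for p < 0.05 at large n).
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
```

With real data the magnitudes of d and t would of course differ from this toy setup.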

Having demonstrated that online hate exposure plausibly influences the online stress expression of individuals in college subreddits, we next examine how the various categories of hate lead to shifts in online stress expression among the Treatment users. For this, we fit a linear regression model with the hate categories as independent variables and the change in stress expression as the dependent variable. Table 2 reports the coefficients of these categories in the regression model; all of them are statistically significant. These coefficients can be interpreted as follows: every unit change in online hate exposure from a category leads to an approximate change in online stress expression of the magnitude of the corresponding coefficient. Each of the hate categories shows a positive coefficient, further indicating that an increase in exposure to any category of hate increases the stress expression of members of the college subreddits. Among these categories, gender (0.81%) and disability (0.73%) show the greatest coefficients, and therefore contribute most to the online stress expression of the community members.

Table 2. Regression coefficients for hate categories and change in stress expression (*** p < 0.001).
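A minimal sketch of this regression step, fitting ordinary least squares on synthetic per-user category exposures. The coefficient values for gender and disability are taken from the text; the other coefficients, category count, noise level, and data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
# Synthetic per-user exposure counts to four hate categories
# (e.g., gender, race, behavior, disability -- illustrative labels).
X = rng.poisson(lam=2.0, size=(n, 4)).astype(float)
true_coefs = np.array([0.81, 0.45, 0.30, 0.73])  # gender first, disability last
y = X @ true_coefs + rng.normal(scale=0.5, size=n)  # change in stress expression

# Ordinary least squares with an intercept column.
X1 = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
intercept, coefs = beta[0], beta[1:]
```

On this synthetic data the fitted coefficients recover the positive per-category effects, mirroring the sign pattern reported in Table 2.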

5.5. Psychological Endurance to Hate Exposure

Within our Treatment group, we observe that users are not equally affected in their stress levels. In fact, they show a wide range of online stress (median = 0.05, stdev. = 0.80) at varying magnitudes of online hate exposure (median = 0.68, stdev. = 3.61) (see Figure 3 ). So, besides observing that hate exposure in these communities bears a causal relationship with online stress expression, we also find that online hate does not affect everybody’s stress expression uniformly. This aligns with the notion that individuals differ in their resilience to the vicissitudes of life [ 53 ]. We call this phenomenon of varied tolerance among users the psychological endurance to online hateful speech. Our motivation to examine this endurance construct comes from the psychology literature, which posits that different people have different abilities to deal with specific uncontrollable events, and that stress results from the perception that the demands of these situations exceed one’s capacity to cope [ 44 ].

Figure 3. Distribution of Treatment users.

To understand psychological endurance to online hate, we look at two groups of users who express the extremes of online stress at the opposing extremes of online hate exposure. One group comprises those Treatment users with low endurance who have lower tolerance to online hate than most other users and show high (higher than median) stress changes when exposed to low (lower than median) online hate (quadrant 4 in Figure 3 ). The other group consists of those users who have much higher tolerance, and show low (lower than median) stress changes when exposed to high (higher than median) hate (quadrant 2 in Figure 3 ). We refer to these two groups as low endurance and high endurance users—we find 38,503 low and 38,478 high endurance users in our data.
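The median-split quadrant assignment described above can be sketched as follows, on synthetic exposure and stress values; the quadrant numbering follows Figure 3, and the distributions here are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic per-user hate exposure and change in stress expression.
exposure = rng.exponential(scale=1.0, size=10000)
stress = rng.normal(loc=0.05, scale=0.8, size=10000)

exp_med, str_med = np.median(exposure), np.median(stress)

# Quadrant 4: low (below-median) exposure but high (above-median) stress
# change -> low endurance users.
low_endurance = (exposure < exp_med) & (stress > str_med)
# Quadrant 2: high (above-median) exposure but low (below-median) stress
# change -> high endurance users.
high_endurance = (exposure >= exp_med) & (stress <= str_med)
```

Applied to the real Treatment users, this split yields the 38,503 low and 38,478 high endurance users reported above.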

5.6. Analyzing Psychological Endurance

Our final findings include an analysis of the attributes of high and low endurance users as manifested in the college subreddits. We focus on two kinds of attributes: users’ online linguistic expression, and their personality traits as inferred from their language. Given that we distinguish the psychological behaviors of two cohorts (individuals with low and high endurance to hateful speech), the choice of these attributes stems from prior work that studied psychological traits and states of individuals as gleaned from their social media activity [ 22 ].

Linguistic Expression.

To understand in what ways the low and high endurance users differ in language use, we employ an unsupervised language modeling technique, the Sparse Additive Generative Model (SAGE) [ 36 ], which has been widely applied to computational linguistic problems on social media data [ 21 , 77 , 84 ]. Given any two documents, SAGE selects discriminating keywords by comparing the parameters of two logistically parameterized multinomial models, using a self-tuned regularization parameter to control the tradeoff between frequent and rare terms. We use the SAGE model to identify discriminating n -grams ( n =1,2) between the comments of low and high endurance users. The magnitude of the SAGE value of a linguistic token signals the degree of its “uniqueness”; in our case, a positive SAGE value indicates that the n -gram is more representative of the low endurance users, whereas a negative SAGE value denotes greater representativeness for the high endurance users.
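The sketch below illustrates the flavor of this analysis with a simplified additive contrast of smoothed log-probabilities against a background corpus. It is not the full SAGE optimization (which self-tunes its regularization parameter), and the tiny corpora are invented; positive scores mark tokens more representative of the low endurance side, as in the text.

```python
from collections import Counter
import math

# Invented toy corpora standing in for the two cohorts' comments.
low = "exam prerequisite assessment education stress anxiety exam".split()
high = "pokemon guitar anime garden delicious guitar".split()
background = low + high

def log_probs(tokens, vocab, alpha=0.1):
    """Additively smoothed log-probabilities over a fixed vocabulary."""
    c = Counter(tokens)
    total = sum(c.values()) + alpha * len(vocab)
    return {w: math.log((c[w] + alpha) / total) for w in vocab}

vocab = sorted(set(background))
bg = log_probs(background, vocab)
lo = log_probs(low, vocab)
hi = log_probs(high, vocab)

# SAGE-style score: each group's deviation from the background model;
# positive => more representative of low endurance users.
sage_like = {w: (lo[w] - bg[w]) - (hi[w] - bg[w]) for w in vocab}
top_low = sorted(sage_like, key=sage_like.get, reverse=True)[:3]
```

Extending this to bigrams and to the full comment corpora would recover the kind of discriminating n-gram lists shown in Table 3.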

Table 3 reports the top 25 n -grams ( n = 1,2) for low and high endurance users. One pattern evident in these n -grams is that low endurance users tend to use more classroom-oriented and academic vocabulary, such as “education”, “prerequisite”, “assessment” and “mathematical”, whereas the high endurance group demonstrates greater usage of words relating to a more relaxed, leisure-like context and to diverse non-academic topics and interests, such as “pokemon”, “guitar”, “delicious”, “anime” and “garden”. We also find mental health related terms such as “therapy” and “anxiety” for low endurance users, which can be associated with these users self-disclosing their condition or with their help-seeking behaviors around these concerns.

Table 3. Top 25 discriminating n -grams ( n = 1, 2) used by Low and High Endurance users (SAGE [ 36 ]).

Personality Traits.

Our final analysis focuses on understanding the personality differences between individuals showing varied levels of psychological endurance to online stress. Personality refers to the traits and characteristics that uniquely define an individual [ 83 ]. The psychology literature posits personality traits as an important aspect for understanding the drivers of people’s underlying emotional states, trust, emotional stability, and locus of control [ 3 ]. For instance, certain personality traits, such as extraversion and neuroticism, represent enduring dispositions that directly lead to subjective wellbeing in individuals, including the dimensions of happiness and negative affect [ 3 ]. We study the relationship of psychological endurance with personality traits, which can be inferred from the social media data of users [ 70 , 83 ].

To characterize the personality traits of the users who show low and high psychological endurance, we run their comments through the Watson Personality Insights API [ 4 ] to infer personality along five trait dimensions: openness, agreeableness, extraversion, neuroticism , and conscientiousness . Prior research has used this method to extract and characterize several linguistic and psychological constructs from text [ 22 , 38 ]. Figure 4 shows the distribution of personality traits for low and high endurance users. Paired t -tests revealed statistically significant differences between the two groups. Drawing from seminal work on the big-five factor structure of personality [ 40 ], we situate our observations as follows:

Figure 4. Personality traits of Treatment users. Statistical significance reported after Bonferroni correction on independent sample t -tests (*** p < 0.001).

We observe that the high endurance group shows 2% greater agreeableness (t =−66.31) and extraversion (t =−42.62). Agreeableness characterizes being well-mannered, and those who score higher are generally considered less reactive to challenges or attacks (here, online hateful speech). Extraversion indicates greater sociability, energy, and positive emotions, and lower values signal a reflective personality, which suggests that even lower exposure to online hate can negatively impact users with low endurance. The low endurance users also show 4% greater neuroticism (t =89.42) and conscientiousness (t =109.31). Neuroticism captures the degree of emotional (in)stability, and higher values signal an increased tendency to experience unpleasant emotions easily. Despite these post hoc conjectures, we acknowledge that understanding these relationships between endurance and personality would require deeper investigation beyond the scope of this paper.

Based on these observations and the SAGE analysis of the low and high endurance users ( Table 3 ), we infer that even with comparable hate exposure in the college subreddits, different individuals may respond psychologically differently, and these differences may be observed via attributes such as their language of expression on social media and their underlying personality traits.


6.1. Socio-political and Policy Implications

The speech debate has been a part of American socio-political discussion for many years [ 57 ]. On college campuses in particular, it presents many complexities for decision and policy making that seeks to combat hateful speech on campus [ 49 ]. While this paper does not resolve this debate, it makes empirical, objective, and data-driven contributions and draws valuable insights towards an informed discussion of the topic.

First, while the speech debate so far has largely focused on the offline context, our study shows that hateful speech in the online domain also has negative impacts on the exposed population, especially in situated communities like college campuses. Our findings align with prior work on the psychological impacts of hateful speech in the offline context [ 58 , 87 ]. At the same time, they extend the literature by showing not only that there are pronounced differences in the prevalence of various hate speech types, but also that exposure to hate affects individuals’ online stress expression. We note that antisocial behaviors like hateful speech continue to be a pressing issue for online communities [ 19 , 23 ], yet the effects of online hateful speech remain the subject of little empirical research. Thus, these findings help to account for a previously under-explored but critical facet of the speech debate, especially in the context of college campuses.

Second, our findings add new dimensions to the college speech debate centering around legal, ownership, and governance issues. These issues involve not only those who trigger and those who are exposed to online hateful speech, but also the owners, the moderators, the users, and the creators of social media platforms, who may not necessarily be part of the college community.

Third, our work highlights a new policy challenge: how to decipher when online and offline hateful speech reinforce each other, and how to delineate their psychological effects, particularly in a situated community where the online and offline social worlds likely overlap. Our work indicates that the affordances of social media, such as anonymity and low-effort information sharing, amplify the complexities of the speech debate on college campuses. Typically, colleges can choose to restrict the time, place, and manner of someone’s speech. However, when speech is not physically located on campus, these affordances can be exploited to quickly reach large segments of the campus population, posing new threats. Consequently, how should college stakeholders respond when, based on our approach, a student is found to use online hate speech, observably outside the physical setting of the campus, targeting a person of marginalized identity?

Finally, our work opens up discussions about the place of “counterspeech” in these communities to undermine the psychological effects of hate, alongside accounting for the legal concerns and governance challenges that enforcing certain community norms may pose [ 73 ]. We note that any such discussions promoting counterspeech would need to factor in the general etiquette of conduct expected from members of the college community, to avoid derogatory or chauvinistic phrasing, and to maintain a vibrant and inclusive environment that is respectful of all members [ 50 ].

6.2. Technological Implications

An important contribution of our work is a computational framework to assess the pervasiveness and the psychological effects of online hateful speech on the members of online college communities. These methods can lead to two types of technology implications:

6.2.1. Mental Health Support Provisions on College Campuses.

The ease of interpretation and the ability to track language changes over time make our empirical measure of online hateful speech robust and generalizable across different online college communities, and also accessible to various stakeholders, unlike existing hate speech detection techniques [ 81 ]. Our methods can thus be leveraged by college authorities to make informed decisions surrounding the speech dilemma on campuses, promote civil online discourse among students, and employ timely interventions when deemed appropriate. While our approach to assessing the prevalence of hateful speech is unlikely to be perfectly accurate, with human involvement in validating its outcomes, timely interventions to reduce the harmful effects of online hateful language can be deployed. As Olteanu et al. [ 64 ] recently pointed out, exogenous events can lead to online hatefulness; our framework can assist in proactively detecting the psychological ramifications of online hate at a nascent stage to prevent negative outcomes.

Additionally, our work helps us draw insights about the attributes of individuals with higher vulnerability and lower psychological endurance to online hateful speech. This can assist in instrumenting tailored and timely support efforts and evidence-based decision strategies on campuses. We further note that any form of hateful speech, whether online or offline, elicits both problem- and emotion-focused coping strategies, and that victims of hateful speech seek support [ 53 ]. Many colleges already provide various self-, peer-, and expert-help resources to cater to vulnerable students. These efforts may be aligned to also consider the effects of online hateful speech exposure as revealed in our work.

6.2.2. Moderation Efforts in Online College Communities.

Our findings suggest that hateful speech does prevail in college subreddits. However, unlike in most other online communities, banning or extensively censoring content on college subreddits (a strategy widely adopted today [ 19 , 62 ] to counter online antisocial behavior) can have counter-effects. Such practices would potentially preclude students from accessing an open discussion board with their peers, where not only is much helpful information shared, but which also enables them to socialize and seek support around academic and personal topics. Rather, our work can be considered a “call-to-action” for moderators to adopt measures that go beyond blanket banning or censorship. For instance, our approach to assessing the stress and hate exposure of users can assist moderators in tuning the community environment and adapting norms in ways that discourage hateful speech. This could take the form of subreddit guidelines that outline moderation strategies discouraging not only offensive and unwelcoming content, but also content that adversely affects community members. For example, the subreddit r/lifeprotips explicitly states that “tips or comments that encourage behavior that can cause injury or harm to others can cause for a (user) ban” . Other moderation strategies can also be adopted, such as labeling posts that may be disturbing, along the lines of r/AskReddit, which uses “[Serious]” to label particularly important and sensitive discussion threads.

Moderators can also provide assistance and support via peer-matching, and include pointers to external online help resources, especially for members who are vulnerable to the negative psychological impacts of online hateful content. Complementarily, as argued in recent research [ 12 ], making the harmful effects of hateful language transparent to community members in a carefully planned and strategic manner could curb the prevalence of antisocial practices, including hateful speech. Specifically, in online college communities, where members are geographically situated and embedded in their offline social ties [ 10 , 37 ], knowledge of the negative psychological repercussions of certain online practices could influence them to refrain from or not engage with such behaviors.

In the offline context, the college speech debate has also aroused discussions surrounding safe spaces (designated campus sites where students can gather with peers) and trigger warnings (explicit statements that certain material discussed in an academic environment might upset sensitive students) [ 50 , 89 ]. These measures are advocated to help minimize hateful speech and its effects. We argue that analogous measures are possible in online communities as well, using the design affordances of social media platforms (e.g., creating separate subreddits for minority communities in a college, or providing pop-up warnings on certain posts). However, both safe spaces and trigger warnings have been critiqued as exclusionary and as harmful to open discourse in colleges. Any such possible consequences should therefore be carefully evaluated before these measures are adopted in online communities of college students.

6.3. Ethical Implications

Amid the controversy surrounding freedom of expression, defining (online) hateful speech remains a complex subject of ethical, legal, and administrative interest, especially on college campuses, which are known to value inclusive communities and to facilitate progressive social exchange. While our definition of hateful speech in online college communities may not be universal, our measurement approach provides an objective understanding of the dynamics and impacts of hateful environments within these communities. Nevertheless, any decision and policy making based on our findings requires careful and in-depth supplemental ethical analysis, beyond the empirical analysis we present in this paper. For instance, the extent to which online hateful speech infringes on the speech provisions of specific campuses remains a topic that needs careful evaluation. Importantly, supported by our analysis, campus stakeholders must navigate a two-pronged ethical dilemma: first, around engaging with those who use online hateful speech, and second, around treating its extreme manifestations, such as hate-related threats and altercations directed at campus community members, or its interference with the institution’s educational goals.

We finally caution against our work being perceived as a means to facilitate surveillance of student speech on college campuses, or as a guideline for censoring speech on campus. Our work is not intended to be used to intentionally or inadvertently marginalize, or influence prejudice against, groups who are already marginalized (by gender, race, religion, sexual orientation, etc.) or vulnerable, and who are often the targets of hateful speech on campuses.

6.4. Limitations and Future Work

Our study has limitations, some of which suggest promising directions for future work. Although our work is grounded in prior findings [ 7 ] that college subreddits are representative of their respective student bodies, we cannot claim that our results extrapolate directly to offline hateful speech on college campuses [ 15 ]. Similarly, we cannot claim that these results generalize to general-purpose or other online communities on Reddit or beyond, with or without an offline analog like a college campus. Importantly, we did not assess the clinical nature of stress in our data and focused only on stress expression inferred from social media language [ 75 ]; future work can validate the extent to which online hate speech impacts the mental health of students. Like many observational studies, we also cannot establish true causality between an individual’s exposure to online hate and their stress expression. To address these limitations, future work can gather ground-truth data about individual stress experiences and clinically validate them against social media derived observations.

We note that our work is sensitive to the uniqueness of the Reddit platform, where content is already moderated [ 20 , 47 , 62 ]. It is possible that the definition of hateful content qualifying for removal varies across the college subreddits, and our work is restricted to non-removed comments. Importantly, the norms and strategies for moderating content can vary across different college subreddits. Therefore, our study likely provides a “lower bound estimate” of hateful content in these communities. Additionally, users also use multiple accounts and throwaway accounts on Reddit [ 52 ], and we do not identify individual users’ experiences of online hate or stress in our dataset. Our findings about psychological endurance to hate are interesting and inspire further theoretical and empirical investigation: e.g., how can we generalize the relationship between online hate and psychological wellbeing both on campuses and elsewhere, what factors influence the endurance of an individual, and how can we characterize endurance in terms of direct victimization versus the indirect ripple effects of online hateful speech on campuses?


In this paper, we first modeled the College Hate Index (CHX) to measure the degree of hateful speech in college subreddits. We found that hateful speech does prevail in these communities. We then employed a causal inference framework to show that exposure to hateful speech in college subreddits led to greater stress expression among community members. We also found that exposed users showed varying psychological endurance to hate exposure, i.e., users exposed to similar levels of hate reacted differently. Analyzing the language and personality of low and high endurance users, we found that low endurance users are vulnerable to more emotional outbursts, and are more conscientious and neurotic than those showing higher endurance to hate.


CCS Concepts: • Human-centered computing → Empirical studies in collaborative and social computing; Social media .


We thank Eric Gilbert, Stevie Chancellor, Sindhu Ernala, Shagun Jhaver, and Benjamin Sugar for their feedback. Saha and De Choudhury were partly supported by research grants from Mozilla (RK677) and the NIH (R01GM112697).

Contributor Information

Koustuv Saha, Georgia Tech.

Eshwar Chandrasekharan, Georgia Tech.

Munmun De Choudhury, Georgia Tech.

Defining Hate Speech

Andy Sellars
There is no shortage of opinions about what should be done about hate speech, but if there is one point of agreement, it is that the topic is ripe for rigorous study. But just what is hate speech, and how will we know it when we see it online? For all of the extensive literature about the causes, harms, and responses to hate speech, few scholars have endeavored to systematically define the term. Where other areas of content analysis have developed rich methodologies to account for influences like context or bias, the present scholarship around hate speech rarely extends beyond identification of particular words or phrases that are likely to cause harm targeted toward immutable characteristics. 

This essay seeks to review some of the various attempts to define hate speech, and pull from them a series of traits that can be used to frame hate speech with a higher degree of confidence. In so doing, it explores the tensions between hate speech and principles of freedom of expression, both in the abstract and as they are captured in existing definitions. It also analyzes historical attempts to define the term in the United States, from the brief period of time when the United States punished hate speech directly. From this analysis, eight traits are surfaced that can be used for the development of a confidence scoring system to help ascertain whether a particular expression should be considered one of hate speech or not.



What is hate speech?


Understanding hate speech

In common language, “hate speech” refers to offensive discourse targeting a group or an individual based on inherent characteristics (such as race, religion or gender) and that may threaten social peace.

To provide a unified framework for the United Nations to address the issue globally, the UN Strategy and Plan of Action on Hate Speech defines hate speech as… “ any kind of communication in speech, writing or behaviour, that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are , in other words, based on their religion, ethnicity, nationality, race, colour, descent, gender or other identity factor.”

However, to date there is no universal definition of hate speech under international human rights law. The concept is still under discussion, especially in relation to freedom of opinion and expression, non-discrimination and equality.

While the above is not a legal definition and is broader than “incitement to discrimination, hostility or violence” (which is prohibited under international human rights law), it has three important attributes.


It’s important to note that hate speech can only be directed at individuals or groups of individuals. It does not include communication about States and their offices, symbols or public officials, nor about religious leaders or tenets of faith.

Challenges raised by online hate speech


“We must confront hatred wherever and whenever it rears its ugly head. This includes working to tackle hate speech that spreads like wildfire across the internet.”

— United Nations Secretary-General António Guterres, 2023


The growth of hateful content online has been coupled with the rise of easily shareable disinformation enabled by digital tools. This raises unprecedented challenges for our societies, as governments struggle to enforce national laws at the scale and speed of the virtual world.

Unlike in traditional media, online hate speech can be produced and shared easily, at low cost and anonymously. It has the potential to reach a global and diverse audience in real time. The relative permanence of hateful online content is also problematic, as it can resurface and (re)gain popularity over time.

Understanding and monitoring hate speech across diverse online communities and platforms is key to shaping new responses. But efforts are often stunted by the sheer scale of the phenomenon, the technological limitations of automated monitoring systems and the lack of transparency of online companies.

Meanwhile, the growing weaponization of social media to spread hateful and divisive narratives has been aided by the algorithms of online corporations. This has intensified the stigma faced by vulnerable communities and exposed the fragility of our democracies worldwide. It has increased scrutiny of Internet players and sparked questions about their role and responsibility in inflicting real-world harm. As a result, some States have started holding Internet companies accountable for moderating and removing content considered to be against the law, raising concerns about limitations on freedom of speech and censorship.

Despite these challenges, the United Nations and many other actors are exploring ways of countering hate speech. These include initiatives to promote greater media and information literacy among online users while ensuring the right to freedom of expression.

Hate speech, toxicity detection in online social media: a recent survey of state of the art and opportunities

  • Regular Contribution
  • Published: 25 September 2023
  • Volume 23 , pages 577–608, ( 2024 )

Anjum and Rahul Katarya (ORCID: orcid.org/0000-0001-7763-291X)

Information and communication technology has evolved dramatically, and the majority of people now use the internet and share their opinions more openly, which has led to the creation, collection and circulation of hate speech over multiple platforms. The anonymity and mobility afforded by these social media platforms allow people to hide behind a screen and spread hate effortlessly. Online hate speech (OHS) recognition can play a vital role in stopping such activities and can thus restore the position of public platforms as an open marketplace of ideas. To study hate speech detection in social media, we surveyed the related datasets available on web-based platforms. We further analyzed approximately 200 research papers indexed in different journals from 2010 to 2022. The papers were divided into sections by the approaches used in OHS detection, i.e., feature selection, traditional machine learning (ML) and deep learning (DL). Based on the 111 selected papers, we found that 44 articles used traditional ML and 35 used DL-based approaches. We conclude that most authors used SVM, Naive Bayes and Decision Tree among ML approaches, and CNN and LSTM among DL approaches. This survey contributes a systematic approach to help researchers identify new research directions in online hate speech.



Data availability statement

Data generated or analyzed during this study are included in this published article.

  • https://hatespeechdata.com/
  • https://semeval.github.io/SemEval2021/tasks.html
  • https://hasocfire.github.io/hasoc/2020/index.html
  • https://swisstext-and-konvens-2020.org/shared-tasks/
  • https://sites.google.com/view/trac2/live?authuser=0
  • https://ai.facebook.com/blog/hateful-memes-challenge-and-data-set/


Author information

Authors and affiliations

Big Data Analytics and Web Intelligence Laboratory, Department of Computer Science and Engineering, Delhi Technological University, New Delhi, India

Anjum & Rahul Katarya


Corresponding author

Correspondence to Rahul Katarya .

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Anjum, Katarya, R.: Hate speech, toxicity detection in online social media: a recent survey of state of the art and opportunities. Int. J. Inf. Secur. 23, 577–608 (2024). https://doi.org/10.1007/s10207-023-00755-2

Accepted : 02 September 2023

Published : 25 September 2023

Issue Date : February 2024

DOI : https://doi.org/10.1007/s10207-023-00755-2


Keywords

  • Deep learning
  • Natural language processing (NLP)
  • Machine learning
  • Online hate speech (OHS)
  • Social media
  • Toxicity detection


Open Access


Research Article

Hate speech detection: Challenges and solutions

* E-mail: [email protected]

Affiliation Information Retrieval Laboratory, Georgetown University, Washington, DC, United States of America

  • Sean MacAvaney, 
  • Hao-Ren Yao, 
  • Eugene Yang, 
  • Katina Russell, 
  • Nazli Goharian, 
  • Ophir Frieder


  • Published: August 20, 2019
  • https://doi.org/10.1371/journal.pone.0221152

As online content continues to grow, so does the spread of hate speech. We identify and examine challenges faced by online automatic approaches for hate speech detection in text. Among these difficulties are subtleties in language, differing definitions of what constitutes hate speech, and limitations of data availability for training and testing of these systems. Furthermore, many recent approaches suffer from an interpretability problem—that is, it can be difficult to understand why the systems make the decisions that they do. We propose a multi-view SVM approach that achieves near state-of-the-art performance, while being simpler and producing more easily interpretable decisions than neural methods. We also discuss both technical and practical challenges that remain for this task.

Citation: MacAvaney S, Yao H-R, Yang E, Russell K, Goharian N, Frieder O (2019) Hate speech detection: Challenges and solutions. PLoS ONE 14(8): e0221152. https://doi.org/10.1371/journal.pone.0221152

Editor: Minlie Huang, Tsinghua University, CHINA

Received: April 3, 2019; Accepted: July 22, 2019; Published: August 20, 2019

Copyright: © 2019 MacAvaney et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript, its Supporting Information files, and the provided data links as follows. Forum dataset: https://github.com/aitor-garcia-p/hate-speech-dataset . Instructions to get TRAC dataset: https://sites.google.com/view/trac1/shared-task . HatebaseTwitter dataset: https://github.com/t-davidson/hate-speech-and-offensive-language . HatEval dataset: https://competitions.codalab.org/competitions/19935 .

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.


Hate crimes are unfortunately nothing new in society. However, social media and other means of online communication have begun playing a larger role in hate crimes. For instance, suspects in several recent hate-related terror attacks had extensive social media histories of hate-related posts, suggesting that social media contributed to their radicalization [1, 2]. In some cases, social media can play an even more direct role; the suspect of the 2019 terror attack in Christchurch, New Zealand, broadcast the attack live on Facebook [2].

Vast online communication forums, including social media, enable users to express themselves freely and, at times, anonymously. While the ability to freely express oneself is a human right that should be cherished, inducing and spreading hate towards another group is an abuse of this liberty. For instance, the American Bar Association asserts that in the United States, hate speech is legal and protected by the First Amendment, unless it directly calls for violence [3]. As such, many online platforms such as Facebook, YouTube, and Twitter consider hate speech harmful and have policies to remove it [4–6]. Due to the societal concern and how widespread hate speech is becoming on the Internet [7], there is strong motivation to study its automatic detection. Automating detection can reduce the spread of hateful content.

Detecting hate speech is a challenging task, however. First, there is disagreement about how hate speech should be defined, so the same content may count as hate speech under one definition but not another. We start by covering competing definitions, focusing on the different aspects that contribute to hate speech. We cannot be comprehensive, as new definitions appear regularly; our aim is simply to illustrate the variation among definitions and the difficulties that arise from it.

Competing definitions also complicate the evaluation of hate speech detection systems: existing datasets differ in their definition of hate speech, leading to datasets that not only come from different sources but also capture different information. This can make it difficult to assess which aspects of hate speech a system identifies. We discuss the various datasets available to train and measure the performance of hate speech detection systems in the next section. Nuance and subtlety in language provide further challenges for automatic hate speech identification, again depending on the definition.

Despite these differences, some recent approaches have found promising results for detecting hate speech in textual content [ 8 – 10 ]. The proposed solutions employ machine learning techniques to classify text as hate speech. One limitation of these approaches is that their decisions can be opaque, making it difficult for humans to interpret why a given decision was made. This is a practical concern because systems that automatically censor a person’s speech likely need a manual appeal process. To address this problem, we propose a new hate speech classification approach that allows for a better understanding of its decisions, and we show that it can even outperform existing approaches on some datasets. Some existing approaches rely on external resources, such as a hate speech lexicon. This can be effective, but it requires maintaining those resources and keeping them up to date, which is a problem in itself. Our approach does not rely on external resources and achieves reasonable accuracy. We cover these topics in the following section.

In general, however, there are practical challenges that remain among all systems. For instance, armed with the knowledge that the platforms they use are trying to silence them, those seeking to spread hateful content actively try to find ways to circumvent measures put in place. We cover this topic in more detail in the last section.

In summary, we discuss the challenges and approaches in automatic detection of hate speech, including competing definitions, dataset availability and construction, and existing approaches. We also propose a new approach that in some cases outperforms the state of the art and discuss remaining shortcomings. Ultimately, we conclude the following:

  • Automatic hate speech detection is technically difficult;
  • Some approaches achieve reasonable performance;
  • Specific challenges remain among all solutions;
  • Without societal context, systems cannot generalize sufficiently.

Defining hate speech

The definition of hate speech is neither universally accepted nor are individual facets of the definition fully agreed upon. Ross et al. believe that a clear definition of hate speech can help the study of hate speech detection by making annotation an easier task, and thus the annotations more reliable [ 11 ]. However, the line between hate speech and appropriate free expression is blurry, making some wary of giving hate speech a precise definition. For instance, the American Bar Association does not give an official definition, but instead asserts that speech contributing to a criminal act can be punished as part of a hate crime [ 12 ]. Similarly, we opt not to propose a specific definition, but instead examine existing definitions to gain insight into what typically constitutes hate speech and what technical challenges the definitions might bring. Below, we summarize leading definitions of hate speech from varying sources, as well as some aspects of the definitions that make the detection of hate speech difficult.

  • Encyclopedia of the American Constitution: “Hate speech is speech that attacks a person or group on the basis of attributes such as race, religion, ethnic origin, national origin, sex, disability, sexual orientation, or gender identity.” [ 13 ]
  • Facebook: “We define hate speech as a direct attack on people based on what we call protected characteristics—race, ethnicity, national origin, religious affiliation, sexual orientation, caste, sex, gender, gender identity, and serious disease or disability. We also provide some protections for immigration status. We define attack as violent or dehumanizing speech, statements of inferiority, or calls for exclusion or segregation.” [ 4 ]
  • Twitter: “Hateful conduct: You may not promote violence against or directly attack or threaten other people on the basis of race, ethnicity, national origin, sexual orientation, gender, gender identity, religious affiliation, age, disability, or serious disease.” [ 6 ]
  • Davidson et al.: “Language that is used to express hatred towards a targeted group or is intended to be derogatory, to humiliate, or to insult the members of the group.” [ 9 ]
  • de Gilbert et al.: “Hate speech is a deliberate attack directed towards a specific group of people motivated by aspects of the group’s identity.” [ 14 ]
  • Fortuna et al. “Hate speech is language that attacks or diminishes, that incites violence or hate against groups, based on specific characteristics such as physical appearance, religion, descent, national or ethnic origin, sexual orientation, gender identity or other, and it can occur with different linguistic styles, even in subtle forms or when humour is used.” [ 8 ]. This definition is based on their analysis of various definitions.

It is notable that in some of the definitions above, a necessary condition is that the speech is directed at a group. This differs from the Encyclopedia of the American Constitution definition, in which an attack on an individual can also be considered hate speech. A common theme among the definitions is that the attack is based on some aspect of the group’s or person’s identity. While in de Gilbert’s definition the identity itself is left vague, some of the other definitions list specific identity characteristics. In particular, protected characteristics are part of the Davidson et al. and Facebook definitions. Fortuna et al.’s definition specifically calls out variations in language style and subtlety. This can be challenging, and goes beyond what conventional text-based classification approaches are able to capture.

Fortuna et al.’s definition is based on an analysis of the following characteristics from other definitions [ 8 ]:

  • Hate speech is to incite violence or hate
  • Hate speech is to attack or diminish
  • Hate speech has specific targets
  • Whether humor can be considered hate speech

A particular problem not covered by many definitions relates to factual statements. For example, “Jews are swine” is clearly hate speech by most definitions (it is a statement of inferiority), but “Many Jews are lawyers” is not. In the latter case, determining whether the statement is hate speech would require checking whether it is factual using external sources. This type of hate speech is difficult to handle because it relates to real-world fact verification, itself a difficult task [ 15 ]. Moreover, to evaluate validity we would first need to fix precise word interpretations, for example, whether “many” denotes an absolute number or a relative percentage of the population, further complicating the verification.

Another issue that arises in defining hate speech is the praising of a hateful group. For example, praising the KKK is hate speech, whereas praising most other groups clearly is not. Here it is important to know which groups are hate groups and what exactly is being praised, since some praise is factually accurate even when deplorable; for example, the Nazis were ruthlessly efficient in carrying out their “Final Solution”. Processing praise alone is therefore, at times, difficult.

Collecting and annotating data for training automatic hate speech classifiers is challenging. Specifically, identifying and agreeing on whether a specific text is hate speech is difficult because, as previously mentioned, there is no universal definition of hate speech. Ross et al. studied the reliability of hate speech annotations and suggest that annotators are unreliable [ 11 ]. Agreement between annotators, measured using Krippendorff’s α , was very low (at most 0.29). However, when they compared annotations based on the Twitter definition with annotations based on the annotators’ own opinions, they found a strong correlation.

Furthermore, social media platforms are a hotbed for hate speech, yet many have very strict data usage and distribution policies. This results in a relatively small number of datasets available to the public to study, with most coming from Twitter (which has a more lenient data usage policy). While the Twitter resources are valuable, their general applicability is limited due to the unique genre of Twitter posts; the character limitation results in terse, short-form text. In contrast, posts from other platforms are typically longer and can be part of a larger discussion on a specific topic. This provides additional context that can affect the meaning of the text.

Another challenge is that there simply are not many publicly-available, curated datasets that identify hateful, aggressive, and insulting text. A representative sampling of available training and evaluation public datasets is shown in Table 1 :

  • HatebaseTwitter [ 9 ]. One Twitter dataset is a set of 24,802 tweets provided by Davidson et al. [ 9 ]. Their procedure for creating the dataset was as follows. First, they took a hate speech lexicon from Hatebase [ 16 ] and searched for tweets containing these terms, yielding tweets from about 33,000 users. They then collected the timelines of all these users, resulting in roughly 85 million tweets, from which they drew a random sample of 25,000 tweets containing lexicon terms. Via crowdsourcing, they annotated each tweet as hate speech, offensive (but not hate speech), or neither hate speech nor offensive. If the agreement between annotators was too low, the tweet was excluded from the set. A commonly used subset of this dataset, containing 14,510 tweets, is also available.
  • WaseemA [ 17 ]. Waseem and Hovy also provide a dataset from Twitter, consisting of 16,914 tweets labeled as racist, sexist, or neither [ 17 ]. They first created a corpus of about 136,000 tweets that contain slurs and terms related to religious, sexual, gender, and ethnic minorities. From this corpus, the authors themselves annotated (labeled) 16,914 tweets and had a gender studies major review the annotations.
  • WaseemB [ 18 ]. In a second paper, Waseem creates another dataset by sampling a new set of tweets from the 136,000 tweet corpus [ 18 ]. In this collection, Waseem recruited feminists and anti-racism activists along with crowdsourcing for the annotation of the tweets. The labels therein are racist, sexist, neither or both.
  • Stormfront [ 14 ]. de Gilbert, et al. provide a dataset from posts from a white supremacist forum, Stormfront [ 14 ]. They annotate the posts at sentence level resulting in 10,568 sentences labeled with Hate, NoHate, Relation, or Skip. Hate and NoHate labels indicate presence or lack thereof, respectively, of hate speech in each sentence. The label “Relation” indicates that the sentence is hate speech when it is combined with the sentences around it. Finally, the label “skip” is for sentences that are non-English or not containing information related to hate or non-hate speech. They also capture the amount of context (i.e., previous sentences) that an annotator used to classify the text.
  • TRAC [ 19 ]. The 2018 Workshop on Trolling, Aggression, and Cyberbullying (TRAC) hosted a shared task focused on detecting aggressive text in both English and Hindi [ 19 ]. Aggressive text is often a component of hate speech. The dataset from this task is available to the public and contains 15,869 Facebook comments labeled as overtly aggressive, covertly aggressive, or non-aggressive. There is also a small Twitter dataset, consisting of 1,253 tweets, which has the same labels.
  • HatEval [ 20 ]. This dataset is from SemEval 2019 (Task 5) for competition on multilingual detection of hate targeting to women and immigrants in tweets [ 20 ]. It consists of several sets of labels. The first indicates whether the tweet expresses hate towards women or immigrants, the second, whether the tweet is aggressive, and the third, whether the tweet is directed at an individual or an entire group. Note that targeting an individual is not necessarily considered hate speech by all definitions.
  • Kaggle [ 21 ] Kaggle.com hosted a shared task on detecting insulting comments [ 21 ]. The dataset consists of 8,832 social media comments labeled as insulting or not insulting. While not necessarily hate speech, insulting text may indicate hate speech.
  • GermanTwitter [ 11 ]. As part of their study of annotator reliability, Ross, et al. created a Twitter dataset in German for the European refugee crisis [ 11 ]. It consists of 541 tweets in German, labeled as expressing hate or not.




Note that these datasets vary considerably in size, scope, characteristics of the annotated data, and the characteristics of hate speech considered. The most common source of text is Twitter, which consists of short-form online posts. While the Twitter datasets capture a wide variety of aspects of hate speech in several languages, such as attacks on different groups, their construction processes, including the filtering and sampling methods, introduce uncontrolled factors into any analysis of the corpora. Furthermore, corpora constructed from social media and websites other than Twitter are rare, making it difficult for analyses of hate speech to cover the entire landscape.

There is also the issue of imbalance between hate and non-hate texts within datasets. On a platform such as Twitter, hate speech occurs at a very low rate compared to non-hate speech. Although datasets reflect this imbalance to an extent, they do not match the actual proportions, because training requires enough positive examples. For example, in the WaseemA dataset [ 17 ], 20% of the tweets were labelled sexist, 11.7% racist, and 68.3% neither: still imbalanced, but far less so than the underlying distribution on Twitter.

Automatic approaches for hate speech detection

Most social media platforms have established user rules that prohibit hate speech; enforcing these rules, however, requires copious manual labor to review every report. Some platforms, such as Facebook, have recently increased the number of content moderators. Automatic tools and approaches could accelerate the reviewing process or direct human resources to the posts that most require close examination. In this section, we give an overview of automatic approaches for detecting hate speech in text.

Keyword-based approaches

A basic approach to identifying hate speech is keyword matching: using an ontology or dictionary, texts that contain potentially hateful keywords are flagged. For instance, Hatebase [ 16 ] maintains a database of derogatory terms for many groups across 95 languages. Such well-maintained resources are valuable because terminology changes over time. However, as we observed in our study of the definitions of hate speech, simply using a hateful slur is not necessarily enough to constitute hate speech.

Keyword-based approaches are fast and straightforward to understand, but they have severe limitations. Detecting only racial slurs would yield a highly precise system with low recall (precision is the fraction of detected items that are relevant; recall is the fraction of all relevant items that are detected). In other words, a system that relies chiefly on keywords would miss hateful content that does not use those terms. Conversely, including terms that can be, but are not always, hateful (e.g., “trash”, “swine”) would create too many false alarms, increasing recall at the expense of precision.
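As a concrete illustration of this trade-off, the following sketch computes precision and recall for two hypothetical detectors on an imaginary pool of 100 posts, 10 of which are truly hateful (all counts are invented for illustration):

```python
# Toy precision/recall illustration; all counts are hypothetical.
def precision(tp, fp):
    # fraction of flagged posts that are truly hateful
    return tp / (tp + fp)

def recall(tp, fn):
    # fraction of truly hateful posts that were flagged
    return tp / (tp + fn)

# Slur-only detector: flags 4 posts, all correct -> precise but low recall.
print(precision(tp=4, fp=0), recall(tp=4, fn=6))   # 1.0 0.4

# Broad keyword list: flags 30 posts, 9 correct -> high recall, low precision.
print(precision(tp=9, fp=21), recall(tp=9, fn=1))  # 0.3 0.9
```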

Furthermore, keyword-based approaches cannot identify hate speech that contains no hateful keywords, such as figurative or nuanced language. The slogan “build that wall” literally refers to constructing a physical barrier; in its political context, however, some interpret it as a condemnation of immigrants in the United States.
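A minimal sketch of such a keyword-based detector is shown below; the two-term mini-lexicon is a hypothetical stand-in for a curated resource such as Hatebase, and the example illustrates both the approach and its blind spot for keyword-free hate:

```python
# Minimal keyword-based detector; the two-term lexicon is a hypothetical
# placeholder for a curated resource such as Hatebase.
HATE_LEXICON = {"swine", "vermin"}

def flag_by_keyword(post: str) -> bool:
    # flag the post if any (case-folded, punctuation-stripped) token matches
    tokens = post.lower().split()
    return any(tok.strip(".,!?") in HATE_LEXICON for tok in tokens)

print(flag_by_keyword("They are swine."))  # True
print(flag_by_keyword("Build that wall"))  # False: no lexicon term, nuance missed
```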

Source metadata

Additional information from social media can help further understand the characteristics of the posts and potentially lead to a better identification approach. Information such as demographics of the posting user, location, timestamp, or even social engagement on the platform can all give further understanding of the post in different granularity.

However, this information is often unavailable to external researchers, since publishing data with sensitive user information raises privacy issues. External researchers may have only part, or none, of the user information, and may therefore draw conclusions from incomplete or misleading signals in the data. For instance, a system trained on such data might become biased towards flagging content by certain users or groups as hate speech based on incidental dataset characteristics.

Using user information also raises ethical issues. Models might be biased against certain users and frequently flag their posts as hateful even when they are not. Conversely, relying too heavily on demographic information could miss hateful posts from users who do not typically post such content. Flagging posts as hate speech based on user statistics could create a chilling effect on the platform and ultimately limit freedom of speech.

Machine learning classifiers

Machine learning models take samples of text, labeled by content reviewers, and produce a classifier that detects hate speech. Various models have been proposed and proven successful. We describe a selection of open-sourced systems presented in recent research.

Content preprocessing and feature selection.

To identify or classify user-generated content, text features indicating hate must be extracted. The most obvious features are individual words or phrases (n-grams, i.e., sequences of n consecutive words). To improve feature matching, words can be stemmed to their root, removing morphological differences. Metaphor processing, as in Neuman et al. [ 22 ], can likewise extract features.

The bag-of-words assumption is commonly used in text categorization. Under this assumption, a post is represented simply as a set of words or n-grams without any ordering. This assumption certainly omits an important aspect of languages but nevertheless proved powerful in numerous tasks. In this setting, there are various ways to assign weights to the terms that may be more important, such as TF-IDF [ 23 ]. For a general information retrieval review, see [ 24 ].
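The weighting idea can be sketched in a few lines; the three-document corpus below is invented for illustration, and the weight is the classic tf × log(N/df) form (real systems typically use a smoothed variant):

```python
import math
from collections import Counter

# Bag-of-words with TF-IDF weighting over a tiny invented corpus.
corpus = [
    "they are swine",      # doc 0
    "they are lawyers",    # doc 1
    "good people here",    # doc 2
]

def tfidf(corpus):
    docs = [doc.split() for doc in corpus]
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    # weight = term frequency * log(N / document frequency)
    return [{t: tf[t] * math.log(n / df[t]) for t in tf}
            for tf in (Counter(doc) for doc in docs)]

w = tfidf(corpus)
# "swine" occurs in only one document, so it dominates doc 0's representation,
# while the common words "they"/"are" receive low weight.
```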

Besides distributional features, word embeddings, i.e., assigning a vector to a word, such as word2vec [ 25 ], are common when applying deep learning methods in natural language processing and text mining [ 26 , 27 ]. Some deep learning architectures, such as recurrent and transformer neural networks, challenge the bag-of-words assumption by modeling the ordering of the words by processing over a sequence of word embeddings [ 28 ].
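A common and simple way to use such embeddings for classification is to represent a post as the average of its word vectors; the 3-dimensional vectors below are toy values standing in for pretrained embeddings:

```python
# Representing a post as the average of its word embeddings.
# The 3-d vectors are toy values; real systems use pretrained embeddings
# (e.g., word2vec) with hundreds of dimensions.
EMB = {
    "they":  [0.1, 0.0, 0.2],
    "are":   [0.0, 0.1, 0.1],
    "swine": [0.9, 0.8, 0.7],
}

def post_vector(post):
    vecs = [EMB[w] for w in post.split() if w in EMB]
    dim = len(next(iter(EMB.values())))
    # element-wise mean over the words found in the embedding table
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

print(post_vector("they are swine"))
```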

Hate speech detection approaches and baselines.

Naïve Bayes, Support Vector Machine and Logistic Regression . These models are commonly used in text categorization. Naïve Bayes models label probabilities directly, under the assumption that the features do not interact with one another. Support Vector Machines (SVMs) and Logistic Regression are linear classifiers that predict classes based on a combination of scores for each feature. Open-source implementations of these models exist, for instance in the well-known Python machine learning package scikit-learn [ 29 ].
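To make the Naïve Bayes independence assumption concrete, here is a dependency-free sketch of a multinomial Naïve Bayes classifier with Laplace smoothing (the four training posts and labels are invented; a practical system would instead use scikit-learn's implementations):

```python
import math
from collections import Counter, defaultdict

# Multinomial Naive Bayes with Laplace smoothing on an invented toy corpus;
# the independence assumption lets per-word log-likelihoods simply add up.
train = [
    ("they are swine", "hate"),
    ("vermin everywhere", "hate"),
    ("nice people here", "ok"),
    ("they are lawyers", "ok"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)
for text, label in train:
    word_counts[label].update(text.split())
vocab = {w for c in word_counts.values() for w in c}

def predict(text):
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / len(train))  # log prior
        total = sum(word_counts[label].values())
        for w in text.split():
            # Laplace-smoothed log likelihood of each word given the class
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("swine everywhere"))  # -> hate
```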

Davidson et al. [ 9 ]. Davidson et al. proposed a state-of-the-art feature-based classification model that incorporates distributional TF-IDF features, part-of-speech tags, and other linguistic features into a support vector machine. The linguistic features help identify hate speech by distinguishing between different usages of terms, but the model still suffers from some subtleties, such as when typically offensive terms are used in a positive sense (e.g., “queer” in “He’s a damn good actor. As a gay man, it’s awesome to see an openly queer actor given the lead role for a major film.”, from the HatebaseTwitter dataset [ 9 ]).

Neural Ensemble [ 10 ]. Zimmerman, et al. propose an ensemble approach, which combines the decisions of ten convolutional neural networks with different weight initializations [ 10 ]. Their network structure is similar to the one proposed by [ 30 ], with convolutions of length 3 pooled over the entire document length. The results of each model are combined by averaging the scores, akin to [ 31 ].

FastText [ 32 ]. FastText is an efficient classification model proposed by researchers at Facebook. The model produces embeddings of character n-grams and predicts labels based on those embeddings. Over time, this model has become a strong baseline for many text categorization tasks.

BERT [ 26 ]. BERT is a recent transformer-based pre-trained contextualized embedding model extendable to a classification model with an additional output layer. It achieves state-of-the-art performance in text classification, question answering, and language inference without substantial task-specific modifications. When we experiment with BERT, we add a linear layer atop the classification token (as suggested by [ 26 ]), and test all suggested tuning hyperparameters.

C-GRU [ 33 ]. C-GRU, a Convolution-GRU based deep neural network proposed by Zhang et al., combines convolutional neural networks (CNNs) and gated recurrent units (GRUs) to detect hate speech on Twitter. The authors conduct several evaluations on publicly available Twitter datasets, demonstrating the model’s ability to capture word sequence and order in short text. Note that on the HatebaseTwitter [ 9 ] dataset, they treat both Hate and Offensive as Hate, yielding a binary label instead of the original multi-class labels. In our evaluation, we use the original multi-class labels, so different evaluation results are expected.

Our proposed classifier: Multi-view SVM

We propose a multi-view SVM model for the classification of hate speech. It applies a multiple-view stacked Support Vector Machine (mSVM) [ 34 ]. Each type of feature (e.g., a word TF-IDF unigram) is fitted with an individual Linear SVM classifier (inverse regularization constant C = 0.1), creating a view-classifier for those features. We further combine the view classifiers with another Linear SVM ( C = 0.1) to produce a meta-classifier . The features used in the meta-classifier are the predicted probability of each label by each view-classifier. That is, if we have 5 types of features (e.g., character unigram to 5-gram) and 2 classes of labels, 10 features would serve as input into the meta-classifier.
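The shape of the meta-classifier's input can be sketched directly; the view names and probabilities below are hypothetical placeholders, and the point is only how the per-view, per-label probabilities flatten into the meta-feature vector:

```python
# Layout of the meta-classifier's input: one predicted probability per
# (view, label) pair. View names and probabilities are hypothetical.
views = ["word_tfidf", "char1", "char2", "char3", "char4"]  # 5 feature types
labels = ["hate", "nohate"]                                 # 2 classes

# hypothetical per-view predicted probabilities for a single post
view_probs = {v: {"hate": 0.6, "nohate": 0.4} for v in views}

# flatten into the meta-classifier's feature vector
meta_features = [view_probs[v][l] for v in views for l in labels]
print(len(meta_features))  # -> 10, matching the 5-view, 2-label example
```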

Combining machine learning classifiers is not a new concept [ 35 ]. Previous efforts have shown that combining SVM with different classifiers provides improvements to various data mining tasks and text classification [ 36 , 37 ]. Combining multiple SVMs (mSVMs) has also been proven to be an effective approach in image processing tasks for reducing the large dimensionality problem [ 38 ].

However, applying multiple SVMs to identify hate speech expands the domain of use for such classification beyond that previously explored. Multi-view learning is known for capturing different views of the data [ 34 ]. In the context of hate speech detection, incorporating different views captures differing aspects of hate speech within the classification process. Instead of combining all features into a single feature vector, each view-classifier learns to classify the sentence based on only one type of feature. This allows the view-classifiers to pick up different aspects of the pattern individually.

Integrating all feature types in one model, by regularization, risks the masking of relatively weak but key signals. For example, “yellow” and “people” individually would appear more times than “yellow people” combined; posts having these terms individually are unlikely to be hate. However, “yellow people” is likely hate speech (especially when other hate speech aspects are present), but the signal might be rare in the collection, and therefore, is likely masked by the regularization if all features are combined together. In this case, mSVM is able to pick up this feature in one of the view-classifiers, where there are fewer parameters.

Furthermore, this model can be interpreted: identifying through the meta-classifier which view-classifier contributes most provides human intuition for the classification. The view-classifier contributing most to the final decision identifies the key vocabulary (features) that led to a hate speech label. This contrasts with well-performing neural models, which are often opaque and difficult to understand [ 10 , 39 , 40 ]. Even state-of-the-art methods that employ self-attention (e.g., BERT [ 26 ]) suffer from considerable noise that vastly reduces interpretability.

Experimental setup

Using multiple hate speech datasets, we evaluated the accuracy of existing hate speech detection approaches as well as our own.

Data preprocessing and features.

For simplicity and generality, preprocessing and feature identification is intentionally minimal. For pre-processing, we apply case-folding, tokenization, and punctuation removal (while keeping emoji). For features, we simply extract word TF-IDF from unigram to 5-gram and character N-gram counts from unigram to 5-gram.
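The character n-gram extraction can be sketched as follows (word TF-IDF features follow the same pattern over tokens rather than characters):

```python
from collections import Counter

# Character n-gram counting, unigram through 5-gram, over the raw string.
def char_ngrams(text, n_min=1, n_max=5):
    text = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams

g = char_ngrams("jews")
# a 4-character string yields 4 + 3 + 2 + 1 = 10 n-grams
print(sum(g.values()))  # -> 10
```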

We evaluate the approach on the Stormfront [ 14 ], TRAC [ 19 ], HatEval, and HatebaseTwitter [ 9 ] datasets previously described. These datasets provide a variety of hate speech definitions and aspects (including multiple types of aggression), and multiple types of online content (including online forums, Facebook, and Twitter content). For Stormfront, we use the balanced train/test split proposed in [ 14 ], with a random selection of 10% of the training set held out as validation data. For the TRAC dataset, we use the English Facebook training, validation, and test splits provided by [ 19 ]. For HatEval, we use a split of the training set for validation and use the official validation dataset for testing because the official test set is not public. Finally, for the HatebaseTwitter dataset [ 9 ], we use the standard train-validation-test split provided by [ 9 ].


We evaluate the performance of each approach using accuracy and the macro-averaged F 1 score. There is no consensus in the literature about which evaluation metrics to use; however, we believe that reporting both accuracy and macro- F 1 offers good insight into the relative strengths and weaknesses of each approach.
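For reference, both metrics can be computed as follows; the gold and predicted labels are toy values, and macro-averaging gives every class equal weight regardless of its frequency:

```python
# Accuracy and macro-averaged F1 on toy gold/predicted labels.
def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def macro_f1(gold, pred):
    f1s = []
    for label in set(gold) | set(pred):
        tp = sum(g == p == label for g, p in zip(gold, pred))
        fp = sum(p == label and g != label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)  # each class counts equally, however rare

gold = ["hate", "hate", "ok", "ok", "ok"]
pred = ["hate", "ok",   "ok", "ok", "ok"]
print(accuracy(gold, pred))  # -> 0.8
```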

Experimental results

We report the highest score of the approaches described above on each dataset in Table 2 . Complete evaluation results are available in supporting document S1 Table (including accuracy breakdown by label).


The top two approaches on each dataset are reported.


On the Stormfront and TRAC datasets, our proposed approach provides state-of-the-art or competitive results for hate speech detection. On Stormfront, the mSVM model achieves 80% accuracy in detecting hate speech, a 7% improvement over the best published prior work (which achieved 73% accuracy). BERT performs 2% better than our approach, but the decisions the BERT model makes are difficult to interpret.

On the TRAC dataset, our mSVM approach achieves a 53.68% macro F 1 score. Note that through optimization on the validation set, we found that TF-IDF weights for character N-grams work better than raw counts on this Facebook dataset, so we report results using TF-IDF. This outperforms all other approaches we experimented with, including the strong BERT system. We also compared our approach to the other systems that participated in the shared task [ 19 ] and found that we outperform them as well on the metric they reported (weighted F-score), by 1.34% or more. This is particularly notable because our approach outperformed systems that rely on external datasets and data augmentation strategies.

Our approach outperformed the top-ranked ensemble method [ 41 ] by 3.96% in accuracy and 2.41% in F 1 . This indicates that mSVM learns from different aspects and preserves more signals than a simple ensemble method that uses all features in each first-level classifier. BERT achieved 3% lower accuracy and 1% lower F 1 than our proposed method while still providing minimal interpretability, demonstrating that forgoing interpretability does not necessarily yield higher accuracy. For HatEval and HatebaseTwitter, the neural ensemble approach outperforms our method, suggesting that neural approaches are better suited to Twitter data than the mSVM-based solution. Previous works reported various metrics, e.g., a support-weighted F 1 in Davidson et al. [ 9 ], making comparisons between models difficult. We report macro F 1 to mitigate the effect of class imbalance, an effect baked in during the construction of the datasets. For a fair and complete comparison between systems, we ran the systems from previous works and calculated macro F 1 on the datasets reported in this study. The previous best performance on the Stormfront dataset used a recurrent neural network to achieve an accuracy of 0.73 [ 14 ]; our approach easily outperforms this method. On the TRAC dataset, others reported a weighted F 1 of 0.6425 using a recurrent neural network, without reporting accuracy or macro-averaged F 1 [ 19 , 42 ]. On HatebaseTwitter, others reported a macro F 1 score of 0.94 [ 33 ], but this was achieved by combining the hate and offensive categories, greatly simplifying the task.

In S1 Table , we observe that for most datasets and approaches, the accuracy is biased towards the majority class in the training data. This suggests the need for datasets that are more representative of real data distributions for future evaluation.

Considering these mixed evaluation results, and given the potential ethical concerns related to hate speech detection, we err on the side of caution and opt for interpretability over uncertain improvements on the evaluation metrics.

Interpretation of mSVM.

We analyzed the top features of the mSVM classifier on the Stormfront dataset. The meta-classifier weights character 4-grams and word unigrams as the highest contributors to the overall score. 4-grams such as “jew”, “ape”, “mud”, “egro” are among the strongest signals of being hate. (Note that whitespace contributes to character 4-grams.) This category appears to capture the aspect of a group’s identity. Word unigrams such as “invasion” and “violence” contribute highly to hate classification, and appear to capture the attack aspect. The top word unigrams, 2-grams and 3-grams from the view-classifier results of each dataset are in S3 Table . We found that the accuracy of all view-classifiers is at least two percent lower than the meta-classifier. The full comparison between view-classifier and meta-classifier results are given in supplementary information S2 Table . We also found that, although three other view-classifiers outperform the word unigram model, the meta-classifier still weights its score higher than those models, further suggesting that it captures a different hate speech aspect.
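This kind of inspection amounts to ranking features by their learned weights; the sketch below uses hypothetical weight values (the character 4-grams include whitespace, as noted above):

```python
# Ranking features of a linear view-classifier by weight toward "hate".
# The weights below are hypothetical; character 4-grams include whitespace.
weights = {
    "jew ": 1.8, " ape": 1.5, "mud ": 1.2,  # strong hate signals
    "the ": 0.1, " and": -0.2,              # uninformative n-grams
}

top = sorted(weights, key=weights.get, reverse=True)[:3]
print(top)  # -> ['jew ', ' ape', 'mud ']
```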

Interpretation of BERT.

Because the BERT model employs a self-attention mechanism, one can visualize the terms that the model relies on most for classification. We present attention visualizations from BertViz [ 43 ] for the trained BERT model on the misclassified forum post “I don’t think anyone is insinuating that we are equal to non whites, or that we would ignore white nations.” (this post does not satisfy the authors’ conditions for hate speech, but the BERT model classified it as hateful). Detailed attention weights for all 12 attention heads of the classification token on layer 11 are shown in Fig 1 . Despite appearing to be the most informative layer, layer 11 does not provide a clear answer to why the model labeled the post as hateful; the attention is distributed among most words in the sentence, and many of the terms receiving the most attention do not appear to be informative (e.g., “we”). When investigating the other layers (an overview is given in S1 Fig in the supplementary information) and other posts, we similarly see no strong trends that would enable interpretability. This demonstrates the limitation of deep neural models, even those with claims of interpretability, when one tries to interpret the decisions they make. These observations are in line with prior work that has found attention signals to be noisy and not necessarily indicative of term importance [ 39 , 40 ]. While our approach can be combined with neural models, doing so would come at the expense of increased model complexity and reduced interpretability.
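The bookkeeping behind this kind of inspection can be sketched without a trained model. In the sketch below, a random tensor stands in for real BERT-base attention output (which has the same layers × heads × sequence × sequence shape); extracting the classification token's row on one layer and averaging it over heads gives the per-term attention that tools such as BertViz render visually. This is our own illustrative aggregation, not the BertViz implementation.

```python
# Sketch of aggregating the classification token's attention over heads.
import numpy as np

rng = np.random.default_rng(0)
n_layers, n_heads, seq_len = 12, 12, 24  # BERT-base dimensions
raw = rng.random((n_layers, n_heads, seq_len, seq_len))
# Normalize so each row is a proper attention distribution (rows sum to 1),
# as real softmax attention weights do.
attn = raw / raw.sum(axis=-1, keepdims=True)

layer = 11                        # the layer examined in Fig 1
cls_attn = attn[layer, :, 0, :]   # classification token's attention, per head
mean_cls_attn = cls_attn.mean(axis=0)  # average over the 12 heads

# With real weights, the highest-attention positions are the terms that get
# highlighted in the visualization (e.g., 'we' in the post above).
top_positions = np.argsort(mean_cls_attn)[::-1][:5]
print(top_positions)
```

That even this aggregated view spreads attention broadly across positions is precisely the interpretability problem discussed above.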


Each color represents a different attention head, and the lightness of the color represents the amount of attention. For instance, the figure indicates that nearly all attention heads focus heavily on the term ‘we’.


Error analysis.

To gain a better understanding of our mSVM classifier’s mistakes, we qualitatively analyzed its false positive (FP) and false negative (FN) samples on the Stormfront dataset. We categorized the misclassified posts based on their shared linguistic features, semantic features and length. 41% of the posts misclassified as not hate required surrounding context to understand that the post is hate speech. 7% of the FN were implicit hate, which is difficult to classify, such as “Indeed, I haven’t seen or heard machines raping or robbing people in the streets of Stockholm yet, non-european immigrants however…”. Furthermore, given that inter-annotator agreement on the dataset is not perfect [ 14 ] (prior work shows that high inter-annotator agreement for hate speech is difficult to achieve [ 11 , 44 ]), we analyzed some borderline cases against the definition of hate speech used for annotation. When manually re-assessing the misclassified posts, we found that the gold labels of 17% of the FN and 10% of the FP posts do not match our interpretation of the post content. Another major problem is posts that are aggressive but do not meet the necessary conditions to be considered hate speech; these constitute 16% of the FP. Finally, short posts (6 or fewer terms, representing less than 3% of the hate speech sentences in the dataset) also increased FP, accounting for 7% of them. The remaining misclassified posts were miscellaneous cases, including sarcastic or metaphoric posts.
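The mechanical first step of such an error analysis, before any manual categorization, is simply separating the FP and FN samples. A minimal sketch (our own, with hypothetical toy posts and labels rather than Stormfront data):

```python
# Separate a classifier's errors for manual inspection (toy data).
posts = ["post a", "post b", "post c", "post d"]
y_true = [1, 0, 1, 0]   # 1 = hate, 0 = not hate (hypothetical gold labels)
y_pred = [0, 0, 1, 1]   # hypothetical classifier output

# False negatives: hate posts the classifier cleared (e.g. implicit hate).
false_negatives = [p for p, t, y in zip(posts, y_true, y_pred)
                   if t == 1 and y == 0]
# False positives: non-hate posts the classifier flagged
# (e.g. aggressive-but-not-hate, or very short posts).
false_positives = [p for p, t, y in zip(posts, y_true, y_pred)
                   if t == 0 and y == 1]

print(false_negatives)  # → ['post a']
print(false_positives)  # → ['post d']
```

Each resulting list is then read and binned by hand into the categories reported above (missing context, implicit hate, annotation disagreement, aggression without hate, short posts, miscellaneous).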

Shortcomings and future work

A challenge faced by automatic hate speech detection systems is the changing of attitudes towards topics over time and historical context. Consider the following excerpt of a Facebook post:

“…The merciless Indian Savages, whose known rule of warfare, is an undistinguished destruction of all ages, sexes and conditions…”

Intuition suggests that this is hate speech; it refers to Native Americans as “merciless Indian savages” and dehumanizes them by suggesting that they are inferior. Indeed, the text satisfies the conditions used in most definitions of hate speech. However, this text is actually a quote from the Declaration of Independence. Given the historical context of the text, the user who posted it may not have intended it as hate speech, but rather meant to quote the historical document for other purposes. This shows that user intent and context play an important role in hate speech identification.

As another example, consider the phrase “the Nazi organization was great.” This would be considered hate speech because it shows support for a hate group. However, “the Nazis’ organization was great” does not support their ideals but instead comments on how well the group was organized. In some contexts, this might not be considered hate speech, e.g., if the author was comparing organizational effectiveness over time. The difference between these two phrases is subtle, but could be enough to make the difference between hate speech and not.

Another remaining challenge is that automatic hate speech detection is a closed-loop system: individuals are aware that it is happening and actively try to evade detection. For instance, online platforms removed hateful posts from the suspect in the recent New Zealand terrorist attack (albeit manually), and implemented rules to automatically remove the content when re-posted by others [ 2 ]. Users who wanted to spread the hateful messages quickly found ways to circumvent these measures, for instance by posting the content as images containing the text rather than the text itself. Although optical character recognition can be employed to address this particular evasion, it further demonstrates the difficulty of hate speech detection going forward: a constant battle between those trying to spread hateful content and those trying to block it.

As hate speech continues to be a societal problem, the need for automatic hate speech detection systems becomes more apparent. We surveyed the current approaches to this task and proposed a new system that outperforms existing methods, with the added benefit of improved interpretability. Given the challenges that remain, more research on this problem is needed, covering both technical and practical matters.

Supporting information

S1 Table. Full comparison of hate speech classifiers.


S2 Table. Full comparison of view classifiers in mSVM.


S3 Table. Top 10 weighted vocabularies learned by Word-level view classifier.

This list has been sanitized.


S1 Fig. Visualization of self-attention weights for the forum BERT model.

All layers and attention heads for the sentence “ I don’t think anyone is insinuating that we are equal to non whites, or that we would ignore white nations. ” are included. Darker lines indicate stronger attention between terms. The first token is the special classification token.



We thank Shabnam Behzad and Sajad Sotudeh Gharebagh for reviewing early versions of this paper and for helpful feedback on this work. We also thank the anonymous reviewers for their insightful comments.

  • 1. Robertson C, Mele C, Tavernise S. 11 Killed in Synagogue Massacre; Suspect Charged With 29 Counts. 2018;.
  • 2. The New York Times. New Zealand Shooting Live Updates: 49 Are Dead After 2 Mosques Are Hit. 2019;.
  • 3. Hate Speech—ABA Legal Fact Check—American Bar Association;. Available from: https://abalegalfactcheck.com/articles/hate-speech.html .
  • 4. Community Standards;. Available from: https://www.facebook.com/communitystandards/objectionable_content .
  • 5. Hate speech policy—YouTube Help;. Available from: https://support.google.com/youtube/answer/2801939 .
  • 6. Hateful conduct policy;. Available from: https://help.twitter.com/en/rules-and-policies/hateful-conduct-policy .
  • 7. Mondal M, Silva LA, Benevenuto F. A Measurement Study of Hate Speech in Social Media. In: ACM HyperText; 2017.
  • 9. Davidson T, Warmsley D, Macy MW, Weber I. Automated Hate Speech Detection and the Problem of Offensive Language. ICWSM. 2017;.
  • 10. Zimmerman S, Kruschwitz U, Fox C. Improving Hate Speech Detection with Deep Learning Ensembles. In: LREC; 2018.
  • 11. Ross B, Rist M, Carbonell G, Cabrera B, Kurowsky N, Wojatzki M. Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis. In: The 3rd Workshop on Natural Language Processing for Computer-Mediated Communication @ Conference on Natural Language Processing; 2016.
  • 14. de Gibert O, Perez N, García-Pablos A, Cuadros M. Hate Speech Dataset from a White Supremacy Forum. In: 2nd Workshop on Abusive Language Online @ EMNLP; 2018.
  • 15. Popat K, Mukherjee S, Yates A, Weikum G. DeClarE: Debunking Fake News and False Claims using Evidence-Aware Deep Learning. In: EMNLP; 2018.
  • 16. Hatebase;. Available from: https://hatebase.org/ .
  • 17. Waseem Z, Hovy D. Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. In: SRW@HLT-NAACL; 2016.
  • 18. Waseem Z. Are you a racist or am i seeing things? annotator influence on hate speech detection on twitter. In: Proceedings of the first workshop on NLP and computational social science; 2016. p. 138–142.
  • 19. Kumar R, Ojha AK, Malmasi S, Zampieri M. Benchmarking Aggression Identification in Social Media. In: Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018). ACL; 2018. p. 1–11.
  • 20. CodaLab—Competition;. Available from: https://competitions.codalab.org/competitions/19935 .
  • 21. Detecting Insults in Social Commentary;. Available from: https://kaggle.com/c/detecting-insults-in-social-commentary .
  • 24. Grossman DA, Frieder O. Information Retrieval: Algorithms and Heuristics. Berlin, Heidelberg: Springer-Verlag; 2004.
  • 25. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed Representations of Words and Phrases and their Compositionality. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, editors. Advances in Neural Information Processing Systems 26. Curran Associates, Inc.; 2013. p. 3111–3119.
  • 26. Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs]. 2018;.
  • 27. Yang Z, Chen W, Wang F, Xu B. Unsupervised Neural Machine Translation with Weight Sharing. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics; 2018. p. 46–55. Available from: http://aclweb.org/anthology/P18-1005 .
  • 28. Kuncoro A, Dyer C, Hale J, Yogatama D, Clark S, Blunsom P. LSTMs Can Learn Syntax-Sensitive Dependencies Well, But Modeling Structure Makes Them Better. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia: Association for Computational Linguistics; 2018. p. 1426–1436. Available from: http://aclweb.org/anthology/P18-1132 .
  • 30. Kim Y. Convolutional Neural Networks for Sentence Classification. In: EMNLP; 2014.
  • 31. Hagen M, Potthast M, Büchner M, Stein B. Webis: An Ensemble for Twitter Sentiment Detection. In: SemEval@NAACL-HLT; 2015.
  • 32. Joulin A, Grave E, Bojanowski P, Mikolov T. Bag of Tricks for Efficient Text Classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. ACL; 2017. p. 427–431.
  • 33. Zhang Z, Robinson D, Tepper J. Detecting hate speech on twitter using a convolution-gru based deep neural network. In: European Semantic Web Conference. Springer; 2018. p. 745–760.
  • 34. Zhao J, Xie X, Xu X, Sun S. Multi-view learning overview: Recent progress and new challenges. Information Fusion. 2017;.
  • 36. Chand N, Mishra P, Krishna CR, Pilli ES, Govil MC. A comparative analysis of SVM and its stacking with other classification algorithm for intrusion detection. In: 2016 International Conference on Advances in Computing, Communication, & Automation (ICACCA)(Spring). IEEE; 2016. p. 1–6.
  • 37. Dong YS, Han KS. Boosting SVM classifiers by ensemble. In: Special interest tracks and posters of the 14th international conference on World Wide Web. ACM; 2005. p. 1072–1073.
  • 38. Abdullah A, Veltkamp RC, Wiering MA. Spatial pyramids and two-layer stacking SVM classifiers for image categorization: A comparative study. In: 2009 International Joint Conference on Neural Networks. IEEE; 2009. p. 5–12.
  • 39. Jain S, Wallace BC. Attention is not Explanation. ArXiv. 2019;abs/1902.10186.
  • 40. Serrano S, Smith NA. Is Attention Interpretable? In: ACL; 2019.
  • 41. Arroyo-Fernández I, Forest D, Torres JM, Carrasco-Ruiz M, Legeleux T, Joannette K. Cyberbullying Detection Task: The EBSI-LIA-UNAM system (ELU) at COLING’18 TRAC-1. In: The First Workshop on Trolling, Aggression and Cyberbullying @ COLING; 2018.
  • 42. Aroyehun ST, Gelbukh A. Aggression Detection in Social Media: Using Deep Neural Networks, Data Augmentation, and Pseudo Labeling. In: Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018). Santa Fe, New Mexico, USA: Association for Computational Linguistics; 2018. p. 90–97. Available from: https://www.aclweb.org/anthology/W18-4411 .
  • 43. Vig J. Visualizing Attention in Transformer-Based Language Representation Models. arXiv preprint arXiv:1904.02679. 2019;.
  • 44. Waseem Z. Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter. In: NLP+CSS @ EMNLP; 2016.


Hate speech inciting violence now potentially illegal under EU law, regulator says

Eu directive aimed at illegal activity and disinformation online comes into effect across all member states.

essay on online hate speech

Under a new directive, online platforms of a certain size must implement various measures, including mechanisms to counter illegal content, protect minors, and provide information to users affected by content moderation decisions. Photograph: Scott Olson/Getty Images

The era of self-regulation of online platforms is over, although social media companies will remain the “first line of defence” against illegal content, according to Coimisiún na Meán’s Digital Services Commissioner John Evans.

Mr Evans was speaking as a European Union directive aimed at preventing illegal activity and the spread of disinformation online came into effect over the weekend.

“Platforms large and small will have to be accountable for illegal content on their platforms, and also for content on their platforms that breaches their community rules,” Mr Evans said.

There is a “long list of content that could potentially be illegal”, including hate speech that incites violence, he said.

Boy bands music quiz: Which Irish group once earned £2.5m for a private gig for the Sultan of Brunei?

Boy bands music quiz: Which Irish group once earned £2.5m for a private gig for the Sultan of Brunei?

Gordon D’Arcy: Welsh rugby faces a golden opportunity to secure its future

Gordon D’Arcy: Welsh rugby faces a golden opportunity to secure its future

Renting out a room: ‘I feel safer at night having someone else in the house’

Renting out a room: ‘I feel safer at night having someone else in the house’

Michael McDowell: Has anyone thought about what happens when a durable relationship ends?

Michael McDowell: Has anyone thought about what happens when a durable relationship ends?

The Digital Services Act, an EU directive that aims to “make the online environment safer, fairer and more transparent”, was introduced in November 2022 and since Saturday applies fully across all EU member states.

Under the directive, online platforms of a certain size must implement various measures, including mechanisms to counter illegal content, protect minors, and provide information to users affected by content moderation decisions.

In the Republic, Coimisiún na Meán will operate under the directive, with the power to investigate platforms, impose fines, and issue compliance notices and orders to end infringements.

The parameters of what qualifies as illegal content is broad, Mr Evans told RTÉ radio 1′s This Week on Sunday.

“If there is content that conveys a credible threat of violence; if it’s part of a campaign of harassment; if it’s offensive in a kind of sexual nature; if it encourages people to commit suicide or makes information available for them to do that – there’s a long list of content that could potentially be illegal, and if that content is posted, then the platform, once it becomes aware of it, has to take it down,” Mr Evans said.

Asked if hate speech would qualify as illegal under the EU legislation, Mr Evans said: “If it can incite violence, yes, of course, yes, potentially.”

[  EU law on harmful speech could make Ireland’s hate offences Bill obsolete  ]

Misinformation or disinformation is not content that is clearly illegal, a “grey area”, Mr Evans said, and Coimisiún na Meán will not have the power to order the immediate removal of such content. “Some platforms have different views on what constitutes unacceptable material,” he said.

Mr Evans noted that the Digital Services Act contains specific provisions regarding misinformation and disinformation on larger online platforms that are the responsibility of the European Commission .

Online platforms will remain the “first line of defence” against illegal content, Mr Evans said.

“The platforms themselves are supposed to have mechanisms available for people to flag content, illegal content or content that they think breaches their terms and conditions, to the attention of the platforms and there’s a process then the people can go through to get the content taken down. And what we’re here to do, is to make sure that all the platforms do that in a consistent, legal way.”

If reported content is not removed by a platform, users can appeal to Coimisiún na Meán: “In those kinds of situations, we will have to look and make a determination – is that content clearly illegal or not?”

The commission will have “sole responsibility” for dealing with complaints made against certain platforms headquartered here – although this does not include the likes of Facebook or X .

“There are other platforms which fall below a threshold of 45 million users in the US , and there’s quite a number of those based in Ireland as well, and we have responsibility, sole responsibility, for those ones,” Mr Evans said.

He said that if content online qualified as a “safety of life” issue, An Garda Síochána should be the first point of contact.

  • See our new project Common Ground , Evolving Islands: Ireland & Britain
  • Sign up for push alerts and have the best news, analysis and comment delivered directly to your phone
  • Find The Irish Times on WhatsApp and stay up to date
  • Our In The News podcast is now published daily – Find the latest episode here

Fiachra Gallagher

Fiachra Gallagher

Fiachra Gallagher is an Irish Times journalist


Recent rté controversy is of bakhurst’s own making, transparency should trump non-disclosure agreements on rté exit deals - td, rté's ‘process of elimination’ maths shouldn’t prove too hard, ‘year of elections’ could lift news publishers – even if print ads are not the next gin, why i stopped paying my tv licence fee: ‘the whole model is broken’, mexico’s sinaloa cartel created an irish network to aid large movement of drugs, gardaí believe, limerick crash: tributes paid to two ‘exemplary’ agricultural students killed when car hit wall, ‘i think i would be happier as a single man, but despair at the thoughts of a messy separation’, summer jobs in childhood can come back to bite at pension time, temperatures set to drop to zero tonight with snow and sleet forecast for some areas on friday, latest stories, chanelle pharma profits fell ahead of takeover, state’s public debt continues to fall since pandemic peak, us veto of un resolution for ‘immediate humanitarian ceasefire’ in gaza angers sponsor algeria, hse told to make changes to post-mastectomy supports following criticism, gemma o’doherty ordered to appear before high court over alleged refusal to obey order.

IT Sunday

  • Terms & Conditions
  • Privacy Policy
  • Cookie Information
  • Cookie Settings
  • Community Standards

Amnesty International Logotype

EU/Global: European Commission’s TikTok probe aims to help protect young users

Responding to the European Commission’s decision to investigate TikTok over concerns that  the online social media platform may be failing to comply with the bloc’s Digital Services Act (DSA) by not doing enough to protect young users , Damini Satija, Programme Director at Amnesty Tech, said:

“We welcome the European Commission’s decision to investigate TikTok over the possibility that it breached the DSA by failing to protect children and young people. The mental health consequences being inflicted on children and young people by the social media giant remain a longstanding concern.

“In 2023, Amnesty International’s research showed that TikTok can draw children’s accounts into dangerous rabbit holes of content that romanticizes self-harm and suicide within an hour of signing up on the platform. Children and young people also felt their TikTok use affected their schoolwork and social time with friends and led them to scroll through their feeds late at night instead of catching enough sleep.

“By design, TikTok aims to maximize engagement, which systemically undermines children’s rights. It is essential that TikTok takes urgent action to address these systemic risks. Children and young users should be offered the right to access safe platforms, and the protection of these rights cannot wait any longer.” Damini Satija, Programme Director at Amnesty Tech

In 2023, Amnesty International released two reports highlighting abuses suffered by children and young people using TikTok.

One report, Driven into the Darkness: How TikTok Encourages Self-harm and Suicidal Ideation , shows how TikTok’s pursuit of young users’ attention risks exacerbating their mental health concerns, such as depression, anxiety and self-harm.

Another report,  “I feel Exposed”: Caught in TikTok’s Surveillance Web , reveals TikTok’s rights-abusing data collection practices are sustained by harmful user engagement practices.

Both reports form part of a body of our work exposing the business models of big tech firms that prioritize profits over human rights.

WPXI Pittsburgh

Report reveals DOJ lacks complete data collection for online hate crimes

A hate crime occurs nearly every hour on average in the U.S., according to data reported to the FBI.

Oftentimes, hate speech online can fuel violence in real life.

A new watchdog report is revealing the Justice Department (DOJ) needs more complete information about hate crimes that occur on the internet.

The report from the U.S. Government Accountability Office (GAO) says there are two main ways the FBI collects information about hate crimes: one way is through the FBI’s Uniform Crime Reporting Program, which gathers voluntarily submitted information from law enforcement agencies; and the second way is through the National Crime Victimization Survey.

The survey is meant to produce estimates on hate crimes based on what households are saying, but it does not include information about cyber-related hate crimes. It’s leaving a gap in how these hate incidents are tracked.

“What we found was if there’s an area with high prevalence of online hate speech, there’s a likelihood that there’s increased physical hate crimes in that same area,” said Triana McNeil, a Director for GAO’s Homeland Security and Justice Team. “Victims of hate crimes are not the only victims. The victims are also those that look like them. These can be people with a disability. This can be based on race or gender… It’s got ripple effects.”

According to the report, research shows up to a third of internet users say they have experienced hate speech online.

“Not only are we seeing an increased prevalence of general hate speech and hate, bias-motivated actions online, but we’re also seeing an increase in severe online harassment,” said Lauren Krapf, Director of Policy and Impact for the Center of Tech and Society for the Anti-Defamation League (ADL). “There is an association between online hate and offline violence.”

GAO and ADL said it’s critical to make sure there is complete information available about hate speech online.

“What happens online, too often doesn’t stay there,” said Krapf. “We want to be able to stop something before it gets worse… This is happening in the real world and in the real digital world and it’s important we pay attention to both.”

The report calls on the Bureau of Justice Statistics (BJS) to explore options to measure bias-motivated crimes on the internet through the national yearly survey.

In response, the DOJ said it agrees with the recommendations.

“BJS will continue to conduct research on the intersection of crimes that are bias-motivated and occur on the internet,” wrote the Assistant Attorney General in the DOJ response. “Based on that research, BJS will determine how best to measure bias-motivated criminal victimization that occurs through the internet.”

Download the FREE WPXI News app for breaking news alerts.

Follow Channel 11 News on Facebook and Twitter . | Watch WPXI NOW


  • Why is Facebook chirping on phones? How to turn off the sound
  • Man charged in overnight Moon Township shooting
  • ‘We’re all hurting’: Friend of man killed during argument with his girlfriend’s father speaks out
  • VIDEO: Karns City QB Mason Martin’s father remains hopeful his son will fully recover from brain injury
  • DOWNLOAD the Channel 11 News app for breaking news alerts

Report reveals DOJ lacks complete data collection for online hate crimes

Asked about online harms bill, Poilievre raises Trudeau's past use of blackface

Trudeau should 'look into his own heart and ask himself why he was such a hateful racist,' poilievre says.

essay on online hate speech

Social Sharing

Conservative Leader Pierre Poilievre said Wednesday his party is vehemently opposed to the government's forthcoming online harms legislation, a bill designed to combat hate speech, terrorist content and some violent material on the internet.

Saying he won't accept "Justin Trudeau's woke authoritarian agenda," Poilievre said the prime minister and his government shouldn't be deciding what constitutes "hate speech" online and called the legislation an "attack on freedom of expression."

"Justin Trudeau said anyone who criticized him during the pandemic was engaging in hate speech," Poilievre said, citing Trudeau's COVID-era comment that trucker convoy protesters were " a small fringe minority " who were "holding unacceptable views."

The Liberal government has touted the legislation as a way to rein in online abuse and force social media companies to do a better job of policing platforms where degrading content is a regular feature of the user experience.

  • What we know about Justin Trudeau's blackface photos — and what happens next
  • No more Pornhub? That will depend on what happens with a Senate bill

Poilievre said that as far as his caucus is concerned, the bill is dead on arrival.

"What does Justin Trudeau mean when he says the words 'hate speech'? He means the speech he hates," Poilievre said. "You can assume he will ban all of that."

Poilievre also framed his opposition in deeply personal terms, saying Trudeau is not the leader to legislate on this issue.

He said no one should take lessons on hate from a prime minister who once wore blackface and racist costumes.

"I point out the irony that someone who spent the first half of his adult life as a practicing racist, who dressed up in hideous racist costumes so many times he says he can't remember them all, should then be the arbiter of what constitutes hate. What he should actually do is look into his own heart and ask himself why he was such a hateful racist," Poilievre said.

While he opposes the upcoming online harms legislation, Poilievre suggested he's open to another kind of crackdown on content.

Asked Wednesday if he supports a law that would require age verification before accessing pornography online, Poilievre said he does.

There's a Senate bill , S-210, working its way through Parliament aimed at doing just that.

Trudeau's blackface incidents

Pictures of Trudeau dressed in blackface first emerged in the 2019 election campaign.

The decades-old images showed Trudeau dressed up as Aladdin, with his face darkened by makeup, at an  Arabian Nights -themed gala held when he was a teacher.

Justin Trudeau is seen wearing blackface in this April 2001image published in a newsletter from the West Point Grey Academy.

Trudeau later said he also wore blackface in high school to sing Harry Belafonte's hit  Day O  at a talent show.

Another image surfaced of Trudeau wearing blackface at an unidentified event in the 1990s.

Trudeau has apologized repeatedly for the incidents, saying he should have known better.

"I take responsibility for my decision to do that. I shouldn't have done it," he said after Aladdin image surfaced at the height of the 2019 campaign.

  • Experts urge federal government to pursue moderate, 'judicious' approach to harmful content online

"I should have known better. It was something that I didn't think was racist at the time, but now I recognize it was something racist to do and I am deeply sorry.

"It was not something that represents the person I've become, the leader I try to be, and it was really embarrassing."

Freedom of speech

The long-promised online harms bill that prompted Poilievre's comments could be tabled as soon as next week, government sources tell CBC News.

A similar bill was introduced in 2019 but died on the order paper when the 2021 election was called. It hasn't been revived until now.

The last bill was roundly criticized by privacy experts and civil liberties groups who said its provisions requiring that online platforms remove content flagged as harmful within 24 hours would encourage companies to take an overly cautious approach, resulting in suppression of free speech.

Other groups have championed the legislation as a way to protect kids and keep people safe online in an era of rampant abuse.

Justice Minister Arif Virani, who is expected to table the bill, has vowed to strike the right balance between offering protections to Canadians and upholding freedom of expression.

In a recent speech to the Canadian Bar Association, Virani said he's confident the government can legislate measures to promote an online world where "users can express themselves without feeling threatened or fuelling hate."

The Liberal Party's 2021 election platform promises to "combat serious forms of harmful online content, specifically hate speech, terrorist content, content that incites violence, child sexual abuse material and the non-consensual distribution of intimate images."

The platform promised to do this through changes to the Canada Human Rights Act and the Criminal Code. 


essay on online hate speech

Senior reporter

J.P. Tasker is a journalist in CBC's parliamentary bureau who reports for digital, radio and television. He is also a regular panellist on CBC News Network's Power & Politics. He covers the Conservative Party, Canada-U.S. relations, Crown-Indigenous affairs, climate change, health policy and the Senate. You can send story ideas and tips to J.P. at [email protected].

  • Follow J.P. on X

With files from the Canadian Press

Related Stories

Add some “good” to your morning and evening.

Your weekly guide to what you need to know about federal politics and the minority Liberal government. Get the latest news and sharp analysis delivered to your inbox every Sunday morning.


  1. Online Hate Speech (Chapter 4)

    Nathaniel Persily and Joshua A. Tucker Chapter Save PDF Cite Summary This chapter examines the state of the literature-including scientific research, legal scholarship, and policy reports-on online hate speech.

  2. Prevalence of Online Hate Speech on Social Media

    National governments now recognize online hate speech as a pernicious social problem. In the wake of political votes and terror attacks, hate incidents online and offline are known to peak in tandem. This article examines whether an association exists between both forms of hate, independent of 'trigger' events.

  3. Hate Speech on Social Media: Global Comparisons

    Zachary Laub, updated June 7, 2019. Hate speech online has been linked to a global increase in violence toward minorities, including mass shootings, lynchings and ethnic ...

  4. Perspective

    "The more you hide behind 'trolling,' the more you can launder white supremacy into the mainstream," said Phillips, who released a report this year, "The Oxygen of Amplification," that analyzed ...

  5. Hate speech or free speech: an ethical dilemma?

    Hate speech is one of the most resilient manifestations of cyberviolence and is not to be equated with free speech.

  6. Report: Online hate increasing against minorities, says expert

    23 March 2021 A new report by the Special Rapporteur on minority issues, Dr Fernand de Varennes, looks at how to address rising online hate speech against minority groups.

  7. Hate speech regulation on social media: A contemporary challenge

    Regulating hate speech online is a major policy challenge. Policymakers must ensure that any regulation of social media platforms does not unduly impair freedom of speech. Given the complexity of the problem, close monitoring of new legislative initiatives around the world is necessary to assess whether a good balance has been struck between ...

  8. Racism, Hate Speech, and Social Media: A Systematic Review and Critique

    In parallel, scholars have grown increasingly concerned with racism and hate speech online, not least due to the rise of far-right leaders in countries like the US, Brazil, India, and the UK and the weaponization of digital platforms by white supremacists. This has caused a notable increase in scholarship on the topic.

  9. Prevalence and Psychological Effects of Hateful Speech in Online

    We find that hateful speech is prevalent in college subreddits, and 25% of them show greater hateful speech than non-college subreddits. We also find that exposure to hate leads to greater stress expression. However, not everyone exposed is equally affected; some show lower psychological endurance than others.

  10. Internet, social media and online hate speech. Systematic review

    This systematic review explored research papers on how the Internet and social media may, or may not, constitute an opportunity for online hate speech. Of the 2,389 papers found in the searches, 67 studies were eligible for analysis. We included articles that addressed online hate speech or cyberhate between 2015 and 2019. Meta-analysis could not be conducted due to the broad ...

  11. Viral sticks, virtual stones: addressing anonymous hate speech online

    Woods and Ruscher provide an overview of research on the types and harms of hate speech, consider the impact both of anonymity and of the Internet on these harms, and describe an illustrative example of a type of online hate speech that is often anonymous, derogatory Internet memes, in an effort to encourage further ...

  12. Addressing hate speech on social media: contemporary challenges

    The identification of online hate speech for research purposes faces numerous methodological challenges, including the definitions used to frame the issue, social and historical contexts, linguistic subtleties, and the variety of online communities and forms of online hate speech (type of language, images, etc.).

  13. Thirty years of research into hate speech: topics of ...

    The volume of academic papers published in a representative sample from 1992 to 2019 displays a significant increase after 2010; the evolution of online hate speech research can thus be divided into an initial development stage (1992-2010) followed by a period of rapid development (2011-2019).

  14. PDF Online Hate Speech: Hate or Crime?

    It defines online hate speech as "any written material, any image or any other representation of ideas or theories, which advocates, promotes or incites hatred, discrimination or violence, against any individual or group of individuals, based on race, colour, descent or national or ethnic origin, as well as religion if used as a pretext for any ...


    The 2013 murder of Drummer Lee Rigby in Woolwich, London, UK led to an extensive public reaction on social media, providing the opportunity to study the spread of online hate speech (cyber hate ...

  16. The regulation of hate speech online and its enforcement

    Sophie Turenne. On the initiative of the British Association of Comparative Law, this issue develops a broad comparative perspective on aspects of the legal regulation of hate speech online in China, France, Germany, the UK, Europe and the US. This editorial introduces the key lines of debates running through the papers.

  17. Defining Hate Speech

    This essay seeks to review some of the various attempts to define hate speech and pull from them a series of traits that can be used to frame hate speech with a higher degree of confidence.

  18. Internet, social media and online hate speech. Systematic review

    Online hate speech refers to the use of offensive language focused on a specific group of people who share a common property. The identification of the potential targets of hateful or antagonistic speech is key to distinguishing online hate from arguments that represent political viewpoints.

  19. What is hate speech?

    Hate speech calls out real or perceived "identity factors" of an individual or a group, including: "religion, ethnicity, nationality, race, colour, descent, gender," but also characteristics...

  20. Hate speech, toxicity detection in online social media: a ...

    This paper presents a survey of online hate speech identification using different artificial intelligence techniques. The review addresses a number of research questions (listed in Table 1 of the paper) about the most recent trends in online hate speech detection, and includes an overview of recently used machine learning and deep learning ...

  21. Online hate speech and hate crime

    1 Dec. 2023. Countering online xenophobia and racism: a new study underlines the increased relevance of the First Protocol to the Convention on Cybercrime.

  22. Hate speech detection: Challenges and solutions

    As online content continues to grow, so does the spread of hate speech. We identify and examine the challenges faced by automatic approaches to hate speech detection in text. Among these difficulties are subtleties in language, differing definitions of what constitutes hate speech, and the limited availability of data for training and testing these systems. Furthermore, many recent ...

  23. When Online Hate Speech Has Real World Consequences

    This mini-lesson explores celebrity influence and online hate, specifically antisemitism. Published November 3, 2022; Social Studies, grades 6-12.

  24. Hate speech inciting violence now potentially illegal under EU law

    Asked if hate speech would qualify as illegal under the EU legislation, Mr Evans said: "If it can incite violence, yes, of course, yes, potentially." [EU law on harmful speech could make ...]

  25. EU/Global: European Commission's TikTok probe aims to help protect

    Responding to the European Commission's decision to investigate TikTok over concerns that the online social media platform may be failing to comply with the bloc's Digital Services Act (DSA) by not doing enough to protect young users, Damini Satija, Programme Director at Amnesty Tech, said: "We welcome the European Commission's decision to investigate TikTok […]

  26. Free Speech or Hate Speech?

    Under the Civil Rights Act of 1964, Nunziato noted, GW has a responsibility to provide an educational environment free of discrimination. The panel's discussion touched on the recent congressional hearings at which the presidents of three elite universities were criticized for saying that whether speech could be considered hate speech depends ...

  27. Report reveals DOJ lacks complete data collection for online hate ...

    A hate crime occurs nearly every hour on average in the U.S., according to data reported to the FBI. Oftentimes, hate speech online can fuel violence in real life. A new watchdog report is ...

  28. Asked about online harms bill, Poilievre raises Trudeau's past use of

    The Liberal Party's 2021 election platform promises to "combat serious forms of harmful online content, specifically hate speech, terrorist content, content that incites violence, child sexual ...
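The automatic-detection challenges surveyed in the items above (differing definitions, linguistic subtlety, limited training data) are easiest to see against a concrete baseline. Below is a minimal, self-contained sketch of a bag-of-words Naive Bayes classifier; the `NaiveBayes` class and the training sentences are invented for illustration and do not come from any study cited here. Its brittleness on slang, obfuscated spellings and context-dependent meaning is precisely the "subtleties in language" problem the surveys describe.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on letters/apostrophes. Real systems need far more
    # care (slang, leetspeak, context), which is the core detection challenge.
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Tiny bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.counts = {lab: Counter() for lab in set(labels)}
        self.priors = Counter(labels)
        for doc, lab in zip(docs, labels):
            self.counts[lab].update(tokenize(doc))
        self.vocab = {w for c in self.counts.values() for w in c}
        return self

    def predict(self, doc):
        # Score each class by log prior plus smoothed log likelihoods.
        n = sum(self.priors.values())
        scores = {}
        for lab, counts in self.counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.priors[lab] / n)
            for w in tokenize(doc):
                score += math.log((counts[w] + 1) / denom)
            scores[lab] = score
        return max(scores, key=scores.get)

# Toy labelled data (invented for this sketch).
train_docs = [
    "they are vermin and should be driven out",
    "send them all back they are animals",
    "lovely weather for a walk today",
    "the match was great fun to watch",
]
train_labels = ["hateful", "hateful", "neutral", "neutral"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("those people are vermin"))  # → hateful
```

A classifier like this only recognizes surface vocabulary seen in training, so a sentence that expresses the same hostility in novel or coded wording is scored as neutral: exactly why the literature stresses definitions, context and data availability over any single model.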