The Economic Journal (2024), with S. Keita and J. Valette
This paper analyses whether the systematic disclosure of criminals’ origins in the press affects natives’ attitudes towards immigration. It takes advantage of the unilateral change in reporting policy announced by the German newspaper Sächsische Zeitung in July 2016. Combining individual-level panel data from the German Socio-Economic Panel from 2014 to 2018 with 402,819 crime-related articles in German newspapers and those newspapers’ market shares, we find that systematically mentioning the origins of criminals increases the relative salience of natives’ criminality and reduces natives’ concerns about immigration, breaking the implicit link between immigration and crime.
Journal of International Money and Finance (2022), with M. Picault and J. Pinter
We construct a new indicator to capture media sentiment about the European Central Bank monetary policy and its relevant environment by analyzing 25,000 articles from five major international newspapers. Using named entity recognition and part-of-speech tagging, we propose a methodology to dissociate the dissemination of official communications of the central bank from the media comments. The resulting (daily) index correlates with some (monthly) standard measures of economic sentiment but reveals idiosyncratic information on monetary policy. Analyzing the determinants of our index, we find that both press conference and inter-meeting communications of the President significantly affect media sentiment. We then show that, controlling for a large range of factors, daily changes in media sentiment have predictive power for financial market inflation expectations.
We construct a novel database containing hundreds of thousands of geotagged messages related to the COVID-19 pandemic sent on Twitter. We create a daily state-level index of social distancing beliefs by counting the tweets that contain keywords such as “stay home”, “stay safe”, “wear mask”, “wash hands” and “social distancing”. We find that an increase in the Twitter index of social distancing on day t-1 is associated with a decrease in mobility on day t. We also find that state orders, an increase in the number of COVID-19 cases, precipitation and temperature contribute to reducing human mobility. Republican states are also less likely to enforce social distancing. Beliefs shared on social networks could both reveal the behavior of individuals and influence the behavior of others. Our findings suggest that policy makers can use geotagged Twitter data, in conjunction with mobility data, to better understand individual voluntary social distancing actions.
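As a rough illustration of the keyword-count index described above, the following minimal sketch counts, for each state and day, the tweets that contain at least one of the quoted keywords. The input layout (`(state, date, text)` tuples), the function name `daily_keyword_counts`, and the case-insensitive substring matching are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter

# Keywords quoted in the abstract. The matching rule used here
# (case-insensitive substring match) is an assumption.
KEYWORDS = ("stay home", "stay safe", "wear mask", "wash hands", "social distancing")


def daily_keyword_counts(tweets):
    """Count tweets per (state, date) that contain at least one keyword.

    `tweets` is an iterable of (state, date, text) tuples; this layout
    is hypothetical and chosen only for illustration.
    """
    counts = Counter()
    for state, date, text in tweets:
        lowered = text.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            counts[(state, date)] += 1
    return counts


# Toy data, invented for the example.
sample = [
    ("NY", "2020-03-20", "Please stay home and wash hands"),
    ("NY", "2020-03-20", "Lovely weather today"),
    ("NY", "2020-03-21", "Social distancing matters"),
    ("TX", "2020-03-20", "STAY SAFE everyone"),
]
index = daily_keyword_counts(sample)
```

A tweet that matches several keywords is counted once here; whether the published index counts tweets or keyword occurrences, and how it is normalized, is not stated in the abstract.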
We use a dataset of approximately one million messages sent on StockTwits to explore the relationship between investor sentiment on social media and intraday Bitcoin returns. We find a statistically significant relationship between investor sentiment and Bitcoin returns for frequencies of up to 15 minutes. For lower frequencies, the relation disappears. We also find that the impact of sentiment on returns is concentrated on the period around the Bitcoin bubble. However, the magnitude of the effect is rather small, making it impossible for a trader to make economic profits by trading on the information published on social media.
Journal of Public Economics (2020), with D. Altig, S. Baker, J.M. Barrero, N. Bloom, P. Bunn, S. Chen, S. Davis, J. Leather, B. Meyer, E. Mihaylov, P. Mizen, N. Parker, P. Smietanka and G. Thwaites
We consider several economic uncertainty indicators for the US and UK before and during the COVID-19 pandemic: implied stock market volatility, newspaper-based policy uncertainty, Twitter chatter about economic uncertainty, subjective uncertainty about business growth, forecaster disagreement about future GDP growth, and a model-based measure of macro uncertainty. Four results emerge. First, all indicators show huge uncertainty jumps in reaction to the pandemic and its economic fallout. Indeed, most indicators reach their highest values on record. Second, peak amplitudes differ greatly – from a 35% rise for the model-based measure of US economic uncertainty (relative to January 2020) to a 20-fold rise in forecaster disagreement about UK growth. Third, time paths also differ: Implied volatility rose rapidly from late February, peaked in mid-March, and fell back by late March as stock prices began to recover. In contrast, broader measures of uncertainty peaked later and then plateaued, as job losses mounted, highlighting differences between Wall Street and Main Street uncertainty measures. Fourth, in Cholesky-identified VAR models fit to monthly US data, a COVID-size uncertainty shock foreshadows peak drops in industrial production of 12–19%.
We investigate the efficient market hypothesis at the intraday level by analyzing market reactions to negative tweets and reports published on the Internet by an activist short seller. Conducting event studies, we find that fast-moving traders can generate small, albeit significant, abnormal profits by trading on public information published on social media. The market reaction to tweets is stronger when a company is mentioned for the first time on Twitter, showing that investors can disentangle new information from noise in real time. We also find that traders who manage to identify the information on the short seller’s website before the dissemination of the same news on Twitter can generate much greater abnormal returns. As acquiring information on a website is more costly and difficult than acquiring the same information on Twitter, our findings provide empirical evidence supporting the Grossman–Stiglitz paradox at the intraday level. Very short-lived market anomalies do exist in the stock market to compensate investors who spend time and money setting up bots and algorithms to trade on new information before the crowd.
We use a large dataset of one million messages sent on the microblogging platform StockTwits to evaluate the performance of a wide range of preprocessing methods and machine learning algorithms for sentiment analysis in finance. We find that adding bigrams and emojis significantly improves sentiment classification performance. However, more complex and time-consuming machine learning methods, such as random forests or neural networks, do not improve the accuracy of the classification. We also provide empirical evidence that the preprocessing method and the size of the dataset have a strong impact on the correlation between investor sentiment and stock returns. While investor sentiment and stock returns are highly correlated, we do not find that investor sentiment derived from messages sent on social media helps in predicting the returns of large-capitalization stocks at a daily frequency.
Economics and Statistics (2018), with C. Bortolli and S. Combes
GDP statistics in France are published on a quarterly basis, 30 days after the end of the quarter. In this article, we consider media content as an additional data source to traditional economic tools to improve short-term forecast/nowcast of French GDP. We use a database of more than a million articles published in the newspaper Le Monde between 1990 and 2017 to create a new synthetic indicator capturing media sentiment about the state of the economy. We compare an autoregressive model augmented by the media sentiment indicator with a simple autoregressive model. We also consider an autoregressive model augmented with the Insee Business Climate indicator. Adding a media indicator improves French GDP forecasts compared to these two reference models. We also test an automated approach using penalised regression, where we use the frequencies at which words or expressions appear in the articles as regressors, rather than aggregated information. Although this approach is easier to implement than the former, its results are less accurate.
Journal of International Money and Finance (2017), with M. Picault
We develop a field-specific dictionary to measure the stance of the European Central Bank monetary policy (dovish, neutral, hawkish) and the state of the Eurozone economy (positive, neutral, negative) through the content of ECB press conferences. In contrast with traditional textual analysis, we propose a novel approach using term-weighting and contiguous sequences of words (n-grams) to better capture the subtlety of central bank communication. We find that quantifying ECB communication using our field-specific weighted lexicon helps predict future ECB monetary decisions and European stock market volatility. Our indicators significantly outperform a textual classification based on the Loughran-McDonald or Apel-Blix-Grimaldi dictionaries and a media-based measure of economic policy uncertainty.
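To make the idea of a weighted n-gram lexicon concrete, here is a minimal sketch that scores a sentence by summing the weights of the unigrams and bigrams it contains. The lexicon entries, their weights, the sign convention (positive for hawkish, negative for dovish) and the crude tokenizer are all hypothetical and invented for this example; the abstract does not specify the paper's actual dictionary or weighting scheme.

```python
# Hypothetical weighted lexicon: positive weights lean hawkish,
# negative weights lean dovish. Entries and weights are invented.
LEXICON = {
    ("upside", "risks"): 1.0,
    ("strong", "vigilance"): 1.5,
    ("downside", "risks"): -1.0,
    ("accommodative",): -0.5,
}


def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def stance_score(text, lexicon=LEXICON, max_n=2):
    """Sum the lexicon weights of every n-gram (n = 1..max_n) found in text."""
    # Crude tokenization (lowercase, strip commas and periods): an assumption.
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    score = 0.0
    for n in range(1, max_n + 1):
        for gram in ngrams(tokens, n):
            score += lexicon.get(gram, 0.0)
    return score


hawkish = stance_score("Upside risks to price stability call for strong vigilance.")
dovish = stance_score("Policy remains accommodative given downside risks")
```

Matching on bigrams lets "upside risks" and "downside risks" carry opposite weights even though both contain "risks", which a single-word dictionary could not distinguish.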
We implement a novel approach to derive investor sentiment from messages posted on social media before we explore the relation between online investor sentiment and intraday stock returns. Using an extensive dataset of messages posted on the microblogging platform StockTwits, we construct a lexicon of words used by online investors when they share opinions and ideas about the bullishness or the bearishness of the stock market. We demonstrate that a transparent and replicable approach significantly outperforms standard dictionary-based methods used in the literature while remaining competitive with more complex machine learning algorithms. Aggregating individual message sentiment at half-hour intervals, we provide empirical evidence that online investor sentiment helps forecast intraday stock index returns. After controlling for past market returns, we find that the first half-hour change in investor sentiment predicts the last half-hour S&P 500 index ETF return. Examining users’ self-reported investment approach, holding period and experience level, we find that the intraday sentiment effect is driven by the shift in the sentiment of novice traders. Overall, our results provide direct empirical evidence of sentiment-driven noise trading at the intraday level.
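The predictive relation described above can be illustrated with a deliberately simplified sketch: a one-regressor least-squares fit of the last half-hour return on the first half-hour change in sentiment. The data are invented, the function name `ols_slope_intercept` is hypothetical, and the control for past market returns mentioned in the abstract is omitted here for brevity, so this is not the paper's specification.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    variance = sum((xi - mean_x) ** 2 for xi in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data, invented for the example: one observation per trading day.
first_half_hour_sentiment_change = [-2.0, -1.0, 0.0, 1.0, 2.0]
last_half_hour_return = [0.1 * s + 0.05 for s in first_half_hour_sentiment_change]

slope, intercept = ols_slope_intercept(
    first_half_hour_sentiment_change, last_half_hour_return
)
```

A positive slope in such a regression would correspond to sentiment changes early in the day being followed by same-signed returns late in the day; in practice one would add the controls and report standard errors.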
This dissertation makes methodological and empirical contributions to three issues related to the informational efficiency of financial markets through the use of Big Data analytics. More precisely, it analyzes: (1) how to measure intraday investor sentiment and determine the relation between investor sentiment and aggregate market returns, (2) how to measure investor attention to news in real time, and identify the relation between investor attention and the price dynamics of large capitalization stocks, and (3) how to detect suspicious behaviors that could undermine the informational role of financial markets and determine the relation between the level of posting activity on social media and small-capitalization stock returns. In that regard, the research design of each essay involves the construction of new datasets of messages published on social media sites to create novel indicators in order to: (1) measure investor sentiment, (2) proxy investor attention to news, and (3) detect suspicious stock recommendations that could be related to market manipulation. Using textual analysis, network theories, event studies, or predictive regressions, this dissertation provides empirical evidence that textual content published on social media contains value-relevant information about asset price formation.