Tweet volume spikes analysis and its application in trading
Disclaimer: this article is based on a study by Yuexin Mao, Wei Wei and Bing Wang, “Twitter Volume Spikes: Analysis and Application in Stock Trading”. We would like to thank them for this study. All copyrights belong to the original article’s respective owners, its authors.
Stocks are a popular topic on Twitter. The number of tweets concerning a stock varies from day to day and sometimes exhibits a significant spike. The paper investigates Twitter volume spikes related to S&P 500 stocks and whether they are useful for stock trading. The authors develop a strategy that combines a Bayesian classifier with a stock bottom-picking method and demonstrate that it can achieve a significant gain in a short amount of time. Simulation over half a year of stock market data indicates that it achieves on average an 8.6% gain in 27 trading days and a 15.0% gain in 55 trading days. Statistical tests show that the gain is statistically significant.
At Gambiste, we can provide our tweet volume data. The last month of tweet volume data for alternative currencies is currently free of charge.
TWITTER VOLUME SPIKE ANALYSIS
The authors reviewed related work; several studies use Twitter to predict the stock market. One study finds that specific public mood states on Twitter are significantly correlated with the Dow Jones Industrial Average (DJIA) and can thus be used to forecast the direction of DJIA changes. Another study finds that the percentage of emotional tweets is correlated with the DJIA, NASDAQ and S&P 500. A further study finds that a Twitter sentiment indicator and the number of tweets that mention financial terms in the previous 1-2 days can be used to predict the daily market return.
The authors investigate whether the number of tweets about a stock spikes around its earnings dates. Suppose that a company's earnings date is day t. The analysis checks whether the number of tweets on the company's stock spikes around t, specifically on days t−1, t and t+1. During the data collection period, there were 509 earnings days for the stocks considered. Of these, 79.2% are surrounded by a Twitter volume spike, confirming the authors' expectation that people indeed tweet more about a stock around its earnings dates.
Time difference (in days) from an earnings day to the closest day that has a Twitter volume spike. A negative value corresponds to the time difference to the closest Twitter volume spike in the past.
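The earnings-day check can be illustrated with a small Python sketch. It is not the authors' code: the function names, the use of integer day indices, and the `radius` parameter are assumptions made for illustration. It computes the signed offset from an earnings day to the closest spike (negative when the spike is in the past) and the fraction of earnings days that have a spike on t−1, t or t+1.

```python
def nearest_spike_offset(earnings_day, spike_days):
    """Signed offset in days from an earnings day to the closest spike.

    Negative means the closest spike happened before the earnings day.
    Returns None when there are no spikes at all.
    """
    if not spike_days:
        return None
    return min((s - earnings_day for s in spike_days), key=abs)


def fraction_near_spike(earnings_days, spike_days, radius=1):
    """Fraction of earnings days with a spike within `radius` days (t-1, t, t+1 by default)."""
    if not earnings_days:
        return 0.0
    hits = 0
    for t in earnings_days:
        offset = nearest_spike_offset(t, spike_days)
        if offset is not None and abs(offset) <= radius:
            hits += 1
    return hits / len(earnings_days)
```

With hypothetical day indices, earnings days `[10, 50, 90]` and spike days `[11, 48, 90]` give offsets of 1, −2 and 0, so two of the three earnings days count as surrounded by a spike.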
Twitter volume spikes close to earnings days are likely due to the earnings days themselves. Since earnings days are public information that people know beforehand, these Twitter volume spikes are no surprises. These spikes cannot be used in building a trading strategy as the price reflects them beforehand. Thus, the authors needed a way to determine if a certain spike was expected or not. Option implied volatility can be used as an indicator to determine whether a Twitter volume spike is expected or not, whether it is related to a scheduled event.
Assume that for a stock, a Twitter volume spike happens on day t. In this figure, the average daily implied volatility is plotted for both short-term options, i.e., those that expire within 30 days after t, and longer-term options, i.e., those that expire 30 to 60 days after t. For short-term options, the daily average implied volatility indeed increases before t and decreases after t. For longer-term options, the trend is not clear. The authors found that 37.3% of the Twitter volume spikes are expected. Note that this percentage is a very conservative estimate and serves more as a lower bound, showing that a fair share of spikes are expected.
OTHER SPIKE FACTORS
The authors now investigate potential causes of Twitter volume spikes. Specifically, they consider the following five factors:
1. Stock breakout point,
2. Intraday price change rate,
3. Interday price change rate,
4. Earnings day, and
5. Stock option implied volatility.
Then, the authors calculate the correlation of each of these five factors with Twitter volume spikes.
The correlation analysis resulted in the following figure:
The y axis shows the CDF (cumulative distribution function) of the correlations between Twitter volume spikes and each of the five factors over all the stocks. Twitter volume spikes have the strongest correlation with earnings days (with a median of 0.37), which confirms the earlier result that a significant fraction of Twitter volume spikes occurs around earnings days. The correlation between Twitter volume spikes and implied volatility has a median value of 0.14, much stronger than the correlation with the remaining factors.
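The article does not spell out how the per-stock correlations are computed, so the following Python sketch is only one plausible reading. It assumes that, for each stock, the spike days and the factor days are encoded as 0/1 series of equal length, uses the Pearson coefficient, and then takes the median across stocks. All names are hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric series.

    Returns 0.0 when either series is constant, since the coefficient
    is undefined in that case (a simplification for this sketch).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def median_correlation(series_by_stock):
    """Median over stocks of the spike/factor correlation.

    `series_by_stock` maps a ticker to a (spike_series, factor_series) pair.
    """
    return median(pearson(s, f) for s, f in series_by_stock.values())
```

Sorting the per-stock correlations and plotting their cumulative share would give a CDF curve of the kind described above, one curve per factor.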
APPLICATION IN STOCK TRADING
Two trading strategies were developed, both using Twitter volume spikes as trading signals. For comparison, a baseline strategy that purchases a stock on a random day, and a strategy that uses trading volume spikes are considered.
The first strategy was based solely on a Bayesian classifier.
The classifier estimated the probability that buying the stock leads to profit after a given number of days, and the stock was bought only when that probability was sufficiently large (above 0.7).
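As a rough illustration of the buy rule, here is a minimal naive Bayes sketch in Python. The article only says that a Bayesian classifier and a 0.7 threshold were used; the naive (independent-feature) form, the Laplace smoothing parameter `alpha`, the discretized feature values and all names below are assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


class NaiveBayesProfit:
    """Naive Bayes over discrete features with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()               # label -> number of samples
        self.feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
        self.feature_values = defaultdict(set)      # feature index -> values seen in training

    def fit(self, rows, labels):
        for features, label in zip(rows, labels):
            self.class_counts[label] += 1
            for j, value in enumerate(features):
                self.feature_counts[(label, j)][value] += 1
                self.feature_values[j].add(value)
        return self

    def prob_profit(self, features):
        """Posterior probability that a trade with these features is profitable."""
        total = sum(self.class_counts.values())
        scores = {}
        for label in (True, False):
            # Smoothed class prior.
            score = (self.class_counts[label] + self.alpha) / (total + 2 * self.alpha)
            for j, value in enumerate(features):
                n_values = len(self.feature_values[j] | {value})
                count = self.feature_counts[(label, j)][value]
                # Smoothed likelihood of this feature value given the class.
                score *= (count + self.alpha) / (
                    self.class_counts[label] + self.alpha * n_values
                )
            scores[label] = score
        return scores[True] / (scores[True] + scores[False])


def should_buy(model, features, threshold=0.7):
    """Buy only when the estimated profit probability exceeds the threshold."""
    return model.prob_profit(features) > threshold
```

In practice each training row would describe one Twitter volume spike through discretized factors (for example, whether the intraday price change was positive), and the label would record whether buying at that spike was profitable after the chosen holding period.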
To evaluate the strategy, the data from February 21, 2012 to October 19, 2012 was used as training data, and the data from October 20, 2012 to March 31, 2013 was used as test data. This results in 573 Twitter volume spikes in the training set, and 672 Twitter volume spikes in the test set.
The implied volatility factor was excluded from training and testing because it requires option data and hence does not provide a fair comparison with the other strategies.
The results of the above simple strategy are encouraging, indicating that Twitter volume spikes are indeed useful in stock trading. On the other hand, the strategy does not consider the trend of a stock. For instance, it may buy a stock when the price of the stock is increasing, which may not lead to profit. So, the authors propose an enhanced strategy that takes trends into consideration.
Enhanced strategy using bottom-picking method
The authors combine the Twitter volume spike strategy with a ZigZag-based algorithm (derived from the ZigZag indicator) that identifies turning points for a given movement rate, λ, defined as the minimum price difference ratio between two adjacent turning points.
The stock price turning point identification algorithm for a given λ is described as follows:
(1) Start the search from the first point in the dataset. Search forward until a potential turning point is found, i.e., one of the two conditions holds: (i) the price increases by at least λ from the start point, or (ii) the price decreases by at least λ from the start point. Continue the search.
(a) If condition (i) holds (i.e., the price moves upward), update the potential turning point when finding a point that is larger than the previous potential turning point. When finding a point that drops at least λ compared to the current potential turning point, set the current potential turning point to be a downward turning point.
(b) If condition (ii) holds (i.e., the price moves downward), update the potential turning point when finding a point that is smaller than the previous potential turning point. When finding a point that rises at least λ compared to the current potential turning point, set the current potential turning point to be an upward turning point.
(2) Start the search from the turning point. If the turning point is an upward turning point, go to Step (1a). If the turning point is a downward turning point, go to Step (1b). Repeat until the end of the dataset.
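The steps above can be sketched in Python as follows. This is an interpretation of the description, not the authors' code: it assumes strictly positive prices, measures each move as a ratio relative to the reference point, and labels a peak as a "down" turning point and a trough as an "up" turning point. The function and variable names are hypothetical.

```python
def turning_points(prices, lam):
    """Return (index, kind) pairs for confirmed turning points.

    kind is "down" for a peak (price turns downward afterwards) and
    "up" for a trough (price turns upward afterwards).
    Assumes all prices are strictly positive.
    """
    points = []
    direction = None  # unknown until the price has moved by at least lam
    candidate = 0     # index of the current potential turning point
    for i in range(1, len(prices)):
        price = prices[i]
        if direction is None:
            # Step (1): wait for a move of at least lam from the start point.
            change = (price - prices[0]) / prices[0]
            if change >= lam:
                direction, candidate = "rising", i
            elif change <= -lam:
                direction, candidate = "falling", i
        elif direction == "rising":
            # Step (1a): track new highs; a drop of lam confirms a peak.
            if price > prices[candidate]:
                candidate = i
            elif (prices[candidate] - price) / prices[candidate] >= lam:
                points.append((candidate, "down"))
                direction, candidate = "falling", i
        else:
            # Step (1b): track new lows; a rise of lam confirms a trough.
            if price < prices[candidate]:
                candidate = i
            elif (price - prices[candidate]) / prices[candidate] >= lam:
                points.append((candidate, "up"))
                direction, candidate = "rising", i
    return points
```

Note that a turning point is only confirmed after the price has already moved by λ in the opposite direction, so in live trading a trough at index k becomes known some days after k.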
For the stock, the top figure shows the price chart; the bottom figure shows the tweets ratio, i.e., the number of tweets on a day over the average number of tweets in the past 70 days, over time. A day with tweets ratio above K has a Twitter volume spike.
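The tweets ratio and the spike rule can be written down directly. The Python sketch below assumes a list of daily tweet counts for one stock; the function names are illustrative, and since the article does not state the value of K, the threshold is left as a required parameter.

```python
from statistics import mean


def tweet_ratios(daily_counts, window=70):
    """Tweets ratio per day: the day's count over the mean of the previous `window` days.

    Days without a full window of history, or with a zero baseline, get None.
    """
    ratios = []
    for i, count in enumerate(daily_counts):
        if i < window:
            ratios.append(None)
            continue
        baseline = mean(daily_counts[i - window:i])
        ratios.append(count / baseline if baseline > 0 else None)
    return ratios


def spike_days(daily_counts, k, window=70):
    """Indices of days whose tweets ratio is above the threshold k."""
    ratios = tweet_ratios(daily_counts, window)
    return [i for i, r in enumerate(ratios) if r is not None and r > k]
```

For example, with 71 days of 10 tweets followed by a day with 45 tweets, the ratio on that last day is 4.5, so it counts as a spike for any threshold below 4.5.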
Thus, a factor of the price being near the upward turning point of the ZigZag is added to the strategy.
The authors confirm that there is indeed strong evidence that the profit is positive, and that the enhanced strategy outperforms the random strategy as well as the strategy that uses stock trading volume spikes.
This figure plots the fraction of winning trades under the enhanced strategy. A significant fraction of the trades leads to profit. For instance, when using intraday and interday price change rates, as many as 89.3% of the trades lead to profit in 29 days.
Simulation over half a year of stock market data demonstrates that both strategies lead to substantial profits, and that the enhanced strategy significantly outperforms the basic strategy and a bottom-picking method that uses trading volume spikes. This suggests that Twitter volume spikes can provide a statistical edge in trading and are worth considering by traders.
Last month's tweet volume data is available on our website free of charge. On demand, we can also provide tweet volume data on alternative currencies and stocks going back to early 2015.