The M4-Competition continues three earlier competitions organized by Spyros Makridakis (known as the Makridakis or M-Competitions), whose purpose is to identify the most accurate forecasting method(s) for different types of predictions.

These competitions have attracted great interest both in the academic literature and among practitioners, and have provided objective evidence on the most appropriate way of forecasting various variables of interest.

First competition in 1982

The first Makridakis Competition, held in 1982, and known in the forecasting literature as the M-Competition, used 1001 time series and 15 forecasting methods (with another nine variations of those methods included). According to a later paper by the authors, the following were the main conclusions of the M-Competition (Makridakis et al., 1982):

  1. Statistically sophisticated or complex methods do not necessarily provide more accurate forecasts than simpler ones.
  2. The relative ranking of the performance of the various methods varies according to the accuracy measure being used.
  3. The accuracy when various methods are combined outperforms, on average, the individual methods being combined and does very well in comparison to other methods.
  4. The accuracy of the various methods depends on the length of the forecasting horizon involved.
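
Conclusion 3 can be illustrated with a minimal sketch. The combination scheme is not specified here, so equal-weight averaging is an assumption, and the function names and all numbers below are hypothetical, chosen only to show the mechanics.

```python
from statistics import mean


def combine_forecasts(forecasts):
    """Average several methods' forecasts point by point (equal weights)."""
    horizons = {len(f) for f in forecasts}
    if len(horizons) != 1:
        raise ValueError("all forecasts must cover the same horizon")
    return [mean(values) for values in zip(*forecasts)]


def mean_absolute_error(actual, forecast):
    """Mean absolute error between actual values and one forecast."""
    return mean(abs(a - f) for a, f in zip(actual, forecast))


# Hypothetical data for illustration only (not from any M-Competition).
actual = [100.0, 110.0, 120.0]
method_a = [90.0, 105.0, 130.0]
method_b = [112.0, 118.0, 114.0]

combined = combine_forecasts([method_a, method_b])

for name, forecast in [("A", method_a), ("B", method_b), ("combined", combined)]:
    print(name, round(mean_absolute_error(actual, forecast), 3))
```

In this toy case the two methods err in opposite directions, so the average does much better than either one. In general, the absolute error of an equal-weight average can never exceed the average of the individual absolute errors, although it can still be worse than the best single method.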

The findings of the study have been verified and replicated through other competitions and new methods by other researchers.

Newbold (1983) was critical of the M-Competition and argued against the general idea of using a single competition to attempt to settle such a complex issue.

Before the first competition, the Makridakis–Hibon Study

Before the first M-Competition, Makridakis and Hibon (1979) published an article in the Journal of the Royal Statistical Society (JRSS) showing that simple methods perform well in comparison with more complex and statistically sophisticated ones. Statisticians at that time criticized the results, claiming that they were not possible. Their criticism motivated the subsequent M, M2 and M3 Competitions, which proved beyond the slightest doubt the findings of the Makridakis and Hibon study.

Second competition, published in 1993

The second competition, called the M-2 Competition or M2-Competition (Makridakis et al., 1993), was conducted on a grander scale. A call to participate was published in the International Journal of Forecasting, announcements were made at the International Symposium on Forecasting, and a written invitation was sent to all known experts on the various time series methods. The M2-Competition was organized in collaboration with four companies, included six macroeconomic series, and was conducted on a real-time basis. The data came from the United States. The results of the competition were published in a 1993 paper and were claimed to be statistically identical to those of the M-Competition.

The M2-Competition used far fewer time series than the original M-Competition. Whereas the original M-Competition had used 1001 time series, the M2-Competition used only 29: 23 from the four collaborating companies and 6 macroeconomic series. Data from the companies was obfuscated by applying a constant multiplier in order to protect proprietary information. The purpose of the M2-Competition was to simulate real-world forecasting better in the following respects:

  • Allow forecasters to combine their statistically based forecasting method with personal judgment.
  • Allow forecasters to ask additional questions requesting data from the companies involved in order to make better forecasts.
  • Allow forecasters to learn from one forecasting exercise and revise their forecasts for the next forecasting exercise based on the feedback.

The competition was organized as follows:

  • The first batch of data was sent to participating forecasters in summer 1987.
  • Forecasters had the option of contacting the companies involved via an intermediary in order to gather additional information they considered relevant to making forecasts.
  • In October 1987, forecasters were sent updated data.
  • Forecasters were required to send in their forecasts by the end of November 1987.
  • A year later, forecasters were sent an analysis of their forecasts and asked to submit their next forecast in November 1988.
  • The final analysis and evaluation of the forecasts began in April 1991, when the actual, final values of the data up to and including December 1990 were known to the collaborating companies.

In addition to the published results, many of the participants wrote short articles describing their experience of participating in the competition and their reflections on what it demonstrated. Chris Chatfield praised the design of the competition but said that, despite the organizers’ best efforts, forecasters still did not have the kind of inside access to the companies that people would have in real-world forecasting. Fildes and Makridakis (1995) argued that, despite the evidence produced by these competitions, the implications continued to be largely ignored by theoretical statisticians.

Third competition, published in 2000

The third competition, called the M-3 Competition or M3-Competition (Makridakis and Hibon, 2000), was intended to both replicate and extend the features of the M-Competition and M2-Competition through the inclusion of more methods and researchers (particularly researchers in the area of neural networks) and more time series. A total of 3003 time series was used. The paper documenting the results of the competition was published in the International Journal of Forecasting in 2000, and the raw data was also made available on the International Institute of Forecasters website. According to the authors, the conclusions from the M3-Competition were similar to those from the earlier competitions.

The time series included yearly, quarterly, monthly, daily, and other time series. In order to ensure that enough data was available to develop an accurate forecasting model, minimum thresholds were set for the number of observations: 14 for yearly series, 16 for quarterly series, 48 for monthly series, and 60 for other series.
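
The thresholds above amount to a simple length check per series. The sketch below is an illustration only: the function names and the data layout (a name mapped to an interval and a list of observations) are assumptions, not the organizers' actual screening code.

```python
# Minimum series lengths quoted above for the M3-Competition.
MIN_OBSERVATIONS = {"yearly": 14, "quarterly": 16, "monthly": 48, "other": 60}


def meets_minimum(interval, observations):
    """Return True if a series is long enough for its time interval."""
    return len(observations) >= MIN_OBSERVATIONS[interval]


def filter_series(series_by_name):
    """Keep only the series that satisfy the threshold for their interval.

    series_by_name maps a series name to an (interval, observations) pair.
    """
    return {
        name: (interval, obs)
        for name, (interval, obs) in series_by_name.items()
        if meets_minimum(interval, obs)
    }


# Hypothetical series for illustration only.
candidates = {
    "sales_yearly": ("yearly", [1.0] * 14),    # exactly at the threshold
    "sales_monthly": ("monthly", [1.0] * 47),  # one observation short
}
print(sorted(filter_series(candidates)))
```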

Time series were in the following domains: micro, industry, macro, finance, demographics, and other. Below is the number of time series based on the time interval and the domain:

Time interval   Micro  Industry  Macro  Finance  Demographic  Other  Total
Yearly            146       102     83       58          245     11    645
Quarterly         204        83    336       76           57      0    756
Monthly           474       334    312      145          111     52   1428
Other               4         0      0       29            0    141    174
Total             828       519    731      308          413    204   3003

The five measures used to evaluate the accuracy of different forecasts were: symmetric mean absolute percentage error (also known as symmetric MAPE), average ranking, median symmetric absolute percentage error (also known as median symmetric APE), percentage better, and median RAE.
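
Four of these measures can be sketched as follows. The formulas are the commonly used definitions and are assumed, not confirmed, to match those in the competition paper. In particular, the benchmark used for the relative measures is an assumption, and all numbers are hypothetical. Average ranking is omitted because it needs forecasts from several methods at once.

```python
from statistics import mean, median


def sape(actual, forecast):
    """Symmetric absolute percentage error per point, in percent.

    Undefined (division by zero) when an actual and its forecast are both 0.
    """
    return [200.0 * abs(a - f) / (abs(a) + abs(f))
            for a, f in zip(actual, forecast)]


def smape(actual, forecast):
    """Symmetric MAPE: the mean of the per-point symmetric errors."""
    return mean(sape(actual, forecast))


def median_sape(actual, forecast):
    """Median symmetric APE: the median of the per-point symmetric errors."""
    return median(sape(actual, forecast))


def median_rae(actual, forecast, benchmark):
    """Median relative absolute error against a benchmark forecast.

    Points where the benchmark is exactly right are skipped (ratio undefined).
    """
    ratios = [abs(a - f) / abs(a - b)
              for a, f, b in zip(actual, forecast, benchmark) if a != b]
    return median(ratios)


def percentage_better(actual, forecast, benchmark):
    """Percentage of points where the forecast beats the benchmark."""
    wins = sum(abs(a - f) < abs(a - b)
               for a, f, b in zip(actual, forecast, benchmark))
    return 100.0 * wins / len(actual)


# Hypothetical data for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
benchmark = [90.0, 150.0, 380.0]  # e.g. a naive method (assumption)

print(smape(actual, forecast), median_sape(actual, forecast))
print(median_rae(actual, forecast, benchmark))
print(percentage_better(actual, forecast, benchmark))
```

Because the mean and the median of the same errors can order methods differently, a sketch like this also makes it easy to see why rankings depend on the accuracy measure used.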

A number of other papers have been published with different analyses of the data set from the M3-Competition.


The purpose of the M4-Competition is to replicate the results of the previous three competitions and extend them in two directions: first, by increasing the number of series to 100,000, and second, by including machine learning (neural network) forecasting methods. Information about forecasting methods and their comparisons can be found in the textbook “Forecasting: Methods and Applications” (Makridakis, Wheelwright and Hyndman, 1998). A recent study discusses the accuracy of statistical and machine learning methods, explains why the performance of the latter is below that of the former, and proposes some possible ways forward.



  • Makridakis, S. and Hibon, M., “The M3-Competition: Results, Conclusions and Implications”, International Journal of Forecasting, Vol. 16, No. 4, 2000, pp. 451-476.
    (Number of citations Google Scholar: 1,118)
  • Makridakis, S., et al., “The M2-Competition: A Real-Time Judgmentally-Based Forecasting Study”, International Journal of Forecasting, Vol. 9, No. 1, 1993, pp. 5-23 (lead article).
    (Number of citations Google Scholar: 258)
  • Makridakis, S., et al., “The Accuracy of Extrapolative (Time Series) Methods: Results of a Forecasting Competition”, Journal of Forecasting, Vol. 1, No. 2, 1982, pp. 111-153 (lead article; voted in 2005 the favourite paper published in the field of forecasting during the previous 25 years, and one of the most cited papers in the field).
    (Number of citations Google Scholar: 1,240)
  • Makridakis, S. and Hibon, M., “Accuracy of Forecasting: An Empirical Investigation” (with discussion), Journal of the Royal Statistical Society, Series A, Vol. 142, Part 2, 1979, pp. 79-145 (lead article).
    (Number of citations Google Scholar: 562)
  • Makridakis, S., Wheelwright, S. and Hyndman, R., Forecasting: Methods and Applications (Third Edition), Wiley, 1998, 642 pages (First Edition 1978, Second Edition 1983).
    (Number of citations Google Scholar: 4,580)
July 24, 2017