The Dataset

The M4 dataset consists of 100,000 time series of Yearly, Quarterly, Monthly and Other (Weekly, Daily and Hourly) data.

The minimum number of observations is 13 for yearly, 16 for quarterly, 42 for monthly, 80 for weekly, 93 for daily and 700 for hourly series.
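As a minimal sketch, the minimum lengths above can be written as a lookup table and used to check a candidate series. The names MIN_OBSERVATIONS and meets_minimum_length are hypothetical and are not part of any official M4 tooling; only the numbers come from the text.

```python
# Minimum number of observations per frequency, as stated in the text above.
# The dictionary and function names are illustrative assumptions.
MIN_OBSERVATIONS = {
    "Yearly": 13,
    "Quarterly": 16,
    "Monthly": 42,
    "Weekly": 80,
    "Daily": 93,
    "Hourly": 700,
}


def meets_minimum_length(values, frequency):
    """Return True if the series has at least the stated minimum length."""
    return len(values) >= MIN_OBSERVATIONS[frequency]
```

For example, meets_minimum_length(series, "Hourly") would return False for an hourly series with fewer than 700 observations.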

The 100,000 time series of the dataset come mainly from the Economic, Finance, Demographics and Industry areas, while also including data from Tourism, Trade, Labor and Wage, Real Estate, Transportation, Natural Resources and the Environment.

The M4 Competition series, like those of the M-1 and M-3, aim to represent the real world as closely as possible. The series were selected randomly from a database of 900,000 series on December 28, 2017. Professor Makridakis chose the seed number for generating the random sample that determined the M4 Competition data. Pre-defined filters were applied beforehand to achieve certain desired characteristics, such as the length of the series, the percentage of Yearly, Quarterly, Monthly, Weekly, Daily, and Hourly data, as well as their type (Micro, Macro, Finance, Industry, Demographic, Other).

You can download the dataset here: M4Dataset (.rar) | M4Dataset (.zip)

If you are using R, the dataset is also available here: M4comp2018
(We would like to thank Rob J Hyndman’s PhD students Pablo Montero-Manso, Carla Netto, and Thiyanga Talagala for making the M4 data available as an R package on GitHub.)

Additional information regarding the type, the frequency and the number of forecasts required per series can be found here: Info

Below is the number of time series based on their frequency and type:

Frequency   Demographic  Finance  Industry   Macro   Micro  Other    Total
Yearly            1,088    6,519     3,716   3,903   6,538  1,236   23,000
Quarterly         1,858    5,305     4,637   5,315   6,020    865   24,000
Monthly           5,728   10,987    10,017  10,016  10,975    277   48,000
Weekly               24      164         6      41     112     12      359
Daily                10    1,559       422     127   1,476    633    4,227
Hourly                0        0         0       0       0    414      414
Total             8,708   24,534    18,798  19,402  25,121  3,437  100,000
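The counts in the table can be cross-checked against its stated totals with a short script. This is only an illustrative sketch: the variable names are assumptions, and the figures are copied from the table rather than read from the dataset files.

```python
# Series counts per frequency and type, copied from the table above.
TYPES = ["Demographic", "Finance", "Industry", "Macro", "Micro", "Other"]
COUNTS = {
    "Yearly":    [1088, 6519, 3716, 3903, 6538, 1236],
    "Quarterly": [1858, 5305, 4637, 5315, 6020, 865],
    "Monthly":   [5728, 10987, 10017, 10016, 10975, 277],
    "Weekly":    [24, 164, 6, 41, 112, 12],
    "Daily":     [10, 1559, 422, 127, 1476, 633],
    "Hourly":    [0, 0, 0, 0, 0, 414],
}

# Row totals: number of series per frequency.
row_totals = {freq: sum(row) for freq, row in COUNTS.items()}

# Column totals: number of series per type.
column_totals = {
    name: sum(row[i] for row in COUNTS.values())
    for i, name in enumerate(TYPES)
}

# Grand total and the share of each frequency in the dataset.
grand_total = sum(row_totals.values())
shares = {freq: total / grand_total for freq, total in row_totals.items()}

for freq, share in shares.items():
    print(f"{freq:<10} {row_totals[freq]:>7,} {share:7.2%}")
```

The computed row and column totals agree with the Total row and column of the table, and the shares show that Monthly series make up 48% of the dataset.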

September 28, 2017