For decades, forecasting time series relied on statistical models like ARIMA or sequential neural networks such as LSTMs. Then came Transformers - architectures originally designed for language - and they are rapidly reshaping how we model sequential data.
In my research, I explored how transformer-based architectures can be applied to time series forecasting and how combining different transformer approaches can unlock new predictive capabilities.
From Language to Time Series
Transformers excel at understanding relationships within sequences. While originally designed for text, a time series is simply another type of sequence - a stream of numbers evolving over time.
This makes transformers well suited to capturing both short-term patterns and the long-range dependencies that traditional models struggle with.
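To make the "a time series is just another sequence" framing concrete, here is a minimal sketch of how a raw series is typically turned into supervised (context, target) pairs before being fed to any sequence model. The helper name `make_windows` and the window sizes are illustrative, not from any specific library.

```python
import numpy as np

def make_windows(series, context_len, horizon):
    """Slice a 1-D series into (context, target) pairs for forecasting.

    Hypothetical helper for illustration: each context window of past
    values is paired with the `horizon` values that follow it.
    """
    X, y = [], []
    for start in range(len(series) - context_len - horizon + 1):
        X.append(series[start : start + context_len])
        y.append(series[start + context_len : start + context_len + horizon])
    return np.array(X), np.array(y)

series = np.arange(10.0)  # toy series: 0.0, 1.0, ..., 9.0
X, y = make_windows(series, context_len=4, horizon=2)
print(X.shape, y.shape)   # 5 context windows, each predicting the next 2 steps
```

Each row of `X` plays the role a sentence plays in language modelling: a sequence the transformer attends over to predict what comes next.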
Building a Hybrid Transformer
My work focused on combining two state-of-the-art architectures:
- PatchTST, which captures local patterns through patch-based segmentation.
- FEDformer, which captures global trends and seasonality through frequency-domain (Fourier-enhanced) decomposition.
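The two building blocks above can be sketched in a few lines each. Below, `patchify` illustrates PatchTST-style segmentation (overlapping patches of the series become the "tokens" the transformer attends over), and `fourier_lowpass` illustrates the FEDformer-flavoured idea of isolating low-frequency structure in the spectral domain. Both functions and their parameters are simplified illustrations, not the actual model code.

```python
import numpy as np

def patchify(series, patch_len, stride):
    """PatchTST-style segmentation: overlapping patches act as tokens."""
    starts = range(0, len(series) - patch_len + 1, stride)
    return np.stack([series[s : s + patch_len] for s in starts])

def fourier_lowpass(series, keep):
    """FEDformer-flavoured idea: keep only the `keep` lowest frequencies,
    which isolates slow-moving trend/seasonal structure."""
    spec = np.fft.rfft(series)
    spec[keep:] = 0                       # zero out high-frequency components
    return np.fft.irfft(spec, n=len(series))

t = np.arange(64)
series = 0.05 * t + np.sin(2 * np.pi * t / 8)  # linear trend + seasonality

patches = patchify(series, patch_len=16, stride=8)
trend = fourier_lowpass(series, keep=3)
print(patches.shape)  # (7, 16): 7 patch tokens of length 16
```

The patches give a model fine-grained local views of the signal, while the low-frequency reconstruction recovers a smooth trend with the fast oscillation removed: exactly the local/global split the hybrid tries to exploit.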
The result was a hybrid transformer model designed to leverage the strengths of both architectures: local pattern detection and global temporal understanding.
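One simple way such a combination can work, sketched here purely for illustration, is to blend the forecasts of the two branches with a convex weight. The real hybrid model's fusion mechanism may be more elaborate (e.g. learned gating or feature-level fusion); `alpha` and the branch names below are assumptions, not the thesis's actual design.

```python
import numpy as np

def fuse_forecasts(local_pred, global_pred, alpha=0.5):
    """Blend a local-pattern branch with a global-trend branch.

    `alpha` weights the local branch; in a trained model it could be a
    learned parameter rather than a fixed constant.
    """
    return alpha * local_pred + (1 - alpha) * global_pred

local_pred = np.array([1.0, 2.0, 3.0])   # e.g. a PatchTST-style branch
global_pred = np.array([1.5, 1.5, 1.5])  # e.g. a frequency-domain branch
print(fuse_forecasts(local_pred, global_pred, alpha=0.6))
```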
What the Results Show
Testing across several benchmark datasets revealed an interesting outcome:
- The hybrid model consistently outperformed the FEDformer-based configuration.
- However, PatchTST remained the strongest standalone model overall.
This highlights an important lesson in machine learning research: progress often comes not from replacing models entirely, but from carefully combining their strengths.
Why This Matters
Time series forecasting drives decisions in finance, energy, traffic systems, and healthcare. Improvements in forecasting models translate directly into better planning, risk management, and resource allocation.
Transformer-based models are still evolving, but the evidence is clear: they are becoming one of the most powerful tools for forecasting complex real-world systems.
And we are only getting started!