Volatility is one of the most important numbers in finance, yet it has a strange feature: it cannot be directly observed. Volatility is a latent variable, meaning it is a real property of markets, but it can only be inferred from the footprints it leaves on prices.
As a useful analogy, consider intelligence. We all agree intelligence is “real” in the sense that it influences outcomes, but there is no single instrument that reveals a person’s true level of intelligence as one precise number. The best we can do is rely on measurable proxies: test scores, grades, problem-solving speed, memory tasks, and so on. Each proxy captures something useful, but none provides a definitive measure of a person’s underlying cognitive ability.
Volatility works the same way. We do not observe it directly as a single, objective number. What we do observe are prices and returns, the footprints volatility leaves over time. From those footprints we can build proxies meant to summarize how volatile the market was over a given period.
So when someone says, “Today’s volatility was 18%”, what they really mean is: “Using this proxy, over this window, and at this sampling frequency (and usually annualized), our estimate of volatility is 18%”. That estimate might be computed from close-to-close returns, intraday realized volatility, range-based estimators, EWMA, GARCH-family models, or many others. These are not competing truths; they are different measurement choices.
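To make this concrete, here is a minimal Python sketch (on synthetic prices, so the numbers themselves mean nothing) showing how three common proxies, close-to-close standard deviation, a RiskMetrics-style EWMA, and the Parkinson range estimator, can report noticeably different annualized volatilities for the same series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily data: log returns with slowly varying volatility (illustrative only).
n = 252
true_vol = 0.18 / np.sqrt(252) * (1 + 0.5 * np.sin(np.linspace(0, 6, n)))
log_ret = rng.normal(0.0, true_vol)
close = 100 * np.exp(np.cumsum(log_ret))
# Crude synthetic highs/lows around each close, only to feed the range estimator.
high = close * np.exp(np.abs(rng.normal(0, true_vol)))
low = close * np.exp(-np.abs(rng.normal(0, true_vol)))

ann = np.sqrt(252)                      # annualization factor for daily data
ret = np.diff(np.log(close))            # close-to-close log returns

# Proxy 1: close-to-close standard deviation of daily log returns.
cc_vol = np.std(ret, ddof=1) * ann

# Proxy 2: RiskMetrics-style EWMA variance (lambda = 0.94), last value.
lam, var = 0.94, np.var(ret)
for r in ret:
    var = lam * var + (1 - lam) * r ** 2
ewma_vol = np.sqrt(var) * ann

# Proxy 3: Parkinson range-based estimator using daily high/low.
park_var = np.mean(np.log(high / low) ** 2) / (4 * np.log(2))
park_vol = np.sqrt(park_var) * ann

print(f"close-to-close: {cc_vol:.1%}, EWMA: {ewma_vol:.1%}, Parkinson: {park_vol:.1%}")
```

Each number is a legitimate answer to "what was volatility?", computed from the same data under a different measurement choice.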
Once you accept that “18%” depends on a measurement setup, the next question is unavoidable: what does it mean to forecast volatility? In practice, we are forecasting tomorrow’s value of a chosen benchmark proxy, not the latent volatility state itself.
This is where volatility forecasting becomes harder to evaluate than most people assume. If there is no single true value, then a forecast cannot be graded against a perfect answer key. We end up judging forecasts by comparing them to a benchmark proxy, and that benchmark depends on choices such as:
- the sampling frequency (daily, 5-minute, 30-minute returns),
- how we treat overnight gaps,
- how we handle microstructure noise in high-frequency data,
- and whether we treat jumps as part of “volatility” or as a separate phenomenon.
In simple terms, “predicting volatility” usually means predicting the value of a chosen benchmark proxy. Change the proxy, and you change the target, so measured accuracy, and even model rankings, can shift.
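To see how those choices enter the calculation, here is a rough sketch (toy synthetic prices, a hypothetical `realized_vol` helper) of a daily realized-volatility proxy computed with and without the overnight gap; handling microstructure noise and jumps would layer further decisions on top:

```python
import numpy as np
import pandas as pd

def realized_vol(intraday_close: pd.Series, prev_close: float,
                 include_overnight: bool = True) -> float:
    """Annualized realized volatility for one day from intraday closes.

    intraday_close : prices sampled at a fixed intraday frequency (e.g. 5-minute bars)
    prev_close     : previous day's closing price, used for the overnight gap
    """
    intraday_ret = np.diff(np.log(intraday_close.to_numpy()))
    rv = np.sum(intraday_ret ** 2)              # realized variance from intraday returns
    if include_overnight:
        overnight_ret = np.log(intraday_close.iloc[0] / prev_close)
        rv += overnight_ret ** 2                # add the overnight gap as one more return
    return float(np.sqrt(252 * rv))             # annualize the daily variance

# Toy example: 78 five-minute bars (6.5 hours) of synthetic prices.
rng = np.random.default_rng(1)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.0012, 78))))
print(realized_vol(prices, prev_close=99.0, include_overnight=True))
print(realized_vol(prices, prev_close=99.0, include_overnight=False))
```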
For quants, the key point is that you cannot fully separate the forecasting model from the benchmark used as the target. You are not simply comparing “Model A vs Model B”. You are comparing “Model A vs Model B under a specific definition of volatility”, meaning the exact proxy you choose to forecast and score against.
If you change the benchmark, for example from realized volatility computed from 5-minute returns to a close-to-close measure based on daily returns, you have effectively changed the target variable. You are now grading models against a different yardstick.
That is why a model can look very strong under one benchmark and much weaker under another, without any error or bad faith. The outcome reflects how well the model matches that particular benchmark, not an absolute statement about forecasting “true volatility”.
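A stylized example makes this visible. The numbers below are entirely made up, purely to show the mechanics: the same two forecast series are scored with squared error against two different benchmark proxies, and the ranking flips.

```python
import numpy as np

# Made-up numbers for five evaluation days:
# two benchmark proxies for the same days, and two competing forecasts.
rv_5min  = np.array([0.14, 0.22, 0.18, 0.30, 0.16])  # intraday realized vol
close_cc = np.array([0.10, 0.27, 0.15, 0.36, 0.12])  # close-to-close estimate
model_a  = np.array([0.15, 0.21, 0.19, 0.29, 0.17])  # happens to track the intraday proxy
model_b  = np.array([0.11, 0.26, 0.16, 0.35, 0.13])  # happens to track the close-to-close proxy

def mse(forecast, benchmark):
    return float(np.mean((forecast - benchmark) ** 2))

for name, proxy in [("5-min RV", rv_5min), ("close-to-close", close_cc)]:
    print(name,
          "| model A:", round(mse(model_a, proxy), 5),
          "| model B:", round(mse(model_b, proxy), 5))
# The "winner" flips with the benchmark: same forecasts, different yardstick.
```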
Beyond the benchmark choice, there is another design decision that can materially change your conclusions: the loss function used for evaluation. Some loss functions penalize large misses heavily, others reward stability, and still others focus on performance in stressed markets. In practice, this often comes down to a classic trade-off: some models are less biased on average, while others have lower dispersion (more stable errors), and the scoring rule determines which property you end up calling “better”.
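Here is a small illustration of that trade-off, with made-up variance numbers: one forecast is stable but biased upward, the other is closer on average but more dispersed, and MSE and QLIKE disagree about which is better.

```python
import numpy as np

def mse(h, s2):
    """Mean squared error between forecast variance h and benchmark variance s2."""
    return float(np.mean((h - s2) ** 2))

def qlike(h, s2):
    """QLIKE loss: penalizes under-prediction of variance much more than over-prediction."""
    ratio = s2 / h
    return float(np.mean(ratio - np.log(ratio) - 1.0))

# Invented benchmark variances and two stylized forecasters (illustrative only):
s2 = np.full(4, 0.04)                                    # benchmark proxy (variance units)
fc_stable_biased = np.full(4, 0.06)                      # stable but biased upward
fc_less_biased = np.array([0.025, 0.058, 0.025, 0.058])  # closer on average, more dispersed

for name, h in [("stable/biased", fc_stable_biased), ("less biased/dispersed", fc_less_biased)]:
    print(f"{name:>22} | MSE: {mse(h, s2):.6f} | QLIKE: {qlike(h, s2):.4f}")
# MSE prefers the less-biased-but-dispersed forecast; QLIKE prefers the stable one here,
# so "which model is better" depends on the scoring rule you chose.
```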
It also helps to separate two different ways a forecast can fail.
The first is bias, a systematic tendency to overshoot or undershoot the benchmark. A concrete example is using option-implied volatility (such as the VIX) as a forecast for subsequent realized volatility over a matching horizon. Option-implied measures often run higher than later realized volatility because investors are willing to pay for insurance, which creates a volatility risk premium. That upward bias is not necessarily a fatal flaw. In many applications you can account for it, for example by calibrating the forecast (adjusting it downward based on history).
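A deliberately naive way to do that calibration is to rescale today's implied reading by the historical average ratio of subsequent realized to implied volatility; the sketch below assumes you already have two aligned series of past implied and realized vol (the figures shown are invented):

```python
import numpy as np

def calibrate_implied(implied_vol, realized_vol_next, new_implied):
    """Scale a new implied-vol reading by the historical realized/implied ratio.

    implied_vol       : past implied-vol readings (e.g. VIX / 100)
    realized_vol_next : realized vol over the matching subsequent horizon
    new_implied       : today's implied-vol reading to be calibrated

    A deliberately simple bias correction; it ignores regime changes and
    the fact that the volatility risk premium itself varies over time.
    """
    avg_ratio = np.mean(np.asarray(realized_vol_next) / np.asarray(implied_vol))
    return new_implied * avg_ratio

# Made-up historical pairs (implied vs subsequent realized), for illustration only.
implied  = [0.20, 0.18, 0.25, 0.22, 0.30]
realized = [0.16, 0.15, 0.21, 0.19, 0.27]
print(f"calibrated forecast: {calibrate_implied(implied, realized, new_implied=0.24):.2%}")
```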
The second is dispersion, which can be caused by noise in the benchmark itself. If your benchmark proxy is noisy, your measured forecast errors will look noisy too, even if the model has real skill. In short samples, you may end up ranking models based on who got luckier with a noisy benchmark rather than who forecasts better. Using a more informative benchmark, for example a carefully constructed intraday realized volatility measure instead of a simple close-to-close estimate, often reduces benchmark noise. That can mechanically reduce the apparent dispersion of forecast errors and make model comparisons more stable, especially when you only have a few days or weeks of evaluation.
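A quick simulation shows the effect. Assuming a constant true daily variance and a forecast that equals it exactly, the measured forecast errors are far more dispersed against a squared daily return than against an intraday realized-variance proxy:

```python
import numpy as np

rng = np.random.default_rng(2)

n_days, n_intraday = 500, 78
true_var = (0.18 / np.sqrt(252)) ** 2          # constant true daily variance
forecast = np.full(n_days, true_var)           # a "perfect" variance forecast

# Proxy 1: squared close-to-close daily return (noisy estimate of true_var).
daily_ret = rng.normal(0, np.sqrt(true_var), n_days)
proxy_cc = daily_ret ** 2

# Proxy 2: realized variance from 78 five-minute returns per day (much less noisy).
intraday = rng.normal(0, np.sqrt(true_var / n_intraday), (n_days, n_intraday))
proxy_rv = np.sum(intraday ** 2, axis=1)

# Same forecast, two benchmarks: the error dispersion differs dramatically.
print("error std vs close-to-close proxy:", np.std(proxy_cc - forecast))
print("error std vs intraday RV proxy:   ", np.std(proxy_rv - forecast))
```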
The practical takeaway is that the “best volatility forecast” is rarely universal. A vol-targeting portfolio manager (often thinking in months), an options desk (focused on hedging relative to implied dynamics), and an intraday trader (focused on the next few hours) will naturally prefer different benchmarks, and will judge the bias-versus-dispersion trade-off differently.
The deeper takeaway is that volatility forecasting is not just a modeling problem, but a design problem. Before asking which model performs best, you must decide what volatility means in your context, which proxy you are willing to treat as the benchmark, over what horizon, and under which loss function. Without making those choices explicit, debates about forecasting accuracy are often debates about definitions rather than skill. Once they are explicit, volatility forecasts become easier to interpret, easier to compare, and far more useful for real decision-making.
If you have any questions, do not hesitate to reach out at carlo@concretumgroup.com. You can also visit www.concretumgroup.com for more articles on quantitative trading.
