# Abstract

The ARMA process is a widely used time-series model with many desirable properties. In this article we show that, by testing the residuals of a sequence, one can test whether an ARMA model is applicable.

The most important step is to build a test statistic for whiteness of the residuals using the Ljung-Box method.

# Background

Consider a real-valued time series $(Z_{k})_{k\in \mathbb{Z}}$. We want to build a statistical test of the hypothesis

$$ H_{0}=\{(Z_{k})_{k\in \mathbb{Z}} \mbox{ is a white noise}\}$$

against

$$ H_{1}=\{(Z_{k})_{k\in \mathbb{Z}} \mbox{ is not a white noise}\}$$

Let $\hat{\mu}_n$ be the empirical mean, $\hat{\gamma}_n$ the empirical autocovariance function, and $\hat{\rho}_n$ the empirical autocorrelation function:

$$\hat{\mu}_n=n^{-1}\sum_{s=1}^{n}Z_s,\qquad \hat{\gamma}_n(t)=n^{-1}\sum_{1\leq s,\,s+t\leq n}(Z_s-\hat{\mu}_n)(Z_{s+t}-\hat{\mu}_n),$$

$$\hat{\rho}_n(t)=\frac{\hat{\gamma}_n(t)}{\hat{\gamma}_n(0)}.$$

The Ljung-Box test statistic at lag $h\geq 1$ is then defined as

$$ T_n(h)=n(n+2)\sum_{t=1}^{h}\frac{(\hat{\rho}_n(t))^2}{n-t}. $$
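The statistic can be computed directly from these definitions. The following is a minimal NumPy sketch (the function name is our own):

```python
import numpy as np

def ljung_box(z, h):
    """Compute T_n(h) from the definitions above:
    empirical mean, autocovariance and autocorrelation."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    zc = z - z.mean()                            # Z_s - hat{mu}_n
    gamma0 = np.dot(zc, zc) / n                  # hat{gamma}_n(0)
    total = 0.0
    for t in range(1, h + 1):
        gamma_t = np.dot(zc[:-t], zc[t:]) / n    # hat{gamma}_n(t)
        rho_t = gamma_t / gamma0                 # hat{rho}_n(t)
        total += rho_t ** 2 / (n - t)
    return n * (n + 2) * total
```

For a white-noise sample, the returned value should typically be of the same order as $h$, as Step 1 below makes precise.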

We now do this step by step.

## Step 1

Under $H_{0}$ and assuming moreover that $E[Z_{0}^{4}]<\infty$, show that the asymptotic distribution of $T_n(h)$ as $n\rightarrow\infty$ is $\chi^2_h$.

Proof:

Under these assumptions, the empirical autocorrelations satisfy the central limit theorem

$$ \sqrt{n}\,(\hat{\rho}_{n}(1),\dots,\hat{\rho}_{n}(h))\Rightarrow\mathcal{N}(0,W). $$

Since $\rho(j)=0$ for all $j\neq 0$ under $H_0$, Bartlett's formula gives $W=I_h$, so the components are asymptotically independent and, for each $1\leq t\leq h$,

$$ \sqrt{n}\,\hat{\rho}_{n}(t)\Rightarrow\mathcal{N}(0,1). $$

Furthermore, since

$$ \frac{n+2}{n-t}\longrightarrow 1, $$

Slutsky's lemma gives, for each fixed $t\geq 1$,

$$ n(n+2)\frac{(\hat{\rho}_n(t))^2}{n-t}=\frac{n+2}{n-t}\big(\sqrt{n}\,\hat{\rho}_n(t)\big)^{2}\Rightarrow\chi^2_{1}. $$

Summing over $t=1,\dots,h$ and using the asymptotic independence of the components, the conclusion follows:

$$ T_n(h)=n(n+2)\sum_{t=1}^{h}\frac{(\hat{\rho}_n(t))^2}{n-t}\Rightarrow\chi^2_{h}. $$
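A small Monte Carlo check of this limit can be run as follows (a sketch; the sample size, lag and seed are arbitrary choices): under $H_0$ the statistic should have mean close to $h$, the mean of $\chi^2_h$.

```python
import numpy as np

def ljung_box(z, h):
    # Ljung-Box statistic T_n(h), as defined above
    z = np.asarray(z, float); n = len(z); zc = z - z.mean()
    g0 = np.dot(zc, zc) / n
    return n * (n + 2) * sum(
        (np.dot(zc[:-t], zc[t:]) / n / g0) ** 2 / (n - t)
        for t in range(1, h + 1))

# Monte Carlo: T_n(h) for i.i.d. Gaussian noise should behave like chi^2_h
rng = np.random.default_rng(1)
h, n, reps = 5, 300, 2000
stats = [ljung_box(rng.normal(size=n), h) for _ in range(reps)]
```

The empirical mean of `stats` should be close to $h=5$.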

## Step 2

If $(Z_{k})_{k\in \mathbb{Z}}$ is an MA(1) process with a nonzero coefficient,

show that for all $m>0$, $\lim_{n\rightarrow\infty}\mathbb{P}(T_n(h)>m)=1$.

Proof:

We first have

$$ \hat{\gamma}_{n}(t)=\gamma(t)+O_P(n^{-1/2}), $$

and hence also $\hat{\rho}_{n}(t)=\rho(t)+O_P(n^{-1/2})$. Writing the MA(1) process as $Z_k=\varepsilon_k+\theta\varepsilon_{k-1}$ with $\theta\neq 0$, its autocorrelation at lag 1 is

$$ \rho(1)=\frac{\theta}{1+\theta^{2}}\neq 0, $$

so $\hat{\rho}_{n}(1)\xrightarrow{P}\rho(1)\neq 0$. Since every term of $T_n(h)$ is nonnegative,

$$ T_n(h)\geq \frac{n(n+2)}{n-1}\,(\hat{\rho}_n(1))^{2}=n\,\rho(1)^{2}\,(1+o_P(1))\xrightarrow{P}+\infty. $$

Therefore, for any fixed $m>0$,

$$\lim_{n\to\infty} \mathbb{P}(T_n(h)>m) = 1.$$
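This divergence is easy to observe numerically. Below is a sketch that simulates a hypothetical MA(1) process with an arbitrary coefficient $\theta=0.6$ and evaluates the statistic; since $\rho(1)=\theta/(1+\theta^2)\neq 0$, $T_n(h)$ grows roughly linearly in $n$.

```python
import numpy as np

def ljung_box(z, h):
    # Ljung-Box statistic T_n(h), as defined above
    z = np.asarray(z, float); n = len(z); zc = z - z.mean()
    g0 = np.dot(zc, zc) / n
    return n * (n + 2) * sum(
        (np.dot(zc[:-t], zc[t:]) / n / g0) ** 2 / (n - t)
        for t in range(1, h + 1))

rng = np.random.default_rng(2)
theta = 0.6                       # illustrative MA(1) coefficient
eps = rng.normal(size=2001)
z = eps[1:] + theta * eps[:-1]    # Z_k = eps_k + theta * eps_{k-1}
T = ljung_box(z, 5)               # of order n * rho(1)^2, far above chi^2_5 quantiles
```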

## Step 3

Propose a test of $H_0$ relying on the statistic $T_n(h)$ and on the quantile function of the $\chi_h^2$ distribution.

Since, under $H_0$, $T_n(h)$ converges in distribution to $\chi^2_h$, we can test $H_0$ through the value of $T_n(h)$:

let $F$ be the cumulative distribution function of $\chi^2_h$. Given a significance level $\alpha$ (0.01, 0.05, etc.), if

$$p:=1-F(T_n(h))<\alpha,$$

we reject $H_0$; otherwise, we keep it.
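A sketch of this test in Python, using `scipy.stats.chi2` for the $\chi^2_h$ distribution (the function name and return convention are our own):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box_test(z, h, alpha=0.05):
    """Reject H0 (white noise) when p := 1 - F(T_n(h)) < alpha,
    F being the chi^2_h CDF."""
    z = np.asarray(z, float)
    n = len(z)
    zc = z - z.mean()
    g0 = np.dot(zc, zc) / n
    T = n * (n + 2) * sum(
        (np.dot(zc[:-t], zc[t:]) / n / g0) ** 2 / (n - t)
        for t in range(1, h + 1))
    p = chi2.sf(T, df=h)          # survival function: 1 - F(T_n(h))
    return T, p, p < alpha        # True in the last slot means "reject H0"

rng = np.random.default_rng(3)
_, p_noise, reject_noise = ljung_box_test(rng.normal(size=500), h=5)
```

For genuine white noise the test rejects with probability about $\alpha$; for a strongly autocorrelated series the p-value is essentially 0.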

## Step 4

Since $(X_{k})_{k\in \mathbb{Z}}$ is an AR(1) process, the Yule-Walker equations show that the best linear predictor of $X_t$ given $X_{t-1}$ is

$$\hat{X}_t=\phi X_{t-1},\qquad \phi=\rho(1)=\frac{\gamma(1)}{\gamma(0)},$$

where $\gamma$ is the autocovariance function of $(X_{k})_{k\in \mathbb{Z}}$.

$$\hat{\phi}=\frac{\sum_{k=2}^n X_k X_{k-1}}{\sum_{k=2}^n X_{k-1} X_{k-1}}$$

provides an estimator of $\phi$.

Under the AR(1) model, the residuals $\hat{Z}_k=X_k-\hat{\phi} X_{k-1}$ are approximately a white noise, so we can run the Ljung-Box test on them to test whether $X$ is an AR(1) process or not.
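A minimal sketch of this fitting-and-residual step, on a simulated AR(1) with an arbitrary illustrative coefficient:

```python
import numpy as np

# Simulate a hypothetical AR(1) process X_k = phi X_{k-1} + eps_k
rng = np.random.default_rng(4)
phi_true = 0.7                  # arbitrary illustrative coefficient
x = np.zeros(1000)
for k in range(1, len(x)):
    x[k] = phi_true * x[k - 1] + rng.normal()

# hat{phi} as in the formula above, then the residuals hat{Z}_k
phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] * x[:-1])
resid = x[1:] - phi_hat * x[:-1]
```

The residuals `resid` should show negligible lag-1 autocorrelation, so the Ljung-Box test applied to them should usually not reject $H_0$.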

## Step 5

We simulate an AR(1) process and run the Ljung-Box test on it and on its residuals for different values of $h$; the p-values we obtain are shown below:

We apply the same procedure to an MA(1) process; the p-values are shown below:

## Step 6

Distributions of p-values obtained from the original AR(1) process:

Distributions of p-values obtained from the residuals of the AR(1) process:

We can see that the p-values computed from the residuals of the AR(1) process are approximately uniform on $[0,1]$, as expected under $H_0$.

Distributions of p-values obtained from the original MA(1) process:

We can see clearly that the p-values concentrate near 0, as predicted by Step 2.

Distributions of p-values obtained from the residuals of the MA(1) process:

## Step 7

$$ \hat{\phi}=\arg\min_{\phi\in\mathbb{R}^{p}}\sum_{k=p+1}^n\Big(X_k-\sum_{j=1}^p u_j X_{k-j}\Big)^{2},$$

where $\phi=[u_1,u_2,\cdots,u_p]^{T}$.

To calculate $\hat{\phi}$, we differentiate the above sum with respect to each $u_j$ and set the derivatives to zero, which yields the Yule-Walker equations

$$\Gamma_p\phi=\gamma_p$$

where

$$\begin{aligned}
\Gamma_p&=\mathrm{Cov}\big([X_{t-1}\cdots X_{t-p}]^T\big)\\
&=\begin{bmatrix}
\gamma(0)&\gamma(1)&\cdots&\gamma(p-1)\\
\gamma(1)&\gamma(0)&\cdots&\gamma(p-2)\\
\vdots&\vdots&\ddots&\vdots\\
\gamma(p-1)&\gamma(p-2)&\cdots&\gamma(0)
\end{bmatrix},\\
\gamma_p&=[\gamma(1)\ \gamma(2)\ \cdots\ \gamma(p)]^T,
\end{aligned}$$

using $\gamma(-t)=\gamma(t)$ for a stationary real-valued process.

We can estimate the autocorrelation $\rho(p)=\gamma(p)/\gamma(0)$ by $\frac{\sum_{k=p+1}^n X_k X_{k-p}}{\sum_{k=p+1}^n X_{k-p} X_{k-p}}$.
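A sketch of the whole scheme in NumPy, solving $\Gamma_p\phi=\gamma_p$ with empirical autocovariances (the function name is our own; the AR(2) coefficients are arbitrary illustrative choices):

```python
import numpy as np

def fit_ar_yule_walker(x, p):
    """Solve Gamma_p phi = gamma_p with empirical autocovariances."""
    x = np.asarray(x, float)
    n = len(x)
    xc = x - x.mean()
    # gamma[t] = hat{gamma}_n(t) for t = 0..p
    gamma = np.array([np.dot(xc[: n - t], xc[t:]) / n for t in range(p + 1)])
    # Gamma_p is the Toeplitz matrix with entries gamma(|i-j|)
    Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(Gamma, gamma[1 : p + 1])

# sanity check on a simulated AR(2) with arbitrary coefficients
rng = np.random.default_rng(5)
x = np.zeros(2000)
for k in range(2, len(x)):
    x[k] = 0.5 * x[k - 1] - 0.3 * x[k - 2] + rng.normal()
phi_hat = fit_ar_yule_walker(x, 2)
```

With 2000 samples the estimates should land close to the true coefficients $(0.5,\,-0.3)$.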

## Step 8

The GNP data we use are shown below.

We take $X_{t}$ to be $\log(G_t)-\log(G_{t-1})$, the log growth rate of GNP.
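The transformation is a one-liner; the levels below are hypothetical stand-in values, since the actual GNP series is not reproduced here.

```python
import numpy as np

# Hypothetical GNP levels G_t (placeholders, not the real data);
# the modelled series is X_t = log(G_t) - log(G_{t-1}).
G = np.array([100.0, 102.0, 103.5, 103.0, 105.1])
X = np.diff(np.log(G))
```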

## Step 9

| $p$ | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| p-value | 0.0369 | 0.0341 | 0.0765 | 0.0573 |

From the table above we see that $p=3$ gives the largest p-value. A larger p-value means we are more confident that the residuals are consistent with white noise, which in turn means the original process is better described by the AR model. So we take $p=3$ in Step 10 to predict future data.
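The order-selection loop can be sketched as follows. Since the GNP series itself is not reproduced in this document, the sketch runs on a simulated AR(3) stand-in with arbitrary coefficients; on such data, small orders should be rejected while $p=3$ should leave roughly white residuals.

```python
import numpy as np
from scipy.stats import chi2

def yule_walker(x, p):
    # solve Gamma_p phi = gamma_p with empirical autocovariances
    xc = x - x.mean(); n = len(x)
    g = np.array([np.dot(xc[: n - t], xc[t:]) / n for t in range(p + 1)])
    G = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(G, g[1 : p + 1])

def ljung_box_pvalue(z, h):
    # p-value 1 - F(T_n(h)) of the Ljung-Box statistic
    z = np.asarray(z, float); n = len(z); zc = z - z.mean()
    g0 = np.dot(zc, zc) / n
    T = n * (n + 2) * sum(
        (np.dot(zc[:-t], zc[t:]) / n / g0) ** 2 / (n - t)
        for t in range(1, h + 1))
    return chi2.sf(T, df=h)

# Stand-in data: a simulated AR(3) with arbitrary coefficients
rng = np.random.default_rng(6)
x = np.zeros(1500)
for k in range(3, len(x)):
    x[k] = 0.3 * x[k-1] + 0.2 * x[k-2] - 0.25 * x[k-3] + rng.normal()

pvals = {}
for p in range(1, 5):
    phi = yule_walker(x, p)
    # residuals hat{Z}_k = X_k - sum_j phi_j X_{k-j}
    resid = x[p:] - sum(phi[j] * x[p - 1 - j : len(x) - 1 - j]
                        for j in range(p))
    pvals[p] = ljung_box_pvalue(resid, h=10)
```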

## Step 10

The predictions we obtain with $p=3$, compared with the observations, are shown below.

One can see that, at the beginning, our prediction follows a similar trend. But around $T=70$ (2008), the observations drop dramatically, which cannot be predicted from the past.
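Multi-step AR($p$) prediction iterates the one-step predictor, feeding each forecast back in. A sketch with purely illustrative coefficients and observations (not the GNP fit):

```python
import numpy as np

def forecast_ar(x, phi, steps):
    """Multi-step AR(p) forecast: hat{X}_{t+1} = sum_j phi_j X_{t+1-j},
    with each prediction appended to the history."""
    hist = list(x)
    out = []
    for _ in range(steps):
        nxt = sum(phi[j] * hist[-1 - j] for j in range(len(phi)))
        out.append(nxt)
        hist.append(nxt)
    return np.array(out)

phi = np.array([0.3, 0.2, -0.25])    # hypothetical AR(3) coefficients
x = np.array([0.5, -0.2, 0.1, 0.4])  # hypothetical recent observations
pred = forecast_ar(x, phi, steps=3)
```

Because each step reuses predicted values, forecast errors compound with the horizon, which is one reason a sudden break such as the 2008 drop cannot be anticipated.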