This post gives a detailed solution to ISI M.Stat 2018 PSB Problem 7, a problem based on the Central Limit Theorem, with a tinge of simulation and code.
Suppose \(\left(X_{1}, Y_{1}\right), \ldots,\left(X_{n}, Y_{n}\right)\) is a random sample from a bivariate normal distribution with \(\mathrm{E}\left(X_{i}\right)=\mathrm{E}\left(Y_{i}\right)=0\), \(Var\left(X_{i}\right)=Var\left(Y_{i}\right)=1\) and unknown \(Corr\left(X_{i}, Y_{i}\right)=\rho \in(-1,1)\) for all \(i=1, \ldots, n\). Define \(W_{n}=\frac{1}{n} \sum_{i=1}^{n} X_{i} Y_{i}\).
(a) Is \(W_{n}\) an unbiased estimator of \(\rho\)? Justify your answer.
(b) For large \(n\), obtain an approximate level \((1-\alpha)\) two-sided confidence interval for \(\rho\), where \(0<\alpha<1\).
(a)
Just compute \(E(W_{n})\).
Since \(Var(X_{i}) = Var(Y_{i}) = 1\), the correlation equals the covariance, so \( \rho = E(X_{i} Y_{i}) - E(X_{i})E(Y_{i}) = E(X_{i} Y_{i})\), because \(E(X_{i}) = E(Y_{i}) = 0\).
Hence \(E(W_{n}) = \frac{1}{n} \sum_{i=1}^{n} E(X_{i} Y_{i}) = \frac{1}{n} \sum_{i=1}^{n} \rho = \rho \).
So, \(W_{n}\) is unbiased for \( \rho \).
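The unbiasedness claim can also be checked by simulation. The following is a minimal sketch, not part of the original solution: the values of `rho`, `n` and `reps` and the helper `draw_W` are illustrative assumptions, and each pair is generated as \(Y = \rho X + \sqrt{1-\rho^2}V\) with \(X, V\) independent \(N(0,1)\), which gives standard normal margins with correlation \(\rho\).

```r
# Monte Carlo check that E(W_n) = rho (illustrative values, not from the problem)
set.seed(1)
rho <- 0.5; n <- 50; reps <- 20000

# One draw of W_n; draw_W is a hypothetical helper name.
# Pairs are built as Y = rho*X + sqrt(1 - rho^2)*V with X, V iid N(0,1).
draw_W <- function(n, rho) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  mean(x * y)
}

w <- replicate(reps, draw_W(n, rho))
mean_w <- mean(w)            # average of the simulated W_n values
se_w <- sd(w) / sqrt(reps)   # Monte Carlo standard error of that average
```

If \(W_{n}\) is unbiased, `mean_w` should sit within a few multiples of `se_w` of `rho`, even for a small `n` such as 50, since unbiasedness does not need large samples.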
(b)
Observe that the pairs \(\left(X_{i}, Y_{i}\right)\) and \(\left(X_{j}, Y_{j}\right)\), \(i \neq j\), come from a random sample and are therefore iid.
So, the products \(X_{i}Y_{i}\) and \(X_{j}Y_{j}\) are also iid.
Hence, the limiting distribution of \(W_{n}\) brings the Central Limit Theorem to mind. So, let's dig into it. For that we need the mean and the variance of \(X_{1}Y_{1}\): \(E(X_{1}Y_{1}) = \rho\) and \(Var(X_{1}Y_{1}) = E((X_{1}Y_{1})^2) - \rho^2\).
So, how do we calculate \({E((X_{1}Y_{1})^2)}\)? For that we use the following.
Alternate Definition of Bivariate Normal
Two random variables \(X\) and \(Y\) are said to be jointly normal if they can be expressed in the form \(X = aU + bV, Y = cU + dV \), where \(U\) and \(V\) are independent standard normal random variables.
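Why do we need this? Because \(X\) and \(Y\) are not independent: they have correlation coefficient \(\rho\), and this representation lets us write \(Y\) in terms of \(X\) and an independent part.
Let \((X, Y)\) have the same distribution as \((X_1, Y_1)\).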
Exercise: Using the above result, prove that \(Y\) can be written as \( Y = \rho X + \sqrt{(1-\rho^2)}V\), where \(V\) ~ N(0,1) and \(V\) is independent of \(X\).
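Before proving it, one can sanity-check this representation numerically. The sketch below is only an illustration under assumed values (`rho` and the sample size `m` are my choices): it builds \(Y\) from independent standard normals \(X\) and \(V\) and looks at the sample variance of \(Y\) and the sample correlation with \(X\).

```r
# Numerical check of Y = rho*X + sqrt(1 - rho^2)*V (illustrative values)
set.seed(2)
rho <- 0.5; m <- 200000

x <- rnorm(m)
v <- rnorm(m)                          # generated independently of x
y <- rho * x + sqrt(1 - rho^2) * v

var_y  <- var(y)      # should be close to 1
cor_xy <- cor(x, y)   # should be close to rho
cor_xv <- cor(x, v)   # should be close to 0
```

This does not replace the proof asked for in the exercise; it only shows that the construction has unit variance and the intended correlation.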
\(Y^2 = \rho^2X^2 + (1-\rho^2)V^2 + 2\rho\sqrt{(1-\rho^2)}XV\)
\(E(X^2Y^2) = E(\rho^2 X^4 + (1-\rho^2)X^2V^2 + 2\rho\sqrt{(1-\rho^2)}X^3V )\)
\(= \rho^2E(X^4) + (1-\rho^2)E(X^2V^2)\)
\(= \rho^2E(X^4) + (1-\rho^2)E(X^2)E(V^2)\)
\(= 3\rho^2 + (1-\rho^2) = 1 + 2\rho^2\).
Exercise: Justify the above steps, using the independence of \(X\) and \(V\).
We used the fact that \(E(X^4) = 3\) if \(X\) ~ N(0,1). Instead of computing the integral directly, we can use the fact that \( E(Z) = n\) and \(Var(Z) = 2n\) if \(Z\) ~ \( {{\chi}_n}^2\).
Exercise: Prove that \(E(X^4) = 3\) if \(X\) ~ N(0,1) using the above hint that \( X^2\) ~ \({{\chi}_1}^2\).
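The two moment identities used above, \(E(X^4) = 3\) and \(E(X^2Y^2) = 1 + 2\rho^2\), can be checked by Monte Carlo. This is a rough sketch with assumed values of `rho` and `m`; the tolerances one should expect are of the order of the Monte Carlo error, not exact equality.

```r
# Monte Carlo check of E(X^4) = 3 and E(X^2 Y^2) = 1 + 2*rho^2 (illustrative values)
set.seed(3)
rho <- 0.5; m <- 1e6

x <- rnorm(m)
y <- rho * x + sqrt(1 - rho^2) * rnorm(m)   # standard bivariate normal pair

m4  <- mean(x^4)          # compare with 3
m22 <- mean(x^2 * y^2)    # compare with 1 + 2*rho^2, which is 1.5 for rho = 0.5
```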
The final result we get is the following:
\(E(W_{n}) = \rho\).
\(Var(W_{n}) = \frac{Var(X_{1}Y_{1})}{n} = \frac{E(X_{1}^2Y_{1}^2) - \rho^2}{n} = \frac{(1 + 2\rho^2) - \rho^2}{n} = \frac{1 + \rho^2}{n}\).
Now use the Central Limit Theorem.
\( \frac{\sqrt{n}(W_{n} - \rho)}{\sqrt{1 + \rho^2}} \to N(0, 1)\) in distribution.
Therefore, for large \(n\), \( P\left( \left|\frac{\sqrt{n}(W_{n} - \rho)}{\sqrt{1 + \rho^2}}\right| \leq z_{\alpha / 2} \right) \approx (1-\alpha)\), where \(z_{\alpha/2}\) is the upper \(\alpha/2\) point of N(0,1).
So, \( P\left(W_{n} - z_{\alpha / 2} \frac{\sqrt{1 + \rho^2}}{\sqrt{n}} \leq \rho \leq W_{n} +z_{\alpha / 2} \frac{\sqrt{1 + \rho^2}}{\sqrt{n}} \right) \approx (1-\alpha)\). The endpoints still involve the unknown \(\rho\). One option is to replace \(\rho\) by \(W_{n}\) in the standard error, which is justified for large \(n\) because \(W_{n} \to \rho\) in probability. Another is to square the inequality and solve the resulting quadratic inequality in \(\rho\).
But we can also use a variance stabilizing transformation (pivotal method).
Observe that \(f(x) = \int \frac{1}{\sqrt{1+x^2}}\,dx = \ln\left(x+\sqrt{1 + x^2}\right) = \sinh^{-1}(x)\), which is an increasing and hence bijective function.
By the delta method, \( {\sqrt{n}(f(W_{n}) - f(\rho))} \to N(0, c)\). Calculate this constant \( c = f'(\rho)^2 \, (1 + \rho^2) \).
Now, try to find a confidence interval for \(f(\rho)\) based on this. Then take the inverse of \(f(x)\) to get a confidence interval for \(\rho\).
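Both routes to an interval can be sketched in a few lines of R. The sketch assumes \(Var(X_{1}Y_{1}) = E(X_{1}^2Y_{1}^2) - \rho^2 = 1 + \rho^2\), so that the stabilizing transformation is \(f(x) = \sinh^{-1}(x)\) with limiting variance \(c = 1\) and inverse \(\sinh\). The function names `wald_ci_rho` and `vst_ci_rho`, and the values of `rho`, `n` and `reps`, are hypothetical choices for illustration.

```r
# Plug-in interval: W_n +/- z * sqrt((1 + W_n^2) / n), with rho replaced by W_n
wald_ci_rho <- function(x, y, alpha = 0.05) {
  n <- length(x)
  w <- mean(x * y)
  z <- qnorm(1 - alpha / 2)
  w + c(lower = -1, upper = 1) * z * sqrt((1 + w^2) / n)
}

# Variance-stabilised interval: sinh(asinh(W_n) +/- z / sqrt(n))
vst_ci_rho <- function(x, y, alpha = 0.05) {
  n <- length(x)
  w <- mean(x * y)
  z <- qnorm(1 - alpha / 2)
  sinh(asinh(w) + c(lower = -1, upper = 1) * z / sqrt(n))
}

# Empirical coverage of both intervals on simulated data (illustrative values)
set.seed(5)
rho <- 0.5; n <- 200; reps <- 5000
covered <- replicate(reps, {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  a <- wald_ci_rho(x, y)
  b <- vst_ci_rho(x, y)
  c(wald = a[["lower"]] <= rho && rho <= a[["upper"]],
    vst  = b[["lower"]] <= rho && rho <= b[["upper"]])
})
coverage <- rowMeans(covered)   # both entries should be near 0.95
```

Because \(\rho \in (-1, 1)\), one may additionally intersect either interval with \((-1, 1)\); that step is left out here to keep the sketch close to the derivation.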
library(MASS)  # provides mvrnorm

N <- 2000     # Size n of each random sample
reps <- 1000  # Number of simulated values of W_n
# Target parameters for the two univariate normal margins
rho <- 0.5
mu1 <- 0; s1 <- 1
mu2 <- 0; s2 <- 1
mu <- c(mu1, mu2) # Mean vector
sigma <- matrix(c(s1^2, s1*s2*rho, s1*s2*rho, s2^2),
                2) # Covariance matrix

v <- numeric(reps)  # preallocate storage for the W_n values
for (i in 1:reps) {
  bvn1 <- mvrnorm(N, mu = mu, Sigma = sigma)  # from MASS package
  v[i] <- mean(bvn1[, 1] * bvn1[, 2])         # W_n for this sample
}

hist(v, freq = FALSE)
# Var(X1*Y1) = E(X1^2 * Y1^2) - rho^2 = 1 + rho^2, so sd(W_n) = sqrt((1 + rho^2) / N)
sd_W <- sqrt(1 + rho^2) / sqrt(N)
# Overlay the normal density suggested by the Central Limit Theorem
curve(dnorm(x, rho, sd_W), col = "red", add = TRUE)
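As a numerical companion to the histogram, one can compare the standard deviation of the simulated \(W_{n}\) values with the theoretical one. The sketch below is self-contained and uses only base R; it assumes \(Var(X_{1}Y_{1}) = E(X_{1}^2Y_{1}^2) - \rho^2 = 1 + \rho^2\) and generates the pairs as \(Y = \rho X + \sqrt{1-\rho^2}V\) rather than with `mvrnorm`. The variable names are illustrative.

```r
# Compare the simulated sd of W_n with sqrt((1 + rho^2) / N) (illustrative values)
set.seed(6)
rho <- 0.5; N <- 2000; reps <- 1000

w_sim <- replicate(reps, {
  x <- rnorm(N)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(N)
  mean(x * y)
})

sd_sim    <- sd(w_sim)                  # empirical sd of the simulated W_n
sd_theory <- sqrt((1 + rho^2) / N)      # equals 0.025 for rho = 0.5, N = 2000
ratio     <- sd_sim / sd_theory         # should be close to 1
```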
This problem was a bit more mathematical and technical, but still, I hope that the simulation along with the proofs gave you a good reading experience. Stay Tuned!

The confidence interval contains the unknown parameter, i.e. the correlation coefficient (rho). How?
First of all, this is a large-sample confidence interval. For large n, Wn concentrates around rho. Hence, we are looking at a confidence interval around the mean. Just expand the inequality and you will get the expression. See, I have added a new portion. Thanks for your doubt. Stay tuned.
Why is the variance of X1Y1 = E(X1Y1)^2?
E(X1Y1) is rho and not zero
Variance of X1Y1 should be 1+rho^2.
var(Wn) will be (1+rho^2)/n
because var(Wn)=var(X1Y1)/n
& var(X1Y1)=E((X1Y1)^2)-(E(X1Y1))^2
=1+2*rho^2 - rho^2
=1+rho^2
hence var(Wn)=(1+rho^2)/n