In many cases, we would like to know some properties of a whole population. For example, manufacturers need to know which new products are in demand; a state government would like to know which of two proposed new highways would be used more; newspapers would like to predict elections. In these cases, the cost of asking all members of a population would be prohibitive. So we check with a sample of the population, and try to predict from these results.
In other cases, examining the whole population just does not work. If a pharmaceutical company wants to know whether a medication is effective in curing a disease, they cannot test it on all cases of the disease; they are interested in treating those who will contract the disease in the future. And in some cases they wish to compare two treatments that cannot both be given to the same patient. Again, the solution is to treat a sample.
In this chapter we consider an important question: if we have results from a sample, how do we use these to predict the results for the population, and how reliable are our conclusions?
Predictions
Say you want to know the mean of a distribution. Your best approach is to take a sample and find its mean; you would expect the mean of your sample to be close to the mean of the distribution. The bigger the sample, the more accurate you expect the sample mean to be. In fact, the mean of a sample is itself a random variable. For example, if you are sampling from a population of size N and you take a sample of size n, there are $\binom{N}{n}$ possible samples, so the population of sample means is a collection of $\binom{N}{n}$ numbers, and you are essentially choosing one of them at random. As we pointed out in the previous chapter, sample means approximately follow a normal distribution, and the approximation improves as n grows.
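For a small population, the "population of sample means" can be listed in full. The sketch below is illustrative only: the five-value population and the sample size n = 2 are hypothetical choices. It enumerates every sample of size n drawn without replacement, confirms that there are $\binom{N}{n}$ of them, and checks that the average of all the sample means equals the population mean.

```python
from itertools import combinations
from math import comb
from statistics import mean

# Hypothetical small population of N = 5 values, chosen only for illustration.
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Every possible sample of size n (drawn without replacement, order ignored).
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

# The number of samples matches the binomial coefficient C(N, n).
print(len(samples), comb(len(population), n))

# The average of all possible sample means equals the population mean.
print(mean(sample_means), mean(population))
```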
If the original distribution has mean µ and standard deviation σ, we can assume that the distribution of sample means is approximately normal, with mean µ and standard deviation σ/√n.
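The claim that sample means centre on µ with standard deviation σ/√n can be checked by simulation. This is a minimal sketch under stated assumptions. The population is a single fair die, which is clearly not normal. Draws are independent and made with replacement. The sample size n = 25, the number of trials, and the seed are all arbitrary illustrative choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is reproducible

# Assumed population for illustration: the faces of one fair die.
faces = [1, 2, 3, 4, 5, 6]
mu = mean(faces)        # population mean, 3.5
sigma = pstdev(faces)   # population standard deviation, about 1.708

n = 25          # size of each sample
trials = 20000  # number of samples drawn

# Draw many independent samples (with replacement) and record each sample mean.
means = [sum(rng.choices(faces, k=n)) / n for _ in range(trials)]

print(f"mean of sample means: {mean(means):.3f} (mu = {mu})")
print(f"sd of sample means:   {pstdev(means):.3f} (sigma/sqrt(n) = {sigma / n ** 0.5:.3f})")
```

A histogram of `means` would also look roughly bell-shaped, even though the die itself is uniform.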
Sample Problem 1.1 Suppose you roll two dice and add the scores, and you repeat this 100 times.
What is the expected value of the average of the 100 sums? What is its standard deviation?
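Solution. The mean sum is 7. The standard deviation of the sum is √(35/6) ≈ 2.42, because each die has variance 35/12 and the variances of the two independent dice add. So you expect the mean of the sample to be 7, and the standard deviation of this mean is 2.42/√100 ≈ 0.242.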
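The figures in this solution can be reproduced exactly by listing the 36 equally likely outcomes of two fair dice. This is a small sketch that uses `Fraction` to keep the mean and variance exact before taking the square root.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# All 36 equally likely outcomes for the sum of two fair dice.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

# Exact mean and variance of the sum (Fractions avoid rounding).
mu = Fraction(sum(sums), len(sums))
var = sum((Fraction(x) - mu) ** 2 for x in sums) / len(sums)
sigma = sqrt(var)

n = 100               # number of rolls in the sample
se = sigma / sqrt(n)  # standard deviation of the sample mean

print(mu, var)                  # 7 35/6
print(f"{sigma:.2f} {se:.3f}")  # 2.42 0.242
```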
In this example, we used the population distribution to find out some properties of sample means. The reverse process is often used: when you want to know more about some variable (family incomes in an area, for example, or the heights of 9-year-olds), you take a sample. The main worry people have about sample data is: how reliable is a sample? The standard deviation is helpful here.
Suppose you measure some property in a sample of size n; the sample mean is m and the standard deviation of your sample is s. (We shall talk about reliable ways to estimate a population standard deviation in the next section.) These are your best guesses for the population mean µ and standard deviation σ. So we assume that sample means for this property are distributed approximately normally, with mean m and standard deviation s/√n. Turning this around, our best guess for the population mean is m, and we can be about 95% confident that the population mean lies between m − 2(s/√n) and m + 2(s/√n), in the sense that roughly 95% of intervals built this way from repeated samples will contain µ. This is called the 95% confidence interval for the population mean, and 2(s/√n) is the margin of error. (The factor 2 is a rounding of 1.96, the number of standard deviations on either side of the mean that captures 95% of a normal distribution.)
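Starting from raw measurements, the recipe is: compute m and s, then form m ± 2(s/√n). The sketch below follows that rule of thumb. Two things in it are assumptions. First, it computes s with the n − 1 divisor (`statistics.stdev`); the text defers the choice of estimator to the next section. Second, the function name and the height data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def confidence_interval_95(data):
    """Approximate 95% confidence interval for the population mean.

    Uses the rule of thumb m +/- 2 * s / sqrt(n). The sample standard
    deviation s uses the n - 1 divisor (statistics.stdev); that choice is
    an assumption of this sketch.
    """
    n = len(data)
    m = mean(data)
    s = stdev(data)
    margin = 2 * s / sqrt(n)  # margin of error
    return m - margin, m + margin

# Hypothetical heights (cm) of ten 9-year-olds, for illustration only.
heights_cm = [128, 131, 135, 126, 133, 130, 129, 134, 127, 132]
low, high = confidence_interval_95(heights_cm)
print(f"95% CI: {low:.1f} to {high:.1f} cm")
```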
Sample Problem 1.2 A sample of 100 workers in a factory has a mean income of $42,000. The standard deviation of the sample is $4,000. What is the 95% confidence interval for the average income of all the factory's workers?
Solution. Here m = $42,000, s = $4,000 and n = 100, so 2(s/√n) = 2 × $4,000/10 = $800, and the 95% confidence interval is $42,000 ± $800, or $41,200 to $42,800.
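The same arithmetic works directly from the summary figures in Sample Problem 1.2. This is a short check of the margin of error and the interval endpoints:

```python
from math import sqrt

# Figures from Sample Problem 1.2.
m = 42_000   # sample mean income, dollars
s = 4_000    # sample standard deviation, dollars
n = 100      # sample size

margin = 2 * s / sqrt(n)   # 2 * 4000 / 10 = 800
print(f"${m - margin:,.0f} to ${m + margin:,.0f}")  # $41,200 to $42,800
```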
Sometimes we know the mean of a distribution. For example, reliable estimates are available for family incomes in districts, population weights and heights at various ages, and so on. In that case, we can tell whether a sample mean is close to the actual mean. If the known population mean lies outside the 95% confidence interval calculated from the sample, this usually indicates bias in the sample, although about 1 sample in 20 will miss even when there is no bias. A smaller difference is normally attributed to ordinary sampling variability.
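The bias check described above compares a known population mean against the sample's 95% interval. This is a minimal sketch. The function name `looks_biased` and the published means used in the calls are hypothetical. The sample figures are reused from Sample Problem 1.2.

```python
from math import sqrt

def looks_biased(known_mu, m, s, n):
    """Return True if a known population mean falls outside m +/- 2*s/sqrt(n).

    Hypothetical helper: a known mean outside the interval usually signals a
    biased sample, while one inside is put down to sampling variability.
    """
    margin = 2 * s / sqrt(n)
    return not (m - margin <= known_mu <= m + margin)

# Hypothetical published means, checked against the Sample Problem 1.2 figures.
print(looks_biased(43_000, m=42_000, s=4_000, n=100))  # True: 43,000 > 42,800
print(looks_biased(42_500, m=42_000, s=4_000, n=100))  # False: inside the interval
```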
Numerical measures computed from a sample, such as m and s, are called statistics.