## An Introduction to Statistical Inference and Its Applications with R

The mathematical description of variation is central to statistics. The probability required for statistical inference is not primarily axiomatic or combinatorial, but is oriented toward describing data distributions.

Sampling Unit: A unit is a person, animal, plant or thing which is actually studied by a researcher; the basic objects upon which the study or experiment is executed. For example, a person; a sample of soil; a pot of seedlings; a zip code area; a doctor's practice.

Parameter: A parameter is an unknown value, and therefore it has to be estimated. Parameters are used to represent a certain population characteristic. For example, the population mean μ is a parameter that is often used to indicate the average value of a quantity. Within a population, a parameter is a fixed value that does not vary.

Each sample drawn from the population has its own value of any statistic that is used to estimate this parameter. For example, the mean of the data in a sample is used to give information about the overall mean μ in the population from which that sample was drawn.

Statistic: A statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population.

For example, the average of the data in a sample is used to give information about the overall average in the population from which that sample was drawn. A statistic is a function of an observable random sample. It is therefore an observable random variable. Notice that, although a statistic is a "function" of the observations, it is unfortunately commonly called a random "variable" rather than a function. It is possible to draw more than one sample from the same population, and the value of a statistic will in general vary from sample to sample. For example, the average value in a sample is a statistic.
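The distinction between a fixed parameter and a statistic that varies from sample to sample can be illustrated with a small simulation. This is only a sketch: the population below is made up (10,000 values generated with an arbitrary mean of 50 and standard deviation of 10), and the names `population`, `population_mean`, and `sample_means` are illustrative choices, not part of the original text.

```python
import random
import statistics

# Hypothetical population: 10,000 made-up measurements (illustrative only).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# The parameter: one fixed value for this population.
population_mean = statistics.fmean(population)

# The statistic: recomputed for every sample, so its value changes.
# Each sample is drawn without replacement and has 25 units.
sample_means = [
    statistics.fmean(rng.sample(population, 25))
    for _ in range(5)
]

print(f"population mean (parameter): {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Each of the five sample means lands near the population mean, but no two of them are expected to agree exactly.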

The average values in more than one sample, drawn from the same population, will not necessarily be equal. Statistics are often assigned Roman letters. The word estimate means to esteem, that is, to give a value to something. A statistical estimate is an indication of the value of an unknown quantity based on observed data.

More formally, an estimate is the particular value of an estimator that is obtained from a particular sample of data and used to indicate the value of a parameter. Example: Suppose the manager of a shop wanted to know μ, the mean expenditure of customers in her shop in the last year. She could calculate the average expenditure of the hundreds or perhaps thousands of customers who bought goods in her shop; that is, the population mean μ.

Instead, she could estimate this population mean μ by calculating the mean of a representative sample of customers.

There are two broad subdivisions of statistics, Descriptive Statistics and Inferential Statistics, as described below.

Descriptive Statistics: Numerical statistical data should be presented clearly, concisely, and in such a way that the decision maker can quickly obtain the essential characteristics of the data in order to incorporate them into the decision process.

The principal descriptive quantity derived from sample data is the mean, which is the arithmetic average of the sample data. It serves as the most reliable single measure of the value of a typical member of the sample. If the sample contains a few values that are so large or so small that they have an exaggerated effect on the value of the mean, the sample is more accurately represented by the median, the value where half the sample values fall below and half above. The quantities most commonly used to measure the dispersion of the values about their mean are the variance s² and its square root, the standard deviation s.

The variance is calculated by determining the mean, subtracting it from each of the sample values (yielding the deviations of the sample values), and then averaging the squares of these deviations.
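The steps just described can be sketched in a few lines of Python. The function name `variance_by_hand`, the `ddof` argument, and the data values are assumptions made for illustration. One detail the text leaves open is the divisor used when "averaging": dividing by n gives the variance of the data themselves, while dividing by n − 1 is the usual choice when the sample is used to estimate a population variance. The sketch supports both.

```python
import math
import statistics


def variance_by_hand(values, ddof=1):
    """Variance via the three steps in the text.

    ddof=0 divides by n (plain average of squared deviations);
    ddof=1 divides by n - 1 (the usual sample estimate).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("not enough observations for this divisor")
    mean = sum(values) / n                       # step 1: the mean
    deviations = [x - mean for x in values]      # step 2: deviations
    return sum(d * d for d in deviations) / (n - ddof)  # step 3: average squares


# Made-up data, chosen so the arithmetic is easy to follow by hand.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

var_n = variance_by_hand(data, ddof=0)          # divisor n
var_n_minus_1 = variance_by_hand(data, ddof=1)  # divisor n - 1
print(var_n, math.sqrt(var_n))                  # variance and standard deviation
print(var_n_minus_1, math.sqrt(var_n_minus_1))
```

The results can be cross-checked against `statistics.pvariance` (divisor n) and `statistics.variance` (divisor n − 1) from the standard library.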

The mean and standard deviation of the sample are used as estimates of the corresponding characteristics of the entire group from which the sample was drawn. They do not, in general, completely describe the distribution F(x) of values within either the sample or the parent group; indeed, different distributions may have the same mean and standard deviation. They do, however, provide a complete description of the normal distribution, in which positive and negative deviations from the mean are equally common, and small deviations are much more common than large ones.

For a normally distributed set of values, a graph showing the dependence of the frequency of the deviations upon their magnitudes is a bell-shaped curve. About 68 percent of the values will differ from the mean by less than the standard deviation, and almost all of them (about 99.7 percent) will differ by less than three times the standard deviation.

Inferential Statistics: Inferential statistics is concerned with making inferences from samples about the populations from which they have been drawn.
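The proportions of a normal distribution that lie within a given number of standard deviations of the mean can be checked directly from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is a hypothetical choice for this illustration.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal value lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

Because the result depends only on k, the same proportions hold for any mean and any positive standard deviation.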

In other words, if we find a difference between two samples, we would like to know whether this is a "real" difference. That's what tests of statistical significance are all about. Any conclusion inferred from sample data about the population from which the sample is drawn must be expressed in probabilistic terms. Probability is the language and the measuring tool for uncertainty in our statistical conclusions. Inferential statistics can be used for explaining a phenomenon or for checking the validity of a claim. In these instances, inferential statistics is called Exploratory Data Analysis or Confirmatory Data Analysis, respectively.

Statistical Inference: Statistical inference refers to extending the knowledge obtained from a random sample to the whole population from which the sample was drawn. This is known in mathematics as Inductive Reasoning, that is, knowledge of the whole from a particular.

Its main application is in hypothesis testing about a given population. Statistical inference guides the selection of appropriate statistical models. Models and data interact in statistical work. Inference from data can be thought of as the process of selecting a reasonable model, including a statement in probability language of how confident one can be about the selection.

Normal Distribution Condition: The normal or Gaussian distribution is a continuous symmetric distribution that follows the familiar bell-shaped curve.

One of its nice features is that the mean and variance uniquely and independently determine the distribution. It has been noted empirically that many measurement variables have distributions that are at least approximately normal. Even when a distribution is non-normal, the distribution of the mean of many independent observations from the same distribution becomes arbitrarily close to a normal distribution as the number of observations grows large.
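The tendency of sample means toward normality can be illustrated with a simulation. The sketch below is one possible check, not a proof: it draws from an exponential distribution (an arbitrary choice of a strongly skewed, non-normal distribution) and measures what fraction of simulated sample means fall within one standard deviation of their own center. For a normal distribution that fraction is about 0.68. The function names, the seed, and the number of replications are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n independent draws from a skewed (exponential, rate 1) distribution."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


def fraction_within_one_sd(n, replications=2000):
    """Fraction of simulated sample means within one SD of their own average."""
    means = [sample_mean(n) for _ in range(replications)]
    center = statistics.fmean(means)
    spread = statistics.stdev(means)
    inside = sum(1 for m in means if abs(m - center) < spread)
    return inside / replications


for n in (1, 5, 50):
    print(f"n = {n:>2}: fraction within one SD = {fraction_within_one_sd(n):.3f}")
```

For single observations (n = 1) the fraction is noticeably above the normal value, reflecting the skewness of the underlying distribution; as n grows it settles near 0.68.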

Many frequently used statistical tests require that the data come from a normal distribution.

Estimation and Hypothesis Testing: Inference in statistics is of two types. The first is estimation, which involves the determination, with a possible error due to sampling, of the unknown value of a population characteristic, such as the proportion having a specific attribute or the average value μ of some numerical measurement. To express the accuracy of the estimates of population characteristics, one must also compute the standard errors of the estimates. The second type of inference is hypothesis testing. It involves the definition of a hypothesis as one set of possible population values and of an alternative as a different set. There are many statistical procedures for determining, on the basis of a sample, whether the true population characteristic belongs to the set of values in the hypothesis or the alternative.
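Both types of inference can be sketched for the simplest case, a population mean. The example below estimates the mean with its standard error, then tests a hypothesized value using a large-sample normal (z) approximation. That approximation is an assumption of this sketch, chosen because it needs only the standard library; for small samples a t-based procedure would normally be preferred. The function names and the spending figures are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_with_standard_error(sample):
    """Estimation: the sample mean and its estimated standard error s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, standard_error


def two_sided_z_test(sample, hypothesized_mean):
    """Hypothesis testing: z statistic and two-sided p-value (normal approximation)."""
    mean, standard_error = mean_with_standard_error(sample)
    z = (mean - hypothesized_mean) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up customer expenditures, for illustration only.
spending = [23.5, 41.0, 18.2, 36.7, 29.9, 52.3, 31.4, 27.8, 44.1, 35.0]

estimate, se = mean_with_standard_error(spending)
z, p = two_sided_z_test(spending, hypothesized_mean=30.0)
print(f"estimate = {estimate:.2f}, standard error = {se:.2f}")
print(f"z = {z:.2f}, two-sided p-value = {p:.3f}")
```

A small p-value would suggest that the sample is hard to reconcile with the hypothesized mean; a large one means the data do not contradict it.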

Statistical inference is grounded in probability, idealized concepts of the group under study, called the population, and the sample. The statistician may view the population as a set of balls from which the sample is selected at random, that is, in such a way that each ball has the same chance as every other one for inclusion in the sample.

Notice that to be able to estimate the population parameters, the sample size n must be greater than one.

Greek Letters Commonly Used as Statistical Notations: We use Greek letters as scientific notations in statistics and other scientific fields to honor the ancient Greek philosophers who invented science and scientific thinking. Before Socrates, in the 6th century BC, Thales and Pythagoras, among others, applied geometrical concepts to arithmetic, and Socrates is the inventor of dialectic reasoning. The revival of scientific thinking initiated by Newton's work was valued, and hence reappeared, many centuries later.

| alpha | beta | chi-square | delta | mu | nu | pi | rho | sigma | tau | theta |
|---|---|---|---|---|---|---|---|---|---|---|
| α | β | χ² | δ | μ | ν | π | ρ | σ | τ | θ |

Note: Chi-square, χ², is not the square of anything; its name is simply Chi-square (read "ki-square"). Ki does not exist in statistics. I'm glad that you're overcoming all the confusions that exist in learning statistics.

Type of Data and Levels of Measurement: Information can be collected in statistics using qualitative or quantitative data. Qualitative data, such as the eye color of a group of individuals, are not computable by arithmetic relations.

They are labels that indicate the category or class into which an individual, object, or process falls. They are called categorical variables.

Quantitative data sets consist of measures that take numerical values for which descriptions such as means and standard deviations are meaningful. They can be put into an order and further divided into two groups: discrete data or continuous data.
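The practical consequence of this classification is that different summaries make sense for different kinds of data. The sketch below uses made-up values: counts and the most frequent category for qualitative data, and means and standard deviations for discrete and continuous quantitative data. All variable names are illustrative.

```python
import statistics
from collections import Counter

# Qualitative (categorical) data: labels, not numbers.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Quantitative data: discrete (counts) and continuous (measurements).
children_per_household = [0, 2, 1, 3, 2, 2]
heights_cm = [162.0, 175.5, 168.2, 181.0, 170.3, 165.8]

# Categories can be counted, but an "average eye color" is not meaningful.
color_counts = Counter(eye_colors)
most_common_color = color_counts.most_common(1)[0][0]
print(color_counts, most_common_color)

# Quantitative values can be ordered and summarized arithmetically.
print(sorted(children_per_household), statistics.fmean(children_per_household))
print(statistics.fmean(heights_cm), statistics.stdev(heights_cm))
```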