The Best Keyword Research Tools
If you want your business to make it to the top, search engine optimization (SEO), and in particular identifying the keywords that guide the type of content you publish, is an essential component of your online marketing strategy. The best keyword research tools make this task much easier.
Google Ads Keyword Planner
There are plenty of free SEO keyword tools available, and one of the most popular and useful is the Google Ads Keyword Planner. This handy program has two distinct advantages:
- Google is the most widely used Internet search engine, with a massive 92.86 percent market share according to the StatCounter website, which means your online marketing efforts should focus predominantly on Google. It only makes sense to use a keyword research tool that Google itself operates and that pulls data directly from Google.
- This keyword analysis tool is easy to use and incredibly comprehensive. You enter your keyword or text string to get a detailed list of results, comprising keyword suggestions with the number of hits they receive, and a suggested bid amount for your “pay-per-click” advertising model.
AdWords & SEO Keyword Permutation Generator
SEO is the way to make your marketing system as efficient and effective as possible. That’s much easier when you make use of the AdWords & SEO Keyword Permutation Generator, which lets you explore new combinations of keywords and keyword phrases quickly to help you reach new target markets. To get started, enter a series of keywords, and then let the generator create all of the different available permutations. You get instant keyword phrases, which means you don’t have to spend hours thinking them up yourself.
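The core idea behind a keyword permutation generator can be sketched in a few lines of Python. This is a simplified illustration of the technique, not the tool's actual implementation; the keywords are made up:

```python
from itertools import permutations

def keyword_permutations(words):
    """Return every ordering of the given keywords as search phrases."""
    return [" ".join(p) for p in permutations(words)]

phrases = keyword_permutations(["cheap", "running", "shoes"])
print(phrases)  # 3! = 6 phrases, e.g. "cheap running shoes"
```

Three keywords yield 3! = 6 phrases; note that the count grows factorially, which is why these tools save so much manual effort.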
Keyword In
Keyword In is similar to the Keyword Permutation Generator, but it has a few additional functions and options for more keywords. Enter keywords into columns and click generate to let the application combine them in hundreds of different ways. You can use from four to nine columns, and it's possible to mark certain columns as optional to refine the results you get. This is an incredibly quick way to automate a time-consuming process, and it also makes it easier to find keyword combinations that form long-tail keywords you might not otherwise have considered.
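Column-based combination with optional columns can also be sketched with the standard library. This is an illustrative sketch of the general approach (the column contents and the `combine_columns` helper are invented for the example, not taken from the tool):

```python
from itertools import product

def combine_columns(columns, optional=()):
    """Combine one keyword from each column; optional columns may be skipped."""
    # Optional columns get an extra empty choice, meaning "leave this slot out".
    cols = [col + [""] if i in optional else col
            for i, col in enumerate(columns)]
    results = set()
    for combo in product(*cols):
        phrase = " ".join(word for word in combo if word)
        results.add(phrase)
    return sorted(results)

combos = combine_columns([["buy", "rent"], ["vintage"], ["bikes"]], optional={1})
print(combos)  # includes both "buy vintage bikes" and the shorter "buy bikes"
```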
IMforSMB Bulk Keyword Generator
Do you have a small business? Are your keyword searches generating results that cast the net too wide for your friendly neighborhood store? It’s a common problem. Most keywords work on the basis you want to be as big as possible, but sometimes you want keywords that are more tailored to your business needs. The IMforSMB Bulk Keyword Generator lets you specify a business and location and gives results that are much more effective for generating local buzz.
Soovle
You know how, when you're typing into Google's search bar, Google starts to make auto-complete suggestions to save you time? That's exactly what Soovle does. This simple application draws on various search engines to provide a comprehensive list of auto-complete suggestions for any given keyword. All you have to do is select which search engine you want to use and then enter your search text. Within seconds, you have a list of potential keywords to consider.
Effective Use of Statistics in Research – Methods and Tools for Data Analysis
Remember that sinking feeling you get when you are asked to analyze your data? Now that you have all the required raw data, you need to test your hypothesis statistically. Presenting your numerical data as statistics in research will also help break the stereotype of the biology student who can't do math.
Statistical methods are essential for scientific research. In fact, they underpin every stage of research: planning, designing, collecting data, analyzing, drawing meaningful interpretations and reporting findings. The results of a research project remain meaningless raw data unless they are analyzed with statistical tools, so determining the right statistics for your research is essential to justify the findings. In this article, we discuss how statistical methods help draw meaningful conclusions in biological studies.
Role of Statistics in Biological Research
Statistics is a branch of science that deals with the collection, organization and analysis of data, from the sample to the whole population. It helps design a study more meticulously and provides logical grounds for concluding a hypothesis. Biology focuses on living organisms and their complex pathways, which are dynamic and cannot always be explained by reasoning alone. Statistics defines and explains patterns in a study based on the sample size used; in short, it reveals the trend in the conducted study.
Biological researchers often disregard statistics during research planning and only apply statistical tools at the end of the experiment. This produces complicated results that are not easily analyzed with statistical tools. Statistics in research can instead help a researcher approach the study in a stepwise manner:
1. Establishing a Sample Size
Usually, a biological experiment starts with choosing samples and selecting the right number of experimental repeats. Basic statistical principles, such as randomization and the use of sufficiently large samples, guide this choice. Statistics teaches how drawing a sample from a large random pool lets you extrapolate findings while reducing experimental bias and error.
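One standard way to choose a sample size is the textbook formula n = (z·σ/E)², which gives the minimum n needed to estimate a population mean to within a margin E at a given confidence level. A minimal sketch, with the SD and margin values invented for illustration:

```python
import math

def sample_size_for_mean(sigma, margin, z=1.96):
    """Minimum n to estimate a mean within ±margin at 95% confidence
    (z = 1.96 by default), given an assumed population SD sigma."""
    return math.ceil((z * sigma / margin) ** 2)

# e.g. an assumed SD of 12 units and a desired precision of ±3 units
print(sample_size_for_mean(12, 3))  # 62
```

Tightening the confidence level (a larger z) or the margin drives the required sample size up, which is exactly the planning trade-off described above.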
2. Testing of Hypothesis
When conducting a statistical study with a large sample pool, biological researchers must make sure that a conclusion is statistically significant. To achieve this, a researcher formulates a hypothesis before examining the distribution of the data. Statistics in research then helps interpret whether the data cluster near the mean or spread across the distribution; these trends characterize the sample and allow the hypothesis to be tested.
3. Data Interpretation Through Analysis
When dealing with large data sets, statistics in research assists with data analysis, helping researchers draw sound conclusions from their experiments and observations. Concluding a study manually or by visual observation alone may give erroneous results; a thorough statistical analysis instead takes all the relevant statistical measures and the variance in the sample into account to provide a detailed interpretation of the data. As a result, researchers produce detailed, reliable evidence to support their conclusions.
Types of Statistical Research Methods That Aid in Data Analysis
Statistical analysis is the process of examining sample data for patterns or trends that help researchers anticipate situations and draw appropriate research conclusions. Based on the type of data, statistical analyses are of the following types:
1. Descriptive Analysis
Descriptive statistical analysis organizes and summarizes large data sets into graphs and tables. It involves processes such as tabulation, measures of central tendency, measures of dispersion or variance, and skewness measurements.
2. Inferential Analysis
Inferential statistical analysis allows researchers to extrapolate data acquired from a small sample to the complete population. It helps draw conclusions and make decisions about the whole population on the basis of sample data, and it is highly recommended for research projects that work with small sample sizes but aim to generalize their conclusions to a larger population.
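A confidence interval for a population mean is one of the simplest inferential computations: the sample says something about the whole population, with stated uncertainty. A stdlib sketch with invented measurements (the t critical value 2.365 for 7 degrees of freedom is hard-coded rather than looked up):

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0, 5.3]  # illustrative measurements
n = len(sample)
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# 95% CI using the t critical value for n - 1 = 7 degrees of freedom
t_crit = 2.365
ci = (mean - t_crit * se, mean + t_crit * se)
print(f"mean = {mean:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval is the inferential statement: with 95% confidence, the population mean lies inside it, even though only eight observations were measured.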
3. Predictive Analysis
Predictive analysis is used to forecast future events. It is widely used by marketing companies, insurance organizations, online service providers, data-driven marketers, and financial corporations.
4. Prescriptive Analysis
Prescriptive analysis examines data to find out what can be done next. It is widely used in business analysis to identify the best possible outcome for a situation. It is closely related to descriptive and predictive analysis, but prescriptive analysis focuses on recommending the best option among the available choices.
5. Exploratory Data Analysis
EDA is generally the first step of the data analysis process, conducted before any other statistical analysis technique. It focuses on analyzing patterns in the data to recognize potential relationships. EDA is used to discover unknown associations within the data, inspect for missing data, and obtain maximum insight.
6. Causal Analysis
Causal analysis helps understand and determine the reasons why things happen the way they do. It identifies the root cause of a failure, or simply the underlying reason something could happen. For example, causal analysis is used to understand what will happen to a given variable if another variable changes.
7. Mechanistic Analysis
This is the least common type of statistical analysis. Mechanistic analysis is used in big data analytics and the biological sciences. It seeks to understand how individual changes in one variable cause corresponding changes in other variables, while excluding external influences.
Important Statistical Tools In Research
Researchers in the biological field often find statistical analysis the most daunting aspect of completing a research project. However, statistical tools can help researchers understand what to do with their data and how to interpret the results, making the process as easy as possible.
1. Statistical Package for Social Science (SPSS)
It is a widely used software package for human behavior research. SPSS can compile descriptive statistics as well as graphical depictions of results. Moreover, it includes the option to create scripts that automate analysis or carry out more advanced statistical processing.
2. R Foundation for Statistical Computing
This software package is used in human behavior research and many other fields. R is a powerful free tool, but it has a steep learning curve and requires a certain level of coding ability. It also comes with an active community engaged in building and enhancing the software and its associated packages.
3. MATLAB (The Mathworks)
It is an analytical platform and a programming language. Researchers and engineers use it to write their own code to answer their research questions. While MATLAB can be a difficult tool for novices, it offers great flexibility in terms of what the researcher needs.
4. Microsoft Excel
MS Excel is not the best solution for statistical analysis in research, but it offers a wide variety of tools for data visualization and simple statistics. It is easy to generate summaries and customizable graphs and figures, making it the most accessible option for those just starting with statistics.
5. Statistical Analysis Software (SAS)
It is a statistical platform used in business, healthcare, and human behavior research alike. It can carry out advanced analyses and produce publication-worthy figures, tables and charts.
6. GraphPad Prism
It is a premium software package primarily used among biology researchers, but it offers tools suited to various other fields. Similar to SPSS, GraphPad provides a scripting option to automate analyses and carry out complex statistical calculations.
7. Minitab
This software offers basic as well as advanced statistical tools for data analysis. However, similar to GraphPad and SPSS, Minitab requires some command of coding to offer automated analyses.
Use of Statistical Tools In Research and Data Analysis
Statistical tools manage large data sets. Many biological studies rely on large volumes of data to analyze trends and patterns, so using statistical tools becomes essential: they make handling and processing such data far more convenient.
Following these steps will help biological researchers present the statistics in their research in detail, develop accurate hypotheses and choose the correct tools for them.
There is a range of statistical tools in research that can help researchers manage their data and improve the outcome of their research through better interpretation. Choosing the right statistics for your research comes down to understanding the research question, your knowledge of statistics, and your personal experience in coding.
Have you faced challenges while using statistics in research? How did you manage it? Did you use any of the statistical tools to help you with your research data? Do write to us or comment below!
Indian J Anaesth. 2016 Sep; 60(9)
Basic statistical tools in research and data analysis
Department of Anaesthesiology, Division of Neuroanaesthesiology, Sheri Kashmir Institute of Medical Sciences, Soura, Srinagar, Jammu and Kashmir, India
S Bala Bhaskar
1 Department of Anaesthesiology and Critical Care, Vijayanagar Institute of Medical Sciences, Bellary, Karnataka, India
Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. Statistical analysis gives meaning to meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article tries to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.
Statistics is a branch of science that deals with the collection, organisation, analysis of data and drawing of inferences from the samples to the whole population.[ 1 ] This requires a proper design of the study, an appropriate selection of the study sample and choice of a suitable statistical test. An adequate knowledge of statistics is necessary for proper designing of an epidemiological study or a clinical trial. Improper statistical methods may result in erroneous conclusions which may lead to unethical practice.[ 2 ]
A variable is a characteristic that varies from one individual member of a population to another.[ 3 ] Variables such as height and weight are measured by some type of scale, convey quantitative information and are called quantitative variables. Sex and eye colour give qualitative information and are called qualitative variables[ 3 ] [ Figure 1 ].
Classification of variables
Quantitative or numerical data are subdivided into discrete and continuous measurements. Discrete numerical data are recorded as a whole number such as 0, 1, 2, 3,… (integer), whereas continuous data can assume any value. Observations that can be counted constitute the discrete data and observations that can be measured constitute the continuous data. Examples of discrete data are number of episodes of respiratory arrests or the number of re-intubations in an intensive care unit. Similarly, examples of continuous data are the serial serum glucose levels, partial pressure of oxygen in arterial blood and the oesophageal temperature.
A hierarchical scale of increasing precision can be used for observing and recording the data which is based on categorical, ordinal, interval and ratio scales [ Figure 1 ].
Categorical or nominal variables are unordered. The data are merely classified into categories and cannot be arranged in any particular order. If only two categories exist (as in gender: male and female), it is called dichotomous (or binary) data. The various causes of re-intubation in an intensive care unit (upper airway obstruction, impaired clearance of secretions, hypoxemia, hypercapnia, pulmonary oedema and neurological impairment) are examples of categorical variables.
Ordinal variables have a clear ordering between the variables. However, the ordered data may not have equal intervals. Examples are the American Society of Anesthesiologists status or Richmond agitation-sedation scale.
Interval variables are similar to an ordinal variable, except that the intervals between the values of the interval variable are equally spaced. A good example of an interval scale is the Fahrenheit degree scale used to measure temperature. With the Fahrenheit scale, the difference between 70° and 75° is equal to the difference between 80° and 85°: The units of measurement are equal throughout the full range of the scale.
Ratio scales are similar to interval scales, in that equal differences between scale values have equal quantitative meaning. However, ratio scales also have a true zero point, which gives them an additional property. For example, the system of centimetres is an example of a ratio scale. There is a true zero point and the value of 0 cm means a complete absence of length. The thyromental distance of 6 cm in an adult may be twice that of a child in whom it may be 3 cm.
STATISTICS: DESCRIPTIVE AND INFERENTIAL STATISTICS
Descriptive statistics[ 4 ] try to describe the relationship between variables in a sample or population. Descriptive statistics provide a summary of data in the form of mean, median and mode. Inferential statistics[ 4 ] use a random sample of data taken from a population to describe and make inferences about the whole population. It is valuable when it is not possible to examine each member of an entire population. Examples of descriptive and inferential statistics are illustrated in Table 1 .
Example of descriptive and inferential statistics
The extent to which the observations cluster around a central location is described by the central tendency and the spread towards the extremes is described by the degree of dispersion.
Measures of central tendency
The measures of central tendency are mean, median and mode.[ 6 ] Mean (or the arithmetic average) is the sum of all the scores divided by the number of scores. Mean may be influenced profoundly by the extreme variables. For example, the average stay of organophosphorus poisoning patients in ICU may be influenced by a single patient who stays in ICU for around 5 months because of septicaemia. The extreme values are called outliers. The formula for the mean is
mean = Σx / n

where x = each observation and n = number of observations. Median[ 6 ] is defined as the middle of a distribution in ranked data (with half of the variables in the sample above and half below the median value), while mode is the most frequently occurring variable in a distribution. Range defines the spread, or variability, of a sample.[ 7 ] It is described by the minimum and maximum values of the variables. If we rank the data and, after ranking, group the observations into percentiles, we get better information on the pattern of spread of the variables. In percentiles, we rank the observations into 100 equal parts. We can then describe the 25th, 50th, 75th or any other percentile amount. The median is the 50th percentile. The interquartile range is the middle 50% of the observations about the median (25th-75th percentile). Variance[ 7 ] is a measure of how spread out the distribution is. It gives an indication of how closely an individual observation clusters about the mean value. The variance of a population is defined by the following formula:

σ² = Σ (Xi − X)² / N

where σ² is the population variance, X is the population mean, Xi is the ith element from the population and N is the number of elements in the population. The variance of a sample is defined by a slightly different formula:

s² = Σ (xi − x)² / (n − 1)

where s² is the sample variance, x is the sample mean, xi is the ith element from the sample and n is the number of elements in the sample. The formula for the variance of a population uses 'N' as the denominator, whereas the sample formula uses 'n − 1'. The expression 'n − 1' is known as the degrees of freedom and is one less than the number of observations: each observation is free to vary, except the last one, which must take a defined value. The variance is measured in squared units. To make the interpretation of the data simple and to retain the basic unit of observation, the square root of the variance is used. The square root of the variance is the standard deviation (SD).[ 8 ] The SD of a population is defined by the following formula:

σ = √[ Σ (Xi − X)² / N ]

where σ is the population SD, X is the population mean, Xi is the ith element from the population and N is the number of elements in the population. The SD of a sample is defined by a slightly different formula:

s = √[ Σ (xi − x)² / (n − 1) ]

where s is the sample SD, x is the sample mean, xi is the ith element from the sample and n is the number of elements in the sample. An example of the calculation of variance and SD is illustrated in Table 2 .
Example of mean, variance, standard deviation
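The population and sample formulas can be checked side by side with a short stdlib sketch (the data set is invented for illustration):

```python
import math

x = [2, 4, 4, 4, 5, 5, 7, 9]  # sample observations
n = len(x)
mean = sum(x) / n  # mean = Σx / n

# Σ(xi − mean)², shared by both variance formulas
ss = sum((xi - mean) ** 2 for xi in x)

pop_var = ss / n          # population formula: denominator N
samp_var = ss / (n - 1)   # sample formula: denominator n − 1
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(mean, pop_var, samp_var)  # 5.0 4.0 then ~4.571
```

Note the sample variance is always a little larger than the population variance: dividing by n − 1 compensates for estimating the mean from the same data.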
Normal distribution or Gaussian distribution
Most biological variables cluster around a central value, with symmetrical positive and negative deviations about this point.[ 1 ] The standard normal distribution curve is a symmetrical, bell-shaped curve. In a normal distribution, about 68% of the scores are within 1 SD of the mean, around 95% are within 2 SDs of the mean and about 99.7% are within 3 SDs of the mean [ Figure 2 ].
Normal distribution curve
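The 68%/95% figures can be verified empirically by simulating normally distributed data with the standard library (the mean and SD chosen here are arbitrary):

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible
draws = [random.gauss(mu=50, sigma=10) for _ in range(10_000)]

mu = statistics.mean(draws)
sd = statistics.stdev(draws)
within_1sd = sum(mu - sd <= d <= mu + sd for d in draws) / len(draws)
within_2sd = sum(mu - 2 * sd <= d <= mu + 2 * sd for d in draws) / len(draws)

print(f"{within_1sd:.1%} within 1 SD, {within_2sd:.1%} within 2 SD")
```

With 10,000 draws the observed fractions land close to the theoretical 68% and 95%, illustrating why the empirical rule is a reliable mental shortcut for normally distributed data.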
Skewed distribution
It is a distribution with an asymmetry of the variables about its mean. In a negatively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the right, leading to a longer left tail. In a positively skewed distribution [ Figure 3 ], the mass of the distribution is concentrated on the left, leading to a longer right tail.
Curves showing negatively skewed and positively skewed distribution
In inferential statistics, data are analysed from a sample to make inferences in the larger collection of the population. The purpose is to answer or test the hypotheses. A hypothesis (plural hypotheses) is a proposed explanation for a phenomenon. Hypothesis tests are thus procedures for making rational decisions about the reality of observed effects.
Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1 (where 0 indicates impossibility and 1 indicates certainty).
In inferential statistics, the term ‘null hypothesis’ ( H 0 ‘ H-naught ,’ ‘ H-null ’) denotes that there is no relationship (difference) between the population variables in question.[ 9 ]
The alternative hypothesis ( H 1 or H a ) denotes that a relationship (difference) between the variables is expected to be true.[ 9 ]
The P value (or the calculated probability) is the probability of the observed event occurring by chance if the null hypothesis is true. The P value is a number between 0 and 1 and is interpreted by researchers in deciding whether to reject or retain the null hypothesis [ Table 3 ].
P values with interpretation
If the P value is less than the arbitrarily chosen value (known as α or the significance level), the null hypothesis (H0) is rejected [ Table 4 ]. However, if the null hypothesis (H0) is incorrectly rejected, this is known as a Type I error.[ 11 ] Further details regarding alpha error, beta error and sample size calculation and factors influencing them are dealt with in another section of this issue by Das S et al .[ 12 ]
Illustration for null hypothesis
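One concrete way to see what a P value means is a simple permutation test; this is shown purely as an illustration of the concept (it is not a method from the article, and the two groups of measurements are invented):

```python
import random
from statistics import mean

random.seed(1)
group_a = [12, 15, 14, 16, 13, 18, 17, 15]
group_b = [10, 11, 13, 9, 12, 11, 10, 12]
observed = mean(group_a) - mean(group_b)  # observed difference in means

# Under H0 the group labels are arbitrary, so shuffle them repeatedly
# and count how often chance alone produces a difference this large.
pooled = group_a + group_b
trials, count = 5000, 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:8]) - mean(pooled[8:])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / trials
print(f"observed difference = {observed}, p = {p_value:.4f}")
```

Here the shuffled labels almost never reproduce a difference as large as the observed one, so the P value is far below a 0.05 significance level and H0 would be rejected.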
PARAMETRIC AND NON-PARAMETRIC TESTS
Numerical data (quantitative variables) that are normally distributed are analysed with parametric tests.[ 13 ]
The two most basic prerequisites for parametric statistical analysis are:
- The assumption of normality which specifies that the means of the sample group are normally distributed
- The assumption of equal variance which specifies that the variances of the samples and of their corresponding population are equal.
However, if the distribution of the sample is skewed towards one side or the distribution is unknown due to the small sample size, non-parametric[ 14 ] statistical techniques are used. Non-parametric tests are used to analyse ordinal and categorical data.
The parametric tests assume that the data are on a quantitative (numerical) scale, with a normal distribution of the underlying population. The samples have the same variance (homogeneity of variances). The samples are randomly drawn from the population, and the observations within a group are independent of each other. The commonly used parametric tests are the Student's t -test, analysis of variance (ANOVA) and repeated measures ANOVA.
Student's t -test
Student's t-test is used to test the null hypothesis that there is no difference between the means of the two groups. It is used in three circumstances:
- To test if the sample mean differs significantly from a known population mean (the one-sample t-test). The formula is t = (X − u) / SE, where X = sample mean, u = population mean and SE = standard error of the mean.
- To test if the population means estimated by two independent samples differ significantly (the unpaired t-test). The formula is t = (X1 − X2) / SE, where X1 − X2 is the difference between the means of the two groups and SE denotes the standard error of the difference.
- To test if the population means estimated by two dependent samples differ significantly (the paired t -test). A usual setting for paired t -test is when measurements are made on the same subjects before and after a treatment.
The formula for the paired t-test is:

t = d / SE

where d is the mean difference and SE denotes the standard error of this difference.
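The paired t formula above can be computed directly with the standard library; the before/after measurements here are invented for illustration:

```python
import math
import statistics

before = [140, 152, 138, 147, 160, 155, 149, 142]
after_tx = [132, 145, 135, 140, 151, 148, 141, 137]

# Pairwise differences for the same subjects before and after treatment
diffs = [b - a for b, a in zip(before, after_tx)]
d_bar = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # SE of the mean difference
t = d_bar / se                                        # t = d / SE

print(f"t = {t:.2f} on {len(diffs) - 1} degrees of freedom")
```

The resulting t statistic is then compared against the t distribution with n − 1 degrees of freedom to obtain a P value.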
The group variances can be compared using the F-test. The F-test is the ratio of variances (var1/var2). If F differs significantly from 1.0, then it is concluded that the group variances differ significantly.
Analysis of variance
The Student's t -test cannot be used for comparison of three or more groups. The purpose of ANOVA is to test if there is any significant difference between the means of two or more groups.
In ANOVA, we study two variances – (a) between-group variability and (b) within-group variability. The within-group variability (error variance) is the variation that cannot be accounted for in the study design. It is based on random differences present in our samples.
However, the between-group (or effect variance) is the result of our treatment. These two estimates of variances are compared using the F-test.
A simplified formula for the F statistic is:

F = MSb / MSw

where MSb is the mean squares between the groups and MSw is the mean squares within the groups.
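The between-group and within-group variances, and the F ratio built from them, can be computed from first principles; the three groups of values below are invented for illustration:

```python
from statistics import mean

groups = [[6, 8, 4, 5, 3, 4],
          [8, 12, 9, 11, 6, 8],
          [13, 9, 11, 8, 7, 12]]

grand = mean(x for g in groups for x in g)  # grand mean over all observations
k = len(groups)
n = sum(len(g) for g in groups)

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
# Within-group sum of squares: random variation inside each group
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

ms_between = ss_between / (k - 1)  # MSb
ms_within = ss_within / (n - k)    # MSw
F = ms_between / ms_within         # F = MSb / MSw
print(f"F = {F:.2f}")
```

A large F means the treatment effect (between-group variance) dwarfs the random noise (within-group variance), so at least one group mean differs.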
Repeated measures analysis of variance
As with ANOVA, repeated measures ANOVA analyses the equality of means of three or more groups. However, repeated measures ANOVA is used when all members of a sample are measured under different conditions or at different points in time.
As the variables are measured from a sample at different points of time, the measurement of the dependent variable is repeated. Using a standard ANOVA in this case is not appropriate because it fails to model the correlation between the repeated measures: The data violate the ANOVA assumption of independence. Hence, in the measurement of repeated dependent variables, repeated measures ANOVA should be used.
When the assumptions of normality are not met, and the sample means are not normally distributed, parametric tests can lead to erroneous results. Non-parametric tests (distribution-free tests) are used in such situations as they do not require the normality assumption.[ 15 ] Non-parametric tests may fail to detect a significant difference when compared with a parametric test. That is, they usually have less power.
As is done for the parametric tests, the test statistic is compared with known values for the sampling distribution of that statistic and the null hypothesis is accepted or rejected. The types of non-parametric analysis techniques and the corresponding parametric analysis techniques are delineated in Table 5 .
Analogue of parametric and non-parametric tests
Median test for one sample: The sign test and Wilcoxon's signed rank test
The sign test and Wilcoxon's signed rank test are used for median tests of one sample. These tests examine whether one instance of sample data is greater or smaller than the median reference value.
Sign test
This test examines a hypothesis about the median θ0 of a population. It tests the null hypothesis H0: θ = θ0. When the observed value (Xi) is greater than the reference value (θ0), it is marked as a + sign. If the observed value is smaller than the reference value, it is marked as a − sign. If the observed value is equal to the reference value (θ0), it is eliminated from the sample.
If the null hypothesis is true, there will be an equal number of + signs and − signs.
The sign test ignores the actual values of the data and only uses + or − signs. Therefore, it is useful when it is difficult to measure the values.
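The counting logic of the sign test fits in a few lines of stdlib Python; the hypothesised median and sample values below are invented for illustration:

```python
from math import comb

theta0 = 100  # hypothesised median
sample = [102, 98, 110, 105, 100, 97, 108, 112, 101, 99, 106]

# Keep only the signs of the deviations; ties with theta0 are dropped
signs = [x - theta0 for x in sample if x != theta0]
n_plus = sum(1 for s in signs if s > 0)
n = len(signs)

# Two-sided binomial p-value under H0: P(+) = P(-) = 1/2
k = max(n_plus, n - n_plus)
p = 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
p = min(p, 1.0)
print(f"{n_plus} plus signs out of {n}, p = {p:.5f}")
```

With 7 of 10 signs positive the imbalance is unremarkable under a fair-coin null, so the hypothesised median is not rejected.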
Wilcoxon's signed rank test
There is a major limitation of sign test as we lose the quantitative information of the given data and merely use the + or – signs. Wilcoxon's signed rank test not only examines the observed values in comparison with θ0 but also takes into consideration the relative sizes, adding more statistical power to the test. As in the sign test, if there is an observed value that is equal to the reference value θ0, this observed value is eliminated from the sample.
Wilcoxon's rank sum test and Mann–Whitney test
Wilcoxon's rank sum test ranks all data points in order, calculates the rank sum of each sample and compares the difference in the rank sums.
It is used to test the null hypothesis that two samples have the same median or, alternatively, whether observations in one sample tend to be larger than observations in the other.
Mann–Whitney test compares all data (xi) belonging to the X group and all data (yi) belonging to the Y group and calculates the probability of xi being greater than yi: P (xi > yi). The null hypothesis states that P (xi > yi) = P (xi < yi) =1/2 while the alternative hypothesis states that P (xi > yi) ≠1/2.
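With SciPy (and illustrative data), the Mann–Whitney comparison of two independent groups is a single call:

```python
from scipy.stats import mannwhitneyu

# Two independent samples (hypothetical measurements)
x = [19, 22, 16, 29, 24, 25, 30]
y = [15, 13, 18, 17, 21, 14, 12]

# H0: P(x > y) = P(x < y) = 1/2
stat, p = mannwhitneyu(x, y, alternative='two-sided')
```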
The two-sample Kolmogorov-Smirnov (KS) test was designed as a generic method to test whether two random samples are drawn from the same distribution. The null hypothesis of the KS test is that both distributions are identical. The statistic of the KS test is a distance between the two empirical distributions, computed as the maximum absolute difference between their cumulative curves.
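A quick SciPy sketch (synthetic data) of the two-sample KS test:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 200)   # sample from N(0, 1)
b = rng.normal(0.8, 1.0, 200)   # sample from a shifted distribution

# stat is the maximum absolute distance between the two empirical CDFs
stat, p = ks_2samp(a, b)
```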
The Kruskal–Wallis test is the non-parametric analogue of analysis of variance.[ 14 ] It analyses whether there is any difference in the median values of three or more independent samples. The data values are ranked in increasing order, the rank sums are calculated, and then the test statistic is computed.
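In SciPy, the Kruskal–Wallis test takes the groups directly (hypothetical data):

```python
from scipy.stats import kruskal

# Three independent groups (hypothetical); H0: equal medians
g1 = [27, 31, 25, 30, 28]
g2 = [34, 38, 33, 36, 39]
g3 = [21, 24, 20, 23, 22]

# Ranks all 15 values together, then compares the group rank sums
stat, p = kruskal(g1, g2, g3)
```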
In contrast to the Kruskal–Wallis test, the Jonckheere test assumes an a priori ordering of the groups, which gives it more statistical power than the Kruskal–Wallis test.[ 14 ]
The Friedman test is a non-parametric test for the difference between several related samples. It is an alternative to the repeated-measures ANOVA, used when the same parameter has been measured under different conditions on the same subjects.[ 13 ]
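A SciPy sketch with hypothetical repeated measures (one row of ranks per subject):

```python
from scipy.stats import friedmanchisquare

# The same six subjects measured under three conditions (hypothetical)
cond_a = [10, 12, 13, 9, 11, 14]
cond_b = [12, 14, 15, 11, 13, 16]
cond_c = [14, 16, 17, 13, 15, 18]

# Ranks the three conditions within each subject, then tests whether
# the condition rank sums differ more than chance would allow
stat, p = friedmanchisquare(cond_a, cond_b, cond_c)
```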
Tests to analyse the categorical data
Chi-square test, Fisher's exact test and McNemar's test are used to analyse categorical or nominal variables. The Chi-square test compares frequencies and tests whether the observed data differ significantly from the expected data if there were no differences between groups (i.e., the null hypothesis). It is calculated as the sum of the squared difference between the observed ( O ) and the expected ( E ) data (or the deviation, d ) divided by the expected data:

χ² = Σ (O − E)² / E
A Yates correction factor is used when the sample size is small. Fisher's exact test is used to determine whether there are non-random associations between two categorical variables. It does not assume random sampling, and instead of referring a calculated statistic to a sampling distribution, it calculates an exact probability. McNemar's test is used for paired nominal data. It is applied to a 2 × 2 table with paired, dependent samples and is used to determine whether the row and column frequencies are equal (that is, whether there is ‘marginal homogeneity’). The null hypothesis is that the paired proportions are equal. The Mantel–Haenszel Chi-square test is a multivariate test, as it analyses multiple grouping variables. It stratifies according to the nominated confounding variables and identifies any that affect the primary outcome variable. If the outcome variable is dichotomous, logistic regression is used.
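A SciPy sketch with a hypothetical 2 × 2 table shows both the chi-square and the exact approach:

```python
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2 x 2 table: rows = treated/control, cols = improved/not
table = np.array([[30, 10],
                  [15, 25]])

# chi2_contingency applies the Yates correction by default for 2 x 2 tables
chi2, p, dof, expected = chi2_contingency(table)

# The uncorrected statistic is simply sum((O - E)^2 / E)
chi2_raw = ((table - expected) ** 2 / expected).sum()

# Fisher's exact test computes an exact probability instead
odds_ratio, p_exact = fisher_exact(table)
```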
SOFTWARE AVAILABLE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS
Numerous statistical software systems are currently available. The commonly used packages are the Statistical Package for the Social Sciences (SPSS, IBM Corporation), Statistical Analysis System (SAS, SAS Institute, North Carolina, United States of America), R (designed by Ross Ihaka and Robert Gentleman of the R Core Team), Minitab (Minitab Inc.), Stata (StataCorp) and MS Excel (Microsoft).
There are a number of web resources which are related to statistical power analyses. A few are:
- StatPages.net – provides links to a number of online power calculators
- G-Power – provides a downloadable power analysis program that runs under DOS
- Power analysis for ANOVA designs – an interactive site that calculates power or the sample size needed to attain a given power for one effect in a factorial ANOVA design
- SPSS makes a program called SamplePower, which outputs a complete report on the screen that can be cut and pasted into another document.
It is important that a researcher knows the concepts of the basic statistical methods used to conduct a research study. This will help in conducting an appropriately designed study leading to valid and reliable results. Inappropriate use of statistical techniques may lead to faulty conclusions, inducing errors and undermining the significance of the article. Bad statistics may lead to bad research, and bad research may lead to unethical practice. Hence, adequate knowledge of statistics and the appropriate use of statistical tests are important. Adequate knowledge of the basic statistical methods will go a long way in improving research designs and producing quality medical research which can be utilised for formulating evidence-based guidelines.
Financial support and sponsorship
Conflicts of interest.
There are no conflicts of interest.
6.1 Introduction
6.2 Definitions
6.3 Basic Statistics
6.4 Statistical tests

6.2.1 Error
6.2.2 Accuracy
6.2.3 Precision
6.2.4 Bias
1. Random or unpredictable deviations between replicates, quantified with the "standard deviation".
2. Systematic or predictable regular deviation from the "true" value, quantified as "mean difference" (i.e. the difference between the true value and the mean of replicate determinations).
3. Constant, unrelated to the concentration of the substance analyzed (the analyte).
4. Proportional, i.e. related to the concentration of the analyte.

* The "true" value of an attribute is by nature indeterminate and often has only a very relative meaning. Particularly in soil science, for several attributes there is no such thing as the true value, as any value obtained is method-dependent (e.g. cation exchange capacity). Obviously, this does not mean that no adequate analysis serving a purpose is possible. It does, however, emphasize the need for the establishment of standard reference methods and the importance of external QC (see Chapter 9).
The difference between the (mean) test result obtained from a number of laboratories using the same method and an accepted reference value. The method bias may depend on the analyte level.
The difference between the (mean) test result from a particular laboratory and the accepted reference value.
The difference between the mean of replicate test results of a sample and the ("true") value of the target population from which the sample was taken. In practice, for a laboratory this refers mainly to sample preparation, subsampling and weighing techniques. Whether a sample is representative for the population in the field is an extremely important aspect but usually falls outside the responsibility of the laboratory (in some cases laboratories have their own field sampling personnel).
6.3.1 Mean
6.3.2 Standard deviation
6.3.3 Relative standard deviation. Coefficient of variation
6.3.4 Confidence limits of a measurement
6.3.5 Propagation of errors
Note. When needed (e.g. for the F -test, see Eq. 6.11) the variance can, of course, be calculated by squaring the standard deviation:
m = "true" value (mean of large set of replicates) ¯x = mean of subsamples t = a statistical value which depends on the number of data and the required confidence (usually 95%). s = standard deviation of mean of subsamples n = number of subsamples
m = "true" value x = single measurement t = applicable t tab (Appendix 1) s = standard deviation of set of previous measurements.
Note: This "method-s" or s of a control sample is not a constant and may vary for different test materials, analyte levels, and with analytical conditions.
x̄ = mean of duplicates
s = known standard deviation of large set
6.3.5.1 Propagation of random errors
6.3.5.2 Propagation of systematic errors
a = ml HCl required for titration of the sample
b = ml HCl required for titration of the blank
s = air-dry sample weight in grams
M = molarity of HCl
1.4 = 14 × 10⁻³ × 100% (14 = atomic weight of N)
mcf = moisture correction factor
distillation: 0.8%, titration: 0.5%, molarity: 0.2%, sample weight: 0.2%, mcf: 0.2%.
Note. Sample heterogeneity is also represented in the moisture correction factor. However, the influence of this factor on the final result is usually very small.
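Assuming the individual steps contribute independent random errors, their relative standard deviations combine in quadrature; with the percentages quoted above:

```python
from math import sqrt

# Relative standard deviations (%) of the individual steps listed above
steps = {"distillation": 0.8, "titration": 0.5, "molarity": 0.2,
         "sample weight": 0.2, "mcf": 0.2}

# Independent random errors: relative errors add in quadrature
total_rsd = sqrt(sum(v ** 2 for v in steps.values()))   # ~1.0 %
```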
6.4.1 Two-sided vs. one-sided test 6.4.2 F-test for precision 6.4.3 t-Tests for bias 6.4.4 Linear correlation and regression 6.4.5 Analysis of variance (ANOVA)
- performance of two instruments, - performance of two methods, - performance of a procedure in different periods, - performance of two analysts or laboratories, - results obtained for a reference or control sample with the "true", "target" or "assigned" value of this sample.
1. are A and B different? (two-sided test) 2. is A higher (or lower) than B? (one-sided test).
df1 = n1 - 1
df2 = n2 - 1
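As a sketch (hypothetical standard deviations; SciPy assumed available), the F-test for precision compares the ratio of the two variances with the tabulated F value:

```python
from scipy.stats import f

# Hypothetical: standard deviation and n for two sets of replicates
s1, n1 = 0.82, 10
s2, n2 = 0.45, 12

# Put the larger variance on top so that F >= 1 (here s1 > s2)
F = (s1 ** 2) / (s2 ** 2)
df1, df2 = n1 - 1, n2 - 1

p = 2 * f.sf(F, df1, df2)   # two-sided p-value
```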
6.4.3.1 Student's t-test
6.4.3.2 Cochran's t-test
6.4.3.3 t-Test for large data sets (n ≥ 30)
6.4.3.4 Paired t-test
1. Student's t-test for comparison of two independent sets of data with very similar standard deviations;
2. the Cochran variant of the t-test when the standard deviations of the independent sets differ significantly;
3. the paired t-test for comparison of strongly dependent sets of data.
x̄ = mean of test results of a sample
m = "true" or reference value
s = standard deviation of test results
n = number of test results of the sample

x̄1 = mean of data set 1
x̄2 = mean of data set 2
sp = "pooled" standard deviation of the sets
n1 = number of data in set 1
n2 = number of data in set 2

s1 = standard deviation of data set 1
s2 = standard deviation of data set 2
n1 = number of data in set 1
n2 = number of data in set 2
df = n1 + n2 - 2
Note. Another illustrative way to perform this test for bias is to calculate if the difference between the means falls within or outside the range where this difference is still not significantly large. In other words, if this difference is less than the least significant difference (lsd). This can be derived from Equation (6.13):
t1 = t_tab at n1 - 1 degrees of freedom
t2 = t_tab at n2 - 1 degrees of freedom
d̄ = mean of differences within each pair of data
sd = standard deviation of the mean of differences
n = number of pairs of data
Note. Since such data sets do not have a normal distribution, the "normal" t -test which compares means of sets cannot be used here (the means do not constitute a fair representation of the sets). For the same reason no information about the precision of the two methods can be obtained, nor can the F -test be applied. For information about precision, replicate determinations are needed.
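The three variants map directly onto SciPy calls (illustrative data; SciPy's Welch test with equal_var=False plays the role of the Cochran variant):

```python
from scipy.stats import ttest_ind, ttest_rel

# Hypothetical results for the same samples analysed by two methods
method_a = [5.2, 5.0, 5.4, 5.1, 5.3, 5.2]
method_b = [5.6, 5.5, 5.8, 5.4, 5.7, 5.6]

# 1. Student's t-test: independent sets, similar standard deviations
t1, p1 = ttest_ind(method_a, method_b)

# 2. Welch's variant for significantly different standard deviations
#    (the role played by the Cochran variant in the text)
t2, p2 = ttest_ind(method_a, method_b, equal_var=False)

# 3. Paired t-test: the sets are strongly dependent (same samples)
t3, p3 = ttest_rel(method_a, method_b)
```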
6.4.4.1 Construction of calibration graph
6.4.4.2 Comparing two sets of data using many samples at different analyte levels
1. When the concentration range is so wide that the errors, both random and systematic, are not independent (which is the assumption for the t -tests). This is often the case where concentration ranges of several magnitudes are involved. 2. When pairing is inappropriate for other reasons, notably a long time span between the two analyses (sample aging, change in laboratory conditions, etc.).
Note: Naturally, non-linear higher-order relationships are also possible, but since these are less common in analytical work and more complex to handle mathematically, they will not be discussed here. Nevertheless, to avoid misinterpretation, always inspect the kind of relationship by plotting the data, either on paper or on the computer monitor.
a = intercept of the line with the y-axis
b = slope (tangent)

xi = data X
x̄ = mean of data X
yi = data Y
ȳ = mean of data Y

r = 1: perfect positive linear correlation
r = 0: no linear correlation (though possibly a non-linear correlation)
r = -1: perfect negative linear correlation
Note. A treatise of the error or uncertainty in the regression line is given.
- The most precise data set is plotted on the x-axis - At least 6, but preferably more than 10 different samples are analyzed - The samples should rather uniformly cover the analyte level range of interest.
Note. In the present example, the scattering of the points around the regression line does not seem to change much over the whole range. This indicates that the precision of laboratory Y does not change very much over the range with respect to laboratory X. This is not always the case. In such cases, weighted regression (not discussed here) is more appropriate than the unweighted regression as used here. Validation of a method (see Section 7.5) may reveal that precision can change significantly with the level of analyte (and with other factors such as sample matrix).
= "fitted" y-value for each x i , (read from graph or calculated with Eq. 6.22). Thus, is the (vertical) deviation of the found y-values from the line. n = number of calibration points. Note: Only the y-deviations of the points from the line are considered. It is assumed that deviations in the x-direction are negligible. This is, of course, only the case if the standards are very accurately prepared.
a = 0.037 ± 2.78 × 0.0132 = 0.037 ± 0.037 and b = 0.626 ± 2.78 × 0.0219 = 0.626 ± 0.061
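A calibration line of this kind can be fitted with scipy.stats.linregress (hypothetical readings for standards of known concentration):

```python
from scipy.stats import linregress

# Standards of known concentration (x) and their readings (y, hypothetical)
x = [0, 5, 10, 15, 20, 25]
y = [0.04, 0.35, 0.66, 0.97, 1.28, 1.60]

res = linregress(x, y)
# res.intercept, res.slope          -> a and b of the line y = a + bx
# res.rvalue                        -> correlation coefficient r
# res.stderr, res.intercept_stderr  -> standard errors of b and a
```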
Best Practice July 10, 2019
The Top 7 Statistical Tools You Need to Make Your Data Shine
We carry out research to test hypotheses, and we do that by getting hold of data. Hopefully, if our experiments are planned and executed correctly, we can get hold of good data that can tell us something unique about the world.
While the first part of any experiment – the planning and execution – is critically important, it is only half the battle. How the data is treated is just as important, and analyzing good data in the right way can lead to groundbreaking findings and insights.
Data analysis is often seen as the scariest aspect of completing research, but it doesn’t have to be that way. While you’ll need to understand what to do with the data and how to interpret the results, software designed for statistical analysis can make this process as smooth and easy as possible.
A great number of tools are available to carry out statistical analysis of data, and below we list (in no particular order) the seven best packages suitable for human behavior research.
1. SPSS (IBM)
SPSS (Statistical Package for the Social Sciences) is perhaps the most widely used statistics software package within human behavior research. It offers the ability to easily compile descriptive statistics, parametric and non-parametric analyses, as well as graphical depictions of results, through the graphical user interface (GUI). It also includes the option to create scripts to automate analysis or to carry out more advanced statistical processing.
2. R (R Foundation for Statistical Computing)
R is a free statistical software package that is widely used across human behavior research and other fields. Toolboxes (essentially plugins) are available for a great range of applications, which can simplify various aspects of data processing. While R is very powerful software, it also has a steep learning curve, requiring a certain degree of coding. It does, however, come with an active community engaged in building and improving R and the associated plugins, which ensures that help is never too far away.
3. MATLAB (The Mathworks)
MATLAB is an analytical platform and programming language that is widely used by engineers and scientists. As with R, the learning curve is steep, and you will be required to create your own code at some point. A wealth of toolboxes is also available to help answer your research questions (such as EEGLab for analysing EEG data). While MATLAB can be difficult for novices, it offers a massive amount of flexibility in terms of what you want to do – as long as you can code it (or at least operate the toolbox you require).
4. Microsoft Excel
While not a cutting-edge solution for statistical analysis, MS Excel does offer a wide variety of tools for data visualization and simple statistics. It’s simple to generate summary metrics and customizable graphics and figures, making it a usable tool for many who want to see the basics of their data. As many individuals and companies both own and know how to use Excel, it also makes it an accessible option for those looking to get started with statistics.
5. SAS (Statistical Analysis Software)
SAS is a statistical analysis platform that offers options to use either the GUI, or to create scripts for more advanced analyses. It is a premium solution that is widely used in business, healthcare, and human behavior research alike. It’s possible to carry out advanced analyses and produce publication-worthy graphs and charts, although the coding can also be a difficult adjustment for those not used to this approach.
6. GraphPad Prism
GraphPad Prism is premium software primarily used within statistics related to biology, but offers a range of capabilities that can be used across various fields. Similar to SPSS, scripting options are available to automate analyses, or carry out more complex statistical calculations, but the majority of the work can be completed through the GUI.
7. Minitab
The Minitab software offers a range of both basic and fairly advanced statistical tools for data analysis. Similar to GraphPad Prism, commands can be executed through both the GUI and scripted commands, making it accessible to novices as well as users looking to carry out more complex analyses.
There are a range of different software tools available, and each offers something slightly different to the user – what you choose will depend on a range of factors, including your research question, knowledge of statistics, and experience of coding.
These factors could mean that you are at the cutting-edge of data analysis, but as with any research, the quality of the data obtained is reliant upon the quality of the study execution . It’s therefore important to keep in mind that while you might have advanced statistical software (and the knowledge to use it) available to you, the results won’t mean much if they weren’t collected in a valid way.
We’ve put together a guide to experimental design, helping you carry out quality research so that the results you collect can be relied on.
The Beginner's Guide to Statistical Analysis | 5 Steps & Examples
Statistical analysis means investigating trends, patterns, and relationships using quantitative data . It is an important research tool used by scientists, governments, businesses, and other organizations.
To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process . You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.
After collecting data from your sample, you can organize and summarize the data using descriptive statistics . Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.
This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.
Table of contents

- Step 1: Write your hypotheses and plan your research design
- Step 2: Collect data from a sample
- Step 3: Summarize your data with descriptive statistics
- Step 4: Test hypotheses or make estimates with inferential statistics
- Step 5: Interpret your results
To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.
Writing statistical hypotheses
The goal of research is often to investigate a relationship between variables within a population . You start with a prediction, and use statistical analysis to test that prediction.
A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.
While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.
- Null hypothesis: A 5-minute meditation exercise will have no effect on math test scores in teenagers.
- Alternative hypothesis: A 5-minute meditation exercise will improve math test scores in teenagers.
- Null hypothesis: Parental income and GPA have no relationship with each other in college students.
- Alternative hypothesis: Parental income and GPA are positively correlated in college students.
Planning your research design
A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.
First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.
- In an experimental design , you can assess a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
- In a correlational design , you can explore relationships between variables (e.g., parental income and GPA) without any assumption of causality using correlation coefficients and significance tests.
- In a descriptive design , you can study the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.
Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.
- In a between-subjects design , you compare the group-level outcomes of participants who have been exposed to different treatments (e.g., those who performed a meditation exercise vs those who didn’t).
- In a within-subjects design , you compare repeated measures from participants who have participated in all treatments of a study (e.g., scores from before and after performing a meditation exercise).
- In a mixed (factorial) design , one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from participants who either did or didn’t do a meditation exercise).
Example: Experimental research design
First, you’ll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you’ll record participants’ scores from a second math test.
In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents’ incomes and their own GPA.
When planning a research design, you should operationalize your variables and decide exactly how you will measure them.
For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:
- Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g. level of language ability).
- Quantitative data represents amounts. These may be on an interval scale (e.g. test score) or a ratio scale (e.g. age).
Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.
Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.
In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.
In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.
Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures . You should aim for a sample that is representative of the population.
Sampling for statistical analysis
There are two main approaches to selecting a sample.
- Probability sampling: every member of the population has a chance of being selected for the study through random selection.
- Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.
In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias , like sampling bias , and ensures that data from your sample is actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.
But in practice, it’s rarely possible to gather the ideal sample. While non-probability samples are more likely to be at risk of biases like self-selection bias , they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.
If you want to use parametric tests for non-probability samples, you have to make the case that:
- your sample is representative of the population you’re generalizing your findings to.
- your sample lacks systematic bias.
Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich and Democratic samples (e.g., college students in the US) aren’t automatically applicable to all non-WEIRD populations.
If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section .
Create an appropriate sampling procedure
Based on the resources available for your research, decide on how you’ll recruit participants.
- Will you have resources to advertise your study widely, including outside of your university setting?
- Will you have the means to recruit a diverse sample that represents a broad population?
- Do you have time to contact and follow up with members of hard-to-reach groups?
Example: Sampling (experimental study)
Your participants are self-selected by their schools. Although you’re using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.
Calculate sufficient sample size
Before recruiting participants, decide on your sample size, either by looking at other studies in your field or by using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.
There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is necessary.
To use these calculators, you have to understand and input these key components:
- Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take, usually set at 5%.
- Statistical power: the probability of your study detecting an effect of a certain size if there is one, usually 80% or higher.
- Expected effect size: a standardized indication of how large the expected result of your study will be, usually based on other similar studies.
- Population standard deviation: an estimate of the population parameter based on a previous study or a pilot study of your own.
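The normal-approximation formula behind many of these calculators can be sketched in a few lines. The following is a minimal Python sketch (standard library only, Python 3.8+ for `NormalDist`) for comparing two group means; the function name and default inputs are illustrative, not taken from any particular calculator.

```python
from math import ceil
from statistics import NormalDist  # Python 3.8+

def sample_size_per_group(alpha=0.05, power=0.80, effect_size=0.5):
    """Approximate n per group for comparing two means
    (normal approximation; exact t-based answers are slightly larger)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

n = sample_size_per_group()  # alpha = 0.05, power = 0.80, medium effect (d = 0.5)
```

With these conventional settings the formula suggests 63 participants per group, comfortably above the 30-per-subgroup rule of thumb.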
Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.
Inspect your data
There are various ways to inspect your data, including the following:
- Organizing data from each variable in frequency distribution tables.
- Displaying data from a key variable in a bar chart to view the distribution of responses.
- Visualizing the relationship between two variables using a scatter plot.
By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.
A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.
In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.
Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.
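One common systematic approach to flagging extreme values is the 1.5 × IQR rule. Here is a minimal Python sketch using the standard library; the scores are made up for illustration, and other cutoffs or methods may suit your data better.

```python
from statistics import quantiles  # Python 3.8+

# Hypothetical scores; 135 looks extreme relative to the rest
scores = [72, 75, 78, 80, 81, 83, 85, 86, 88, 90, 135]

q1, _, q3 = quantiles(scores, n=4)             # first and third quartiles
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # conventional outlier fences
outliers = [x for x in scores if x < lower or x > upper]
```

Whether a flagged value is removed, transformed, or kept should still be a reasoned decision, not an automatic one.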
Calculate measures of central tendency
Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:
- Mode: the most popular response or value in the data set.
- Median: the value in the exact middle of the data set when ordered from low to high.
- Mean: the sum of all values divided by the number of values.
However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.
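All three measures are available in Python’s standard library. A minimal sketch with hypothetical survey responses (the data are invented for illustration):

```python
from statistics import mean, median, mode

# Hypothetical survey responses on a 1-5 scale
responses = [4, 5, 5, 3, 5, 4, 2, 5, 4, 3]

central = {
    "mode": mode(responses),      # most frequent value
    "median": median(responses),  # middle value of the ordered data
    "mean": mean(responses),      # sum of values / number of values
}
```

For ordinal data like these ratings, the mode and median are usually the safer measures to report; the mean assumes equal spacing between response options.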
Calculate measures of variability
Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:
- Range: the highest value minus the lowest value of the data set.
- Interquartile range: the range of the middle half of the data set.
- Standard deviation: the average distance between each value in your data set and the mean.
- Variance: the square of the standard deviation.
Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
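The four measures above can be computed directly with the standard library. A minimal sketch on a small hypothetical sample:

```python
from statistics import quantiles, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

value_range = max(data) - min(data)  # highest minus lowest
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1                        # spread of the middle half
s = stdev(data)                      # sample standard deviation (n - 1 denominator)
var = variance(data)                 # square of the standard deviation
```

Note that `stdev` and `variance` use the sample (n − 1) formulas; use `pstdev` and `pvariance` if your data cover the whole population.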
Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.
From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study). After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.
It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.
A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can make conclusions about population parameters based on sample statistics.
Researchers often use two main methods (simultaneously) to make inferences in statistics.
- Estimation: calculating population parameters based on sample statistics.
- Hypothesis testing: a formal process for testing research predictions about the population using samples.
You can make two types of estimates of population parameters from sample statistics:
- A point estimate: a value that represents your best guess of the exact parameter.
- An interval estimate: a range of values that represent your best guess of where the parameter lies.
If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.
You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).
There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.
A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
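The standard-error-and-z-score construction described above can be sketched in a few lines of Python. The sample values below are hypothetical, and this sketch assumes the z-based (large-sample) interval; small samples would use a t critical value instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sample = [82, 78, 91, 85, 88, 76, 84, 90, 79, 87]  # hypothetical test scores

n = len(sample)
point_estimate = mean(sample)      # sample mean as the point estimate
se = stdev(sample) / sqrt(n)       # standard error of the mean
z = NormalDist().inv_cdf(0.975)    # z score for a 95% confidence level
ci = (point_estimate - z * se, point_estimate + z * se)
```

Reporting both `point_estimate` and `ci` gives readers the best guess and the uncertainty around it.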
Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.
Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs:
- A test statistic tells you how much your data differs from the null hypothesis of the test.
- A p value tells you the likelihood of obtaining your results if the null hypothesis is actually true in the population.
Statistical tests come in three main varieties:
- Comparison tests assess group differences in outcomes.
- Regression tests assess cause-and-effect relationships between variables.
- Correlation tests assess relationships between variables without assuming causation.
Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.
Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.
A regression models the extent to which changes in a predictor variable result in changes in an outcome variable (or variables).
- A simple linear regression includes one predictor variable and one outcome variable.
- A multiple linear regression includes two or more predictor variables and one outcome variable.
Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.
- A t test is for exactly 1 or 2 groups when the sample is small (30 or fewer).
- A z test is for exactly 1 or 2 groups when the sample is large.
- An ANOVA is for 3 or more groups.
The z and t tests have subtypes based on the number and types of samples and the hypotheses:
- If you have only one sample that you want to compare to a population mean, use a one-sample test.
- If you have paired measurements (within-subjects design), use a dependent (paired) samples test.
- If you have completely separate measurements from two unmatched groups (between-subjects design), use an independent (unpaired) samples test.
- If you expect a difference between groups in a specific direction, use a one-tailed test.
- If you don’t have any expectations for the direction of a difference between groups, use a two-tailed test.
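As a rough sketch of the dependent (paired) samples case, the t statistic is the mean of the within-participant differences divided by its standard error. The pretest/posttest scores below are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical pretest/posttest scores from the same eight participants
pretest  = [70, 68, 75, 80, 72, 78, 74, 69]
posttest = [74, 71, 77, 85, 75, 79, 78, 72]

diffs = [post - pre for pre, post in zip(pretest, posttest)]
n = len(diffs)
t = mean(diffs) / (stdev(diffs) / sqrt(n))  # dependent-samples t statistic
df = n - 1                                  # degrees of freedom for the test
# Compare t against a t table (or a stats library) at df = 7 to get the p value
```

The standard library has no t distribution, so the final p value comes from a t table or a package like SciPy (`scipy.stats.ttest_rel` computes both steps at once).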
The only parametric correlation test is Pearson’s r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.
However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.
You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you:
- a t value (test statistic) of 3.00
- a p value of 0.0028
Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.
A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you:
- a t value of 3.08
- a p value of 0.001
The final step of statistical analysis is interpreting your results.
In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.
Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.
This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study). You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.
Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.
A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.
In contrast, the effect size indicates the practical significance of your results. It’s important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you’re writing an APA style paper.
With a Cohen’s d of 0.72, there’s medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study). To determine the effect size of the correlation coefficient, you compare your Pearson’s r value to Cohen’s effect size criteria.
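As a rough sketch of how Cohen’s d is computed for two independent groups, here is a minimal Python example; the groups and scores are invented, and the paired-design version would instead standardize the difference scores.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores from two independent groups
group1 = [14, 15, 13, 16, 15, 14, 17, 15]
group2 = [17, 18, 16, 19, 18, 17, 20, 18]

n1, n2 = len(group1), len(group2)
s1, s2 = stdev(group1), stdev(group2)
pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
d = (mean(group2) - mean(group1)) / pooled_sd  # Cohen's d
```

By Cohen’s conventional criteria (roughly 0.2 small, 0.5 medium, 0.8 large), this hypothetical d would count as a large effect.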
Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.
You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.
Frequentist versus Bayesian statistics
Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.
However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.
A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis, rather than producing a binary decision about rejecting the null hypothesis or not.
Standard statistical tools in research and data analysis
Statistics is a field of science concerned with gathering, organising, analysing, and extrapolating from sample data to an entire population. This necessitates a well-designed study, a well-chosen sample, and an appropriate choice of statistical test. A good understanding of statistics is required to design epidemiological research or a clinical trial, as improper statistical approaches can lead to erroneous findings and unethical behaviour.
A variable is a trait that differs from one person to the next within a population. Quantitative variables are measured on a scale and provide numerical information, such as height and weight. Qualitative variables, such as sex and eye colour, provide categorical information (Figure 1).
Figure 1. Classification of variables
Quantitative (numerical) data are divided into discrete and continuous measures. Continuous data can take on any value, whereas discrete data take whole-number values such as 0, 1, 2, 3, … (integers). Discrete data consist of countable observations, while continuous data consist of measurable observations. Examples of discrete data include the number of respiratory arrest episodes or re-intubations in an intensive care unit. Continuous data include serial serum glucose levels, partial pressure of oxygen in arterial blood, and oesophageal temperature. Measurements can be placed on a hierarchical scale of increasing precision: nominal (categorical), ordinal, interval and ratio scales (Figure 1).
Descriptive statistics try to explain how variables in a sample or population are related. In the form of the mean, median, and mode, descriptive statistics give an overview of data. Inferential statistics use a random sample of data from a group to characterise and draw inferences about the group as a whole. They are useful when it is not possible to investigate every member of a population.
The central tendency describes how observations cluster about a centre point, whereas the degree of dispersion describes the spread towards the extremes.
In inferential statistics, data from a sample are analysed to draw conclusions about the entire population. The goal is to support or reject hypotheses. A hypothesis (plural: hypotheses) is a suggested explanation for a phenomenon. Hypothesis testing is an essential process for making logical decisions about whether observed effects are real.
SOFTWARE FOR STATISTICS, SAMPLE SIZE CALCULATION AND POWER ANALYSIS
There are several statistical software packages accessible today. The most commonly used are the Statistical Package for the Social Sciences (SPSS, by IBM Corporation), the Statistical Analysis System (SAS, developed by the SAS Institute, North Carolina, United States), Minitab (developed by Minitab Inc), R (designed by Ross Ihaka and Robert Gentleman of the R core team), Stata (developed by StataCorp), and MS Excel. There are also several websites for statistical power analysis. Here are a few examples:
- StatPages.net – contains connections to a variety of online power calculators.
- G*Power – a downloadable power analysis program that runs under DOS.
- ANOVA power analysis – an interactive webpage that estimates the power or sample size required to achieve a specified power for one effect in a factorial ANOVA design.
- Sample Power – software created by SPSS. It generates a comprehensive report on the computer screen that may be copied and pasted into another document.
A researcher must be familiar with the most important statistical approaches for conducting research. This will aid in the implementation of a well-designed study that yields accurate and valid data. Incorrect statistical approaches can result in erroneous findings and mistakes, and can diminish a paper’s significance. Poor statistics can lead to poor research, which can lead to unethical practice. As a result, proper statistical understanding and correct application of statistical tests are essential. A thorough understanding of fundamental statistical methods will go a long way toward enhancing study designs and creating high-quality medical research that can be used to develop evidence-based guidelines.
Ali, Zulfiqar, and S. Bala Bhaskar. “Basic statistical tools in research and data analysis.” Indian Journal of Anaesthesia 60.9 (2016): 662-669. doi:10.4103/0019-5049.190623