## yukitou, 3 years ago: How to use statistics for hypothesis testing?

How should I know when to use each of these different statistical tests for different experiments?

1. TranceNova

Hmmm, a bit of a hard question really, and I suspect blues will give a better answer than I will, but here goes. What do you know about hypothesis testing, or rather, what have you learned so far? Basically, the way I see it, if you need to find out whether something has a different effect than something else, you need a hypothesis test. In other words, if you have applied a treatment and want to know whether the difference you see is caused by the treatment or is just random chance, you need some form of hypothesis test. Which hypothesis test? That depends on your treatment design and the data you have.
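To make the "treatment or random chance?" question concrete, here is a minimal sketch of one simple test, a permutation test (my choice for illustration; nobody in this thread named it). If the treatment had no effect, the treated/control labels would be arbitrary, so reshuffling them many times shows how big a difference chance alone tends to produce. The plant heights and the `permutation_test` helper are hypothetical.

```python
import random
from statistics import mean

def permutation_test(treated, control, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and the fraction of label shuffles
    whose difference is at least as extreme (an estimate of the P value).
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # pretend the treatment labels were assigned at random
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical plant heights (cm), invented purely for illustration
treated = [24.1, 25.3, 26.0, 23.8, 25.9, 26.4, 24.7, 25.5]
control = [22.0, 23.1, 22.7, 21.9, 23.5, 22.4, 23.0, 22.2]
diff, p = permutation_test(treated, control)
print(f"difference in means = {diff:.2f} cm, P ~ {p:.4f}")
```

Here the two groups do not overlap at all, so almost no shuffle produces a difference as large as the observed 2.61 cm, and the estimated P value comes out far below 0.05. The exact value varies slightly with `seed` and `n_resamples`, because the shuffles are random.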

2. blues

It would help if you gave us more specific info on which tests you have questions about. But yes, as a computational biologist, this is stuff I do often and well.

As you probably know, all hypothesis tests are an exercise in inferential statistics. It would be too costly or logistically impossible to compute a certain statistic, like the mean or standard deviation, of an entire population. So instead we take a sample from that population and do a hypothesis test to determine how confident we can be, based on that sample, about the value of the *population* statistic.

To do this, all hypothesis tests have the same basic format: we have a null hypothesis and an alternative hypothesis. We then compute a test statistic from the sample. If the test statistic falls within a certain acceptance range, "we do not reject the null hypothesis." If it falls outside that range, "we reject the null hypothesis in favor of the alternative hypothesis."

There are quite a few different hypothesis tests for different situations. Some depend on which descriptive statistic you are testing: the mean, the standard deviation, coefficients in a fitted model, etc. Others depend on the structure of the data being tested. For example, is the data paired or unpaired? Is it normally distributed, or can it be assumed to be normally distributed? There are other, less well-known tests for data with other distributions. Or, if you can't make an assumption about the distribution of the data, there are so-called "nonparametric" or "rank" tests that you can use instead.

It also depends to some extent on what discipline you are working in. Specific fields in biology have their own accepted sets of statistical tests. For example, in ecology they like the Shapiro-Wilk test to check whether the residuals about a fitted line are normally distributed. In climatology, they use the Durbin-Watson test to check the residuals of a fit as well, though for autocorrelation rather than non-normality...
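A minimal sketch of that null/alternative/acceptance-range pattern, in Python with only the standard library. It assumes a one-sample t-test (H0: the population mean is 50), invented measurements, and a critical value of 2.262 copied from a standard t table for a two-sided test at alpha = 0.05 with 9 degrees of freedom. `one_sample_t` is a hypothetical helper, not a library function.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic and its degrees of freedom.

    H0: the population mean equals mu0.
    H1: the population mean differs from mu0 (two-sided).
    """
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # stdev() uses the n - 1 (sample) formula
    t = (mean(sample) - mu0) / se
    return t, n - 1

# Hypothetical measurements (invented): is the population mean 50?
sample = [51.2, 49.8, 52.5, 53.1, 50.9, 52.2, 51.7, 50.4, 52.8, 51.9]
t, df = one_sample_t(sample, mu0=50.0)

# Critical value for a two-sided test at alpha = 0.05 with df = 9,
# taken from a standard t table. The acceptance range is [-2.262, 2.262].
T_CRIT = 2.262
if abs(t) <= T_CRIT:
    decision = "do not reject H0"
else:
    decision = "reject H0 in favor of H1"
print(f"t = {t:.3f} on {df} df -> {decision}")
```

Here t comes out at about 4.89, well outside the acceptance range, so the sketch rejects H0. With a different sample size you would need the critical value for that many degrees of freedom.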

3. yukitou

During A-levels, I only learned some simple formulas for the chi-square test, Student's t-test and the Mann-Whitney U test, and I never thought about the purpose of these tests. Now, at university level, I need to learn when to use each one. Sometimes I wonder in which situations I can only use a chi-square test, a Student's t-test or some other test. I am also wondering how to work out the degrees of freedom and the p-value. Statistics confuses me. I prefer doing differentiation to stats questions.

4. blues

You use the different tests in different places (different types of data, whether you are interested in the probability distribution of the mean, etc., as discussed above), and it is just a matter of remembering when to use each one.

For example, the chi-square goodness-of-fit test checks whether an expected categorical distribution is similar to what you actually observe. Say you are doing a genetics cross and you expect 1/4 red plants, 1/2 pink plants and 1/4 white plants among the offspring, but what you actually get is 1/5 red, 3/5 pink and 1/5 white. How certain can you be that what you observed has the same distribution as the expected one?

Also, degrees of freedom depend on the test you are doing. They usually follow from the dimensions of the data you have sampled. Many tests involve n - 1 degrees of freedom, so if you have a sample of 20 observations, you have 19 degrees of freedom. For a chi-square goodness-of-fit test with expected proportions fixed in advance, it is the number of categories minus 1, so the three-color cross above has 2.

And P values are important. Very important. Go back to the null and alternative hypotheses. The P value is the probability of obtaining a test statistic at least as extreme as the one you got, *if the null hypothesis is true*. So if you get a P value of 0.000001, data like yours would be extremely unlikely if the null hypothesis were true, which is strong evidence against it, and you should reject the null hypothesis. (Strictly, the P value is not the probability that the null hypothesis is true.) On the other hand, if you get P = 0.65, data like yours are quite ordinary under the null hypothesis, so you have no grounds to reject it, although that does not prove it is true. It depends on your field, but for general purposes you should think about rejecting the null hypothesis if P ≤ 0.05, and certainly if P ≤ 0.01.
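A minimal sketch of the genetics-cross example as a chi-square goodness-of-fit test, in plain Python. The counts are hypothetical (the post gives only proportions), and the P value uses a shortcut that is exact only for 2 degrees of freedom, which is what a three-category test has: the upper tail of the chi-square distribution with 2 df is exp(-x/2). For other degrees of freedom you would use a statistics package instead. `chi_square_gof` and `p_value_df2` are hypothetical helper names.

```python
import math

def chi_square_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    observed: counts per category; expected_props: expected proportions
    under H0 (must sum to 1).
    """
    total = sum(observed)
    expected = [p * total for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k categories, proportions fixed in advance
    return stat, df

def p_value_df2(stat):
    # For exactly 2 degrees of freedom the chi-square upper tail
    # has the closed form P(X >= x) = exp(-x / 2).
    return math.exp(-stat / 2)

ratio = [0.25, 0.50, 0.25]  # expected red : pink : white under H0
results = []
for counts in ([20, 60, 20], [40, 120, 40]):  # hypothetical 1/5 : 3/5 : 1/5 crosses
    stat, df = chi_square_gof(counts, ratio)
    p = p_value_df2(stat)
    results.append((stat, df, p))
    print(f"n = {sum(counts)}: chi2 = {stat:.2f}, df = {df}, P = {p:.4f}")
```

With 100 plants, a 1/5 : 3/5 : 1/5 split gives chi2 = 4.00 and P of about 0.135, so the 1:2:1 expectation is not rejected at 0.05. The same proportions from 200 plants give chi2 = 8.00 and P of about 0.018, which is. The P value depends on how much data you have, not just on the proportions.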

5. yukitou

What is goodness of fit? Which statistical test should I use if I have one independent variable? What about if I have more than one variable?

6. blues

There are hundreds of statistical tests and I am not going to run through all of them with you. I gave you a bare-bones description of the chi-square goodness-of-fit test above. I do suggest that you take stats seriously if you intend to go on in biology; you will rely on them for the rest of your career. There is a statistics program, actually a programming language that most biologists feel is particularly well suited to statistics work, called R, and I suggest you look into it. A lot of the online resources and tutorials on basic statistics use it, and if you make a career in biology you will use it too, so it is best to suck it up and learn it from the get-go, despite the steep learning curve. Trance and a couple of other members of the group are also making a study of R, and I am happy to answer any questions about it posted in Biology, even though it is technically a tool rather than a basic biological concept. Best of luck.