When analyzing your survey, there are nearly infinite ways to look at your data. In our last article, we reviewed the types of data you may encounter. Now, we will examine the measurement of those data types.
It may be helpful to think about different types of data in terms of how they can be measured to determine central tendencies of each. Measures of central tendency (mode measurement, median and mean measurement) attempt to describe a group of data with a single point. Over the next few articles, we’ll discuss each form of measurement, beginning with Mode.
Mode Measurement: Data Types: Nominal, Categorical, Ordinal and Ratio
Example: Nominal/Categorical Data Collection
For traditional multiple-choice, single-answer questions, means can’t be calculated. The appropriate measure of central tendency for categorical data is the mode.
Mode is usually thought of as the most basic measure of central tendency. It is defined as the point, or answer, that occurs most often in a dataset. Determining which item occurs most often requires counting the number of occurrences of each answer within a question. The easiest way to visualize this is to create a frequency table like the example below. This is simply a count of the number of times each topping was chosen by survey respondents.
For this dataset, pepperoni is the mode because it is the topping that people chose most frequently. It is possible to have multiple modes. If there are two items that occur more frequently than the other items (bi-modality), there is not a single point or item that can be used to describe the dataset. In that case, reporting two items is appropriate.
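The counting described above can be sketched in a few lines of Python. This is only an illustration: the responses are made up and are not the survey data from this article, and the function names `frequency_table` and `modes` are hypothetical helpers rather than part of any survey tool.

```python
from collections import Counter

# Hypothetical survey responses (illustrative only, not the article's data).
responses = (
    ["pepperoni"] * 6
    + ["mushroom"] * 3
    + ["sausage"] * 3
    + ["onion"] * 2
    + ["olive"] * 1
)


def frequency_table(answers):
    """Return (answer, count) pairs sorted from most to least frequent."""
    return Counter(answers).most_common()


def modes(answers):
    """Return every answer tied for the highest count (one or more modes)."""
    counts = Counter(answers)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(answer for answer, n in counts.items() if n == top)


# Print the frequency table, then the mode or modes.
for answer, count in frequency_table(responses):
    print(f"{answer:<10} {count}")
print("Mode(s):", modes(responses))
```

Returning a list rather than a single value keeps the bimodal case visible: when two answers tie for the highest count, both are reported.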
That said, you may also want to ask whether the distribution of answers differs from the one you would see if there were no true preference and participants were simply answering at random.
To achieve an even deeper understanding of whether it is appropriate to report the mode as a descriptor of the dataset, the Chi-Square (χ²) Goodness of Fit Test may be used to determine if the frequencies found differ statistically from what would be expected due to chance. It wouldn’t make sense to say that pepperoni was the clear winner in topping preference if the number of respondents choosing that topping was only slightly more than one-fifth of the total respondents.
For example, if respondents were answering the previous question randomly, you’d expect each of the five toppings to have a frequency of roughly 20%. At 20%, there’s no difference from chance, or random answers, which means that the type of pizza topping had no effect on the outcome. The question is, how far above 20% does a frequency have to be to safely say that one of the toppings is preferred? In this case, the Chi-Square (χ²) Goodness of Fit Test could be run against an expected frequency of 20%, or 200 for the dataset above. If the test is significant (p < .05), the observed differences are unlikely to be due to chance alone and most likely reflect real differences in preference. You may see a result of this kind written as χ²(4, N = 400) = 50.2, p < .05 (Laerd Statistics).
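As a rough sketch of the calculation behind this test, the statistic sums (observed − expected)² / expected over all answer options, with the expected count set to an equal share of respondents. The counts below are hypothetical (five toppings, 1,000 respondents, so 200 expected per topping) and are not the article’s data. The function name `chi_square_uniform` is made up for this example, and the critical value 9.488 is the standard table value for 4 degrees of freedom at α = .05.

```python
# Hypothetical counts for five toppings from 1,000 respondents
# (illustrative only, not the article's data).
observed = {
    "pepperoni": 260,
    "mushroom": 210,
    "sausage": 190,
    "onion": 180,
    "olive": 160,
}

# Critical value of the chi-square distribution for 4 degrees of freedom
# at alpha = .05, taken from a standard chi-square table.
CRITICAL_VALUE_DF4_ALPHA_05 = 9.488


def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)  # 20% of respondents when there are 5 options
    return sum((obs - expected) ** 2 / expected for obs in counts)


counts = list(observed.values())
statistic = chi_square_uniform(counts)
df = len(counts) - 1  # degrees of freedom = number of options minus 1
significant = statistic > CRITICAL_VALUE_DF4_ALPHA_05

print(f"chi-square({df}, N={sum(counts)}) = {statistic:.1f}")
print("Differs from chance at p < .05:", significant)
```

Comparing against a fixed critical value keeps the sketch dependency-free, but it only works for five answer options at α = .05. If SciPy is available, `scipy.stats.chisquare(counts)` tests against equal expected frequencies by default and also returns a p-value.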
Now that we’ve covered Mode Measurement, check back with us later this week to learn about Median Measurement.