When people first think about data analysis, they almost always assume it mainly involves numerical data. That is understandable, as the outputs are usually statistical, but real-world problems rarely come with numerical data alone. This is why it’s important to recognise categorical data alongside numerical data.
What exactly is categorical data, then? Well, it’s not numeric for a start! By this we mean that the values describing each category are labels rather than numbers. Consider hair colour: it might be black, blonde, brunette or grey. These are perfect examples of categorical data.
It goes deeper than this too. Categorical data can be classified further, for example into nominal, ordinal and binary.
Nominal data are categories that have no specific order. There are no priorities within them; they’re all equal, and you will probably end up sorting them alphabetically. The hair colour example is a nominal set of data.
Ordinal data do have some form of hierarchy. Think about the grade results at university, with first class and second class degrees. Another example is a satisfaction rating system that uses natural language such as “very good” and “very bad” to describe the results.
Binary data have only two possible categories. A person’s gender is typically treated as an example of this, as is any situation where a yes/no response is required.
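To make the three kinds of categorical data more concrete, here is a minimal Python sketch. The sample responses are invented for illustration, and the intermediate satisfaction levels ("good" and "bad") and the `SATISFACTION_ORDER` list are assumptions added here, not part of any particular survey. It shows that nominal categories can simply be sorted alphabetically, that ordinal categories need their order stated explicitly, and that binary data has exactly two distinct values.

```python
# Hypothetical sample responses, invented purely for illustration.
hair_colours = ["grey", "black", "blonde", "brunette", "black"]
satisfaction = ["good", "very bad", "very good", "bad", "good"]
subscribed = ["yes", "no", "no", "yes", "yes"]

# Nominal: no inherent order, so alphabetical sorting is as good as any.
nominal_categories = sorted(set(hair_colours))

# Ordinal: the order carries meaning, so we have to state it explicitly.
# (Assumed scale; a real survey would define its own levels.)
SATISFACTION_ORDER = ["very bad", "bad", "good", "very good"]
rank = {label: position for position, label in enumerate(SATISFACTION_ORDER)}
ordinal_sorted = sorted(satisfaction, key=rank.get)

# Binary: exactly two distinct categories appear in the data.
is_binary = len(set(subscribed)) == 2

print(nominal_categories)  # ['black', 'blonde', 'brunette', 'grey']
print(ordinal_sorted)      # ['very bad', 'bad', 'good', 'good', 'very good']
print(is_binary)           # True
```

Note that sorting the satisfaction labels alphabetically would put "bad" before "good" but "very bad" after both, which is why ordinal data needs its own explicit ranking.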
Numerical data is probably more familiar to most people. It represents results as numbers, and these can usually be described as continuous or discrete. Continuous data can take any value within a range. Think about your height or weight: the exact value of either could run to many decimal places. There are no real breaks between the possible values, which is why it’s called continuous.
Discrete data, however, can only take specific values. It might be the number of items on a shelf. Such counts can only be whole numbers; it isn’t possible to have half an item. Because the values cannot be broken down further, we call it discrete data.
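As a rough illustration of the difference, the sketch below uses a hypothetical helper, `looks_discrete`, which checks whether every value in a list is a whole number. The sample values are made up, and this is only a heuristic: a continuous measurement that has been rounded to whole numbers would also pass the check, so knowing how the data was collected still matters.

```python
def looks_discrete(values):
    """Heuristic check: True if every value is a whole number."""
    return all(float(v).is_integer() for v in values)


# Hypothetical measurements, invented for illustration.
heights_cm = [162.5, 175.25, 180.0, 168.75]  # continuous: any value in a range
items_on_shelf = [12, 0, 7, 31]              # discrete: whole counts only

print(looks_discrete(heights_cm))      # False
print(looks_discrete(items_on_shelf))  # True
```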
It’s important that you’re able to understand your data and how to describe it, because the way you analyse it will likely differ depending on what you’re dealing with. Of course, it also depends on what you want to do, but the way you summarise the data concisely will be different. With categorical data, you may begin by looking at the frequency of each response, whereas with numerical data, it may make more sense to look at the mean, median and standard deviation.
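The two starting points mentioned above can be sketched with Python’s standard library. The data is invented for illustration, and `statistics.stdev` computes the sample standard deviation, which is an assumption about which variant you want (`statistics.pstdev` gives the population version).

```python
from collections import Counter
import statistics

# Hypothetical data, invented for illustration.
hair_colours = ["black", "blonde", "black", "grey", "brunette", "black"]
heights_cm = [160.0, 170.0, 180.0]

# Categorical data: count how often each response occurs.
frequencies = Counter(hair_colours)
print(frequencies.most_common(1))  # [('black', 3)]

# Numerical data: summarise with the mean, median and standard deviation.
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
stdev_height = statistics.stdev(heights_cm)  # sample standard deviation

print(mean_height, median_height, stdev_height)
```

Taking the mean of the hair colours would make no sense, which is exactly why it helps to identify the type of data before choosing a summary.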
So as you continue on your task to analyse data, begin first with knowing your data and understanding how to describe it.