Before you start…

Before starting this exercise, you should have completed all the relevant Absolute Beginners’, Part 1 worksheets. Each section below indicates which of the earlier worksheets are relevant.

Getting the data into R

Relevant worksheet: Intro to RStudio

In this exercise, you’ll be analysing some data that you and your peers recently collected. To get this data into R, follow these steps:

  1. Open an RStudio project for this analysis and, within that, create a script file.

  2. Upload the CSV file you have been given for this activity into your RStudio project folder. If you want to try out this worksheet without that data file, you can use this example CSV file instead. You can only complete your PsycEL activity if you use the CSV file you were sent.

  3. Load the tidyverse package, and then load your data into R.

# Load tidyverse
library(tidyverse)
# Load data
data <- read_csv("green.csv")

Note: In the example above, you’ll need to replace green.csv with the name of the CSV file you just uploaded into your RStudio project.
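
If you are not sure of the exact file name, one way to check is to list the CSV files in your project folder. This is a small sketch using the base R function list.files; it assumes your working directory is the project folder, which is normally the case when you have an RStudio project open. The name csv_files is just an illustrative choice.

```r
# List the CSV files in the current working directory
# (assumed here to be your RStudio project folder)
csv_files <- list.files(pattern = "\\.csv$")
csv_files
```

Whatever name appears in that list is the one to put inside the quotation marks in read_csv().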

Inspect

Look at the data by clicking on it in the Environment tab in RStudio.

Each row is a rating by one participant in this study of creativity. Groups of participants came up with a creative solution to a problem, while either taking a walk in an urban environment or a nature environment. Each of these solutions has been rated for creativity by a set of raters.

Will the nature environment lead to more creative ideas than the urban environment?

Column    Description                                          Values
Solution  Number of the solution                               a number
Rater     Reference number of the person rating the solution   a number
Cond      Which environment was the creator in?                “Urban”, “Nature”
score     How creative was the idea rated to be?               0-100, higher numbers = more creative

Pre-processing

Relevant worksheet: Group Differences

We start by “pre-processing” our data, in order to make it easier to analyse. We do this in two steps:

1. Cleaning the data set

In some cases, a participant did not provide a rating of a solution – this is then represented in the dataset as NA. R uses NA to specify that this data point is missing – in this case, because the participant didn’t respond.

Although it’s good to explicitly record that a response was not made, keeping these NA values in the dataset will cause problems later on, so we’re going to remove them:

# Remove NAs
data  <- data %>% drop_na(score)

The command drop_na(score) is new – it just means remove the rows of the dataset where the score is recorded as NA. The rest of the command uses things we covered in the Group Differences worksheet – the dataframe data is sent (i.e., piped, %>%) to the drop_na() command, which removes the NA, and the results are stored (<-) back in the data dataframe.
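
To see what drop_na(score) does, here is a minimal sketch on a tiny made-up data frame. The data frame toy and its values are invented for illustration and are not the study data; only the column names match the real file.

```r
# Load tidyverse (provides tibble(), drop_na() and the %>% pipe)
library(tidyverse)

# A made-up data frame for illustration only; not the study data
toy <- tibble(
  Solution = c(1, 1, 2, 2),
  Rater    = c(1, 2, 1, 2),
  Cond     = c("Nature", "Nature", "Urban", "Urban"),
  score    = c(60, NA, 45, 50)
)

# Remove only the rows where 'score' is NA
toy_clean <- toy %>% drop_na(score)

# Compare the number of rows before and after
nrow(toy)
nrow(toy_clean)
```

The row with the missing score is removed; the other rows are left untouched.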

2. Summarising

Each solution was rated by several people. We’re going to take the average (mean) of those ratings, so we’re left with one creativity score per solution. We use the group_by, summarise, and mean commands we used in the Group Differences worksheet to do this:

# Group by 'Cond' and 'Solution', calculate mean score; place results into 'creative'
creative <- data %>% group_by(Cond, Solution) %>% summarise(score = mean(score))
`summarise()` has grouped output by 'Cond'. You can override using the
`.groups` argument.

As before, you can safely ignore the message about grouping that you receive.

If we look at this summarised data, by clicking on creative in the Environment tab of RStudio, we can see that we now have one creativity score per solution.
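
If you would rather not see the grouping message at all, one option is to tell summarise what to do with the groups, using the .groups argument that the message mentions. The sketch below uses a made-up data frame (toy, with invented values) rather than the study data:

```r
# Load tidyverse
library(tidyverse)

# A made-up data frame for illustration only; not the study data
toy <- tibble(
  Solution = c(1, 1, 2, 2),
  Cond     = c("Nature", "Nature", "Urban", "Urban"),
  score    = c(60, 70, 40, 50)
)

# .groups = "drop" returns an ungrouped data frame, and no message is shown
toy_means <- toy %>%
  group_by(Cond, Solution) %>%
  summarise(score = mean(score), .groups = "drop")

# One row, and one mean score, per solution
toy_means
```

This is optional; the analysis in this worksheet works the same way with or without it.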

Creativity and the environment

Relevant worksheets: Group Differences, Evidence

We start by looking to see how the mean creativity scores differ for those who were in a nature or an urban environment. We can do this using the group_by and summarise functions in a similar way to before, but on our preprocessed data, which we have stored in the data frame creative:

# Group by 'Cond', calculate mean score.
creative %>% group_by(Cond) %>% summarise(mean(score))
# A tibble: 2 × 2
  Cond   `mean(score)`
  <chr>          <dbl>
1 Nature          42.7
2 Urban           39.2

Your output will look similar to this, but the numbers will probably be different. In this example, it looks like there’s a small difference, with the creativity ratings slightly higher in the Nature environment – but how does this between-group difference compare to the within-group variability? As we covered in the Group Differences worksheet, this is most easily looked at with a scaled density plot:

# Display density plots of 'score', by 'Cond'
# (after_stat(scaled) is the current ggplot2 spelling of the older ..scaled..)
creative %>% ggplot(aes(score, colour = factor(Cond))) + geom_density(aes(y = after_stat(scaled))) + xlim(0, 100)

Explanation of command: The only new part here is xlim(0, 100), which sets limits on the x-axis of your graph. Specifically, it forces the lowest value on the x-axis to be 0 and the highest value to be 100. Without xlim, R chooses limits that it thinks are sensible. Like all computer programs, R isn’t that bright, so often it makes sense to tell it more precisely what you want.

In this example, the graph tells a somewhat different story to the means - although a difference between groups is visible, it is small compared to the variability within each group.

We can express the size of the difference in means, relative to the within-group variability, as an effect size. As we said in the Group Differences worksheet, we calculate an effect size in R like this:

# Load a package that calculates effect sizes
library(effsize)
# Calculate Cohen's d for the effect of 'Cond' on 'score'
cohen.d(creative$score ~ creative$Cond)

Cohen's d

d estimate: 0.2145072 (small)
95 percent confidence interval:
     lower      upper 
-0.5973451  1.0263596 

In this example, the effect size is around 0.21, which is typically described as a small effect. The effect size for your data may be different.
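
To make the idea of "difference in means, relative to within-group variability" concrete, here is a sketch that works out Cohen's d by hand. It assumes the usual pooled-standard-deviation formula; cohen.d has options that are not reproduced here, so treat this as an illustration of the idea rather than a replacement for the command above. The scores are made up and are not the study data.

```r
# Made-up scores for illustration only; not the study data
nature <- c(50, 60, 70)
urban  <- c(40, 50, 60)

# Difference between the two group means
mean_diff <- mean(nature) - mean(urban)

# Pooled standard deviation: the within-group variability,
# combining the variance of each group, weighted by group size
n1 <- length(nature)
n2 <- length(urban)
pooled_sd <- sqrt(((n1 - 1) * var(nature) + (n2 - 1) * var(urban)) / (n1 + n2 - 2))

# Cohen's d: the difference in means, in units of the pooled standard deviation
d <- mean_diff / pooled_sd
d
```

In this made-up example the means differ by 10 and the pooled standard deviation is also 10, so d is 1.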

At this point, the most pressing question is probably whether the difference observed in the mean scores is likely to be real, or whether it’s more likely down to chance. As we saw in the Evidence worksheet, the best way to look at this is with a Bayesian t-test:

# Load BayesFactor package
library(BayesFactor, quietly = TRUE)
# Calculate Bayesian t-test for effect of 'Cond' on 'score'
ttestBF(formula = score ~ Cond, data = data.frame(creative))
Bayes factor analysis
--------------
[1] Alt., r=0.707 : 0.4053154 ±0%

Against denominator:
  Null, mu1-mu2 = 0 
---
Bayes factor type: BFindepSample, JZS

The Bayes Factor in this case is approximately 0.41. A Bayes Factor below 1 means the evidence favours there being no difference; here it does so by a factor of about 2.4 (1 / 0.41). Your number will likely be a bit different.

Enter the mean creativity score for each condition, the effect size, and the Bayes Factor for the difference, into PsycEL.

We use the convention that there is a difference if BF > 3, there isn’t a difference if BF < 0.33, and we’re unsure if it’s between 0.33 and 3. Using this convention, select difference, no difference, or unsure, on PsycEL.
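
The BF convention can also be written out as a few lines of R. This is only a sketch: interpret_bf is a hypothetical helper written for this illustration, not a function from the BayesFactor package, and the number used is the Bayes Factor from the example output in this worksheet.

```r
# Hypothetical helper (not part of BayesFactor) applying the convention:
# BF > 3 is a difference, BF < 0.33 is no difference, otherwise unsure
interpret_bf <- function(bf) {
  if (bf > 3) {
    "difference"
  } else if (bf < 0.33) {
    "no difference"
  } else {
    "unsure"
  }
}

# The Bayes Factor from the example output in this worksheet
interpret_bf(0.4053154)

# 1 / BF expresses the same evidence in favour of there being no difference
1 / 0.4053154
```

For the example Bayes Factor, the helper returns "unsure", because 0.41 lies between 0.33 and 3.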


This material is distributed under a Creative Commons licence. CC-BY-SA 4.0.