Running analyses

Introduction

We're now going to use statscloud to run some analyses.

Running analyses in statscloud is much easier than in many other statistics packages. In those packages, you're expected to trawl through dozens of options, tick specific boxes to get the results you want, and then interpret everything in the output correctly with very little context. This can understandably be overwhelming for students who are still new to statistics so, in statscloud, running analyses is much simpler.

To run an analysis in statscloud, all you need to do is select the variables you want in your test. Once you've done this, statscloud can select the most appropriate test for you and display your results in an output you can explore and interact with. The best way of showing this is to experience it yourself, so let's give it a go.

Running an analysis

We'll jump straight in with our Marvel movies data set. Here, we have a list of movies from each of the four phases in the Marvel Cinematic Universe, along with their IMDb ratings and Worldwide Gross statistics. One thing we may want to look at here is whether the average IMDb rating is significantly different across the four phases.

To do this, just click on the "Analyses" tab and Run your first analysis. From here you'll see a list of tests we can perform. You'll notice that a few of them are 'greyed out' (because statscloud filters out all the analyses we can't do with our current data set). All the tests we can do are here, and the one we actually want is the One-way ANOVA (Independent), but let's pretend we don't know that. Let's pretend that all we know is that we want to compare differences between some groups. We do that by simply clicking on the column heading "Differences" and, when we do, we're able to select our variables for the test.

To run the analysis, all we need to do is select the 'Grouping variable' and the 'Measured variable'. You'll notice here that statscloud only includes the variables we can pick in each box. So, in the 'Grouping variable' box, we only see our (one) categorical variable (Phase) and, in the 'Measured variable', we see our other two, numeric variables. This makes selecting variables for our test really simple. To see how IMDb ratings differ by Phase, we simply select Phase as our grouping variable and IMDb rating as our measured variable. That's it!
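To make the filtering idea concrete, here is a minimal Python sketch of how a variable picker could offer only the eligible columns for each box. It is not statscloud's own code: the `columns` dictionary, its "categorical"/"numeric" labels, and the `eligible` helper are all assumptions made for illustration.

```python
# Hypothetical description of the data set's columns. The names follow the
# tutorial; the type labels are an assumption about how they might be stored.
columns = {
    "Phase": "categorical",
    "IMDb rating": "numeric",
    "Worldwide gross": "numeric",
}


def eligible(columns, kind):
    """Return the column names whose type matches the requested kind."""
    return [name for name, column_kind in columns.items() if column_kind == kind]


# Only categorical columns can act as the grouping variable,
# and only numeric columns can act as the measured variable.
grouping_options = eligible(columns, "categorical")
measured_options = eligible(columns, "numeric")

print("Grouping variable options:", grouping_options)
print("Measured variable options:", measured_options)
```

With a rule like this, a categorical column such as Phase can never end up in the 'Measured variable' box by mistake.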

Task: Run an analysis comparing IMDb rating by Phase

Open the Marvel movies data set by clicking here

Follow the instructions above to set up a test that measures IMDb rating by Phase. When you've done that, just click Run analysis.

Interpreting the output

After clicking 'Run analysis', you'll see the results of the analysis appear on screen. The first thing you may spot here is that statscloud has chosen the right test for us: the One-way ANOVA. As well as telling us what test we've just run, statscloud gives us an overview of our whole analysis: it gives us some highlights (whether the test was significant or reliable), provides a visual representation of the data through a chart, summarises the test results for us in a table, and provides some additional details on the test's reliability.

Task: Familiarise yourself with the output.

Take a minute or two to scroll through the output and see if you can understand what is being shown to you in each section.

Viewing charts

One of the first things in our output is a chart showing us a visual representation of our data: the average IMDb rating for each phase. By clicking the "Options" button (the second button down), we can change which type of chart we want to see here. To view a larger version of the chart (which is handy if you're using a touch device), you can click the "full-screen" button below this.

Task: Take a closer look at the chart

Try changing the type of chart to 'violin', and then view the chart in full-screen to get a closer look.

Test results

The next part of our output is the actual test results. This is quite similar to the output you'll find in other statistics packages, but with a key difference: this too is interactive. Not only does it provide a summary of the key statistics for you, it also allows you to view the formula used to calculate each statistic. If you click on the first 'mean' score (of 7.267), a pop-up will show you the formula statscloud used to calculate it. This is true of almost any statistic in the output table; you can click on just about any value in this table to see how it was calculated.

Task: View the formulas for some key statistics.

In this output table, click on a few of the cells to view the formula for them. Start with the mean and standard deviation and then look at the formula for the F value!
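If you'd like to see those formulas worked through outside the pop-ups, here is a minimal Python sketch of the textbook calculations for the mean, the sample standard deviation, and the F value of a one-way independent ANOVA. The three groups are made-up placeholder numbers, not the Marvel ratings, and the `one_way_f` helper is a name invented for this example.

```python
from statistics import mean, stdev


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way independent ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Between-groups sum of squares: how far each group mean sits from the
    # grand mean, weighted by the group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of the values around their own
    # group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)

    # F is the ratio of the two mean squares.
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Placeholder data for illustration only (NOT the real IMDb ratings).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]

for index, group in enumerate(groups, start=1):
    print(f"Group {index}: mean = {mean(group):.3f}, SD = {stdev(group):.3f}")

f_value, df_between, df_within = one_way_f(groups)
print(f"F({df_between}, {df_within}) = {f_value:.3f}")
```

A large F value means the group means differ a lot relative to the spread inside each group.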

Reliability

Often, when running analyses in other statistics packages, you need to request specific reliability tests and run each of them separately. However, in statscloud, these reliability tests are all done for you automatically and are summarised for you in this section. Each test has its own assumptions, and these vary according to the test we run. For instance, for this test, one assumption we have is normality (i.e. that the data is normally distributed). If our data is not normally distributed, our results could be unreliable, so we'd need to run a different test.

If you want to check whether the normality assumption was met, just click / tap on the Normality button and take a look at the results. Here, we can see that statscloud has automatically run a normality test (the Shapiro-Wilk test) on the data we've used in this analysis and has summarised the results for us. All four phases have a green tick by them. Each normality dialog box explains how the reliability test was performed and what the criteria for success were. In this case, we need the values in the 'p' column to be above 0.05, and they all are. Because of this, statscloud has placed a green tick by each of them, meaning that the assumptions of this test have been met.
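The decision rule described above (a green tick whenever the 'p' value is above 0.05) can be sketched in a few lines of Python. The p-values and group labels below are hypothetical placeholders rather than the values statscloud reports, and `normality_ticks` is a helper name made up for this example.

```python
# Hypothetical Shapiro-Wilk p-values for each group (placeholders only).
p_values = {"Phase 1": 0.41, "Phase 2": 0.22, "Phase 3": 0.09, "Phase 4": 0.63}


def normality_ticks(p_values, threshold=0.05):
    """Map each group to True (tick) when its p-value is above the threshold."""
    return {group: p > threshold for group, p in p_values.items()}


ticks = normality_ticks(p_values)
# The assumption counts as met only if every group passes.
assumption_met = all(ticks.values())

for group, passed in ticks.items():
    print(f"{group}: p = {p_values[group]:.2f} -> {'tick' if passed else 'cross'}")
print("Normality assumption met:", assumption_met)
```

Note the direction of the rule: for a normality test we want a p-value above the threshold, because a small p-value is evidence that the data is not normally distributed.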

At the bottom of this dialog box, you have the option to state whether the assumptions of this test have been met, with three options ("yes", "no", or "unclear"). When you close this dialog, the box becomes coloured (with either green, red, or amber) to depict the reliability status of this assumption. When all of them have been rated, the flag next to the analysis becomes coloured (green if everything is OK, but red or amber if any assumptions have been lit in that colour).
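One way to picture how the individual ratings roll up into a single flag is the small Python sketch below. It rests on two assumptions that the text above does not spell out: that "yes", "no", and "unclear" map to green, red, and amber respectively, and that red takes priority over amber when both appear. The function name and the None placeholder for an unrated assumption are also invented for this example.

```python
# Assumed mapping from the three answers to the three colours.
ANSWER_COLOURS = {"yes": "green", "no": "red", "unclear": "amber"}


def overall_flag(answers):
    """Combine per-assumption answers into one flag colour.

    `answers` maps each assumption name to "yes", "no", "unclear", or None
    (None meaning it has not been rated yet).
    """
    if any(answer is None for answer in answers.values()):
        return None  # the flag stays uncoloured until everything is rated
    colours = {ANSWER_COLOURS[answer] for answer in answers.values()}
    if "red" in colours:  # assumption: red outranks amber
        return "red"
    if "amber" in colours:
        return "amber"
    return "green"


# Hypothetical assumption names and ratings.
print(overall_flag({"Normality": "yes", "Equal variances": "yes"}))
print(overall_flag({"Normality": "no", "Equal variances": "unclear"}))
print(overall_flag({"Normality": "yes", "Equal variances": None}))
```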

Task: View the normality result for this analysis.

Click on the Normality button to preview the normality results for this analysis. Verify that all the groups of data we have used in this analysis are normally distributed and, if they are, click 'yes' under the "Has this assumption been met?" box.

Note: statscloud can test the majority of these assumptions for you. If you click on the 'Options' button to the right of the 'Reliability' box, you'll see an option to select "Test assumptions automatically". When you do this, statscloud will automatically test these and assign a colour for you.

However, note that there are some assumptions that cannot be tested automatically; some analyses (such as Linear Regression) require you to examine a plot of the data and see if it takes a certain shape. There needs to be some human input for this!

Toggling analyses

When we ask statscloud to choose a test for us, instead of specifying one ourselves, it will always pick the best test for us based on the variables we have selected. Most often though, you'll know exactly what test it is you need to run, and you'll select it from the list of tests available. When you specify a test yourself, it's possible that test won't be reliable (some of the test assumptions may be violated) but, don't worry, statscloud will warn us about this and will suggest an alternative if that's the case. Let's see that in action.

We'll run another One-way independent ANOVA now that considers how the Worldwide gross differs across the four phases. This time, we click on 'One-way ANOVA (Independent)' in the list and specify our variables: Phase as the grouping variable and, this time, Worldwide gross as the measured variable.

When we run this analysis, statscloud immediately tells us the analysis may not be reliable and that a better test is available. If we scroll down to the 'Reliability' header, we can see why; Normality has a red flag.

Note: The data isn't normally distributed because the penultimate film in Phase 3, Avengers: Endgame (row 22), has a huge worldwide gross of almost $3 billion! Compared to the other films (whose grosses are still very high), that's a big jump, and it would be considered an outlier. As a result, the distribution of this group has a positive skew; this value really stretches out the right tail of the distribution.

You can see this effect yourself when you change the chart intervals to 'range'. Notice how the interval stretches up much longer in Phase 3.
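To see numerically how a single extreme value creates a positive skew, here is a short Python sketch that computes the (population) skewness of a group with and without one outlier. The figures are invented placeholders in billions of dollars, not the real box-office values, and `skewness` is a helper written for this example.

```python
from statistics import mean


def skewness(values):
    """Population skewness: third central moment over the 1.5 power of the second."""
    m = mean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Placeholder grosses in billions of dollars (NOT the real data).
typical = [0.8, 0.9, 1.0, 1.1, 1.2]
with_outlier = typical + [2.8]

skew_without = skewness(typical)
skew_with = skewness(with_outlier)

print(f"Skewness without the outlier: {skew_without:.3f}")
print(f"Skewness with the outlier:    {skew_with:.3f}")
```

The symmetric group has a skewness of essentially zero, while adding one very large value pulls the skewness well above zero, which is the stretched right tail described above.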

You can see statscloud has suggested the Kruskal-Wallis test for us as an alternative (the non-parametric equivalent of the one-way ANOVA). We can change to this test by simply clicking 'Change test' in the banner, or by selecting it from the drop-down list of tests in the top banner.
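For a feel of what "non-parametric" means here, the sketch below computes the Kruskal-Wallis H statistic in plain Python. The test replaces the raw values with their ranks, so one huge value counts only as "the largest" rather than by its actual size. This is the textbook formula without the correction for tied values, the data are placeholders, and the helper names are made up for the example; it is not a description of how statscloud implements the test.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    ordered = sorted(values)
    # ordered.index(v) is the 0-based position of the first copy of v, so the
    # average 1-based rank of all copies is that position + (count + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n = len(pooled)

    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)

    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Placeholder groups for illustration only.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_value = kruskal_wallis_h(groups)
print(f"H = {h_value:.3f} with {len(groups) - 1} degrees of freedom")
```

Because only the ordering matters, replacing the 9 in the last group with 9000 would leave H unchanged.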

Setting the alpha value

Usually, when working with frequentist statistics in social science, we use an alpha level of .05. However, sometimes, this may not be good enough and, in order to consider a test to be significant, we should set the alpha level to be lower (so that it is more difficult to obtain a 'significant' result). We can do this individually for every test by choosing a new value for the alpha level using the alpha drop-down list.

Task: Edit the alpha level for the Worldwide Gross ($) ~ Phase analysis.

Change the alpha level for this test from 0.05 to 0.01. Look at the 'significant' result under the 'Highlights' heading. Notice that this has now changed to non-significant. If you view the p-value in the test results table, you can see why; it's below .05 but not below .01!
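The logic behind this change is just a comparison between the p-value and the alpha level, as the short Python sketch below shows. The p-value of 0.03 is a hypothetical stand-in for any value between .01 and .05; check the test results table for the real one.

```python
def is_significant(p_value, alpha):
    """A result is called significant when its p-value is below alpha."""
    return p_value < alpha


# Hypothetical p-value lying between .01 and .05 (not the reported value).
p_value = 0.03

for alpha in (0.05, 0.01):
    verdict = "significant" if is_significant(p_value, alpha) else "non-significant"
    print(f"alpha = {alpha}: {verdict}")
```

The data and the p-value stay exactly the same; only the bar for calling the result 'significant' has moved.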

Running other analyses

We've seen how much we can do with a simple analysis but there is still lots more to explore here! Have a go at running some other analyses (below) and see what else you can do.

Task: Try running some other analyses

Run a correlation

Open up the Happiest countries data set

Run a Spearman's Rho correlation between the Happiness score, GDP per capita, Healthy Life Expectancy, and Generosity variables

Take a look at the overall analysis output and then expand the correlation matrix in the sidebar to look at each correlation one by one

Notice how some pairs of variables have very clear relationships (as shown in the scatter chart) and others don't
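As background for this task, Spearman's Rho is simply a Pearson correlation calculated on the ranks of the values rather than on the values themselves. The Python sketch below shows that idea with made-up numbers (not the happiness data); the helper names are invented for this example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's Rho: Pearson correlation of the ranked data."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder data: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [2, 4, 8, 16, 32]

print(f"Pearson r      = {pearson(x, y):.3f}")
print(f"Spearman's Rho = {spearman_rho(x, y):.3f}")
```

Because the relationship is perfectly increasing, Spearman's Rho reaches its maximum even though the points do not fall on a straight line.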

Run a chi-squared test

Open up the Brexit referendum results data set

Run a Chi-squared test of association with Region as the grouping variable and Result as the measured variable

View the chart in full-screen mode to get a better look at the data
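If you're curious what sits behind the chi-squared value, the Python sketch below compares the observed counts in a table with the counts we would expect if the two variables were unrelated. The table is an invented 2 x 2 example (not the referendum data), no continuity correction is applied, and `chi_squared` is a helper written for this illustration.

```python
def chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom


# Placeholder counts: rows are two groups, columns are two outcomes.
table = [[10, 20], [20, 10]]
statistic, degrees_of_freedom = chi_squared(table)
print(f"Chi-squared = {statistic:.3f}, df = {degrees_of_freedom}")
```

The further the observed counts drift from the expected ones, the larger the statistic becomes.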

Recap

This should give you a good idea of how easy it is to run analyses in statscloud and how the interactive outputs and summaries can help you understand what you've done. Here's a summary of everything we've covered here:

How to get statscloud to pick the right test for us automatically
How to select a specific test manually
How to explore the data visually using the chart
How to view live formulas from an analysis in the test results table
How to change the alpha level for a test