Introduction

This series of take-home lessons is designed for collaborative learning of R and R Studio (a user-friendly interface for R) among the Finance honours class of 2020.

The programme is under continuous development and aims to become a comprehensive archive of materials and a syllabus with which new users can begin learning R and the basics of data handling.

Lesson Content

The course will be delivered as weekly or bi-weekly circulations of material, each consisting of instructional R scripts and a lesson outline.

The R scripts allow students to open the work we cover that week and execute the pre-written code for that week’s lecture to see the results first hand. The scripts are also heavily annotated, with a description of each exercise to narrate the process. Lastly, each script contains a Swirl lesson. Swirl is a package designed to teach R in R. It provides sets of questions that must be answered correctly before you can progress to harder challenges. The package marks your work immediately and often gives hints when you get stuck. I will also provide homework questions so that you can apply each week’s lessons to practical problems. The answers to the homework will be made available with the following week’s lesson.

The lesson outline is a single HTML document that introduces the lesson, states the learning outcomes and gives a guide to how much time each part of the R script should take to work through. The outline will also contain links to further reading, videos or cheatsheets that I think would supplement the lesson well. Finally, the outline will link to any Loom videos I have created to clarify challenging parts of the lesson. These videos will range from 5 to 15 minutes and will address various aspects of the lesson’s material. There will also be a list of the R functions, operators and packages covered in the lesson, each with a brief description, which can act as a quick reference guide when you need to revisit a concept.

A Collaborative Learning Environment

Each student will have access to a private instance of Stack Overflow that is restricted to Finance students and staff. The platform is designed to let people help each other virtually, and it supports learning while doing and learning while teaching. It is gamified: participants can upvote questions and answers posted on the site, which increases the reputation score of whoever posted them. The main Stack Overflow site, by contrast, is open to the public.

So why don’t we just use the main site?

First, I hope you all do. And, eventually, I think you all will. But for a community as large and successful as Stack Overflow to exist, it needs strict rules and etiquette. New users often find people there short, abrasive and unhelpful, but this is almost always because the new user doesn’t yet know how to ask a question or doesn’t know the jargon needed to ask it. Experienced users often misread this as chancers trying to get other people to do their work for them.

During this year’s R lessons I hope to teach you how to ask questions and to give you enough of the basics of R to interact confidently with free resources like Stack Overflow and other internet forums.

Once you can frame your problems concisely, it’s very easy to find help and continue to learn on your own.

Course Outline

  1. Why R. This will be a preliminary circulation advertising the advantages of R over other packages (as well as some shortcomings).

Working with the R scripts

Although the script lessons come with code pre-written for you to execute, I would suggest retyping each command yourself to become familiar with the syntax and keyboard shortcuts.

I would also strongly suggest copying parts of the script into a new script and playing around with them. See how the output changes when you change some part of the code, and get a feel for the language.
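As a small illustration of this kind of experiment, the sketch below changes a single argument and compares the results. The vector returns and its values are made up for the example.

```r
# Made-up daily returns with one missing observation
returns <- c(0.012, -0.004, NA, 0.007)

# With the default arguments, a single missing value makes the mean missing
mean_default <- mean(returns)
mean_default

# Changing one argument tells mean() to drop missing values first
mean_no_na <- mean(returns, na.rm = TRUE)
mean_no_na
```

Typing ?mean into the console opens the help page that lists the other arguments you could try changing.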

If you think there are any holes in the above learning structure or if there are interesting extensions that you feel should be added into an advanced course please let me know (these will be more apparent when you begin using R for your research). Some interesting advanced topics that could be of particular interest in niche areas are:

. Mapping plot tools and packages.

. Time series packages.

. Machine learning tools and applications appropriate for large and longer datasets

. RMarkdown for integrating code and plots directly into your documents (I will write the weekly circulations in RMarkdown).

. Optimization packages.

SWIRL

Notes and FAQs

  1. Swirl requires you to make the mistakes it expects you to make. If the programme asks you to type -5:20 when the scenario clearly requires -(5:20), you first have to make the mistake by typing -5:20 and let it explain why this is wrong. Only then can you move on to learning why -(5:20) is correct.

  2. Swirl mostly checks your output, not your input. This is useful because there is often more than one way to do the same thing. For instance, <- and = both work for leftward assignment at the top level, and “print(data)” and “data” both print the object to the console.

  3. The output must match exactly. If Swirl asks you to add a variable called “Height” and you add one called “height”, it will treat this as a mistake, because R is case sensitive and the outputs differ by one lowercase “h”.

  4. When Swirl creates a script for you to complete an exercise in, it reads the result from the saved file and not from the R environment. Make sure you save the script before you submit, and do not move the file.
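The first three notes can be tried out directly in the console. The sketch below is only an illustration; the object names a, b, x, y and people are made up for the example and do not come from any Swirl lesson.

```r
# Note 1: unary minus is applied before the colon operator
a <- -5:20     # same as (-5):20, counts up from -5 to 20
b <- -(5:20)   # negates the sequence 5, 6, ..., 20
length(a)      # 26
length(b)      # 16

# Note 2: two spellings of top-level assignment give the same result
x <- 10
y = 10
identical(x, y)   # TRUE

# Note 3: names are case sensitive
people <- data.frame(name = c("A", "B"))
people$Height <- c(1.70, 1.82)
"Height" %in% names(people)   # TRUE
"height" %in% names(people)   # FALSE
```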

Evaluation tools

  1. The most common method is code submitted directly to the console.
  2. Swirl can create data and scripts. Changes to a script are evaluated after you type submit() into the console.
  3. Multiple-choice questions can be asked and answered through the console.

Downloading R and R Studio

First download R using the link below:

http://cran.mirror.ac.za/

Once R is fully installed, download the free desktop version of R Studio using the following link:

https://www.rstudio.com/products/rstudio/download/

You never need to open R itself. When you open R Studio, R starts automatically in the background.
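To confirm that R Studio has found your R installation, you can type the following into the R Studio console. This is only a quick check, and the exact text printed depends on the version you installed.

```r
# Print the version of R that is running in the background
R.version.string

# The same information as a short version number, e.g. for a help request
r_version <- paste(R.version$major, R.version$minor, sep = ".")
r_version
```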

Here is a walkthrough video for installing R and R Studio on a Windows machine.

Here is a walkthrough video for installing R and R Studio on an Apple machine.

Course Evaluation

This course will be evaluated in three ways.

First, you will be marked on an assignment that must be completed in R. Whether this will be group or individual work is still to be decided.

Second, you will be marked on your reputation score from interactions on Stack Overflow. I will moderate this to ensure that students are participating and voting fairly.

Third, you will be marked on reproducing your research report in RMarkdown. This requires that you send me the raw data from your data provider and a single Rmd file (an RMarkdown script) that can recreate all of your output and graphs. The purpose is to make your research reproducible and transparent. For instance, below each equation in your methodology you will have a code snippet that shows how you actually execute that formula.

An example of this is shown below:

“We estimate Beta using the CAPM formula:”

\[R_{i} = R_{f} + \beta(R_{m} - R_{f})\]

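# Simulate 252 trading days of daily returns as placeholder data.
# No seed is set, so the numbers in the output will differ on each run.
df = data.frame(share = rnorm(252, mean = 0.001, sd = 0.002),
                market = rnorm(252, mean = 0.001, sd = 0.001),
                risk_free = rnorm(252, mean = 0.00005, sd = 0.00001))

# Regress share returns on the risk-free and market returns; "- 1" drops the intercept.
model_1 = lm(share ~ risk_free + market - 1, data = df)
summary(model_1)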

Call:
lm(formula = share ~ risk_free + market - 1, data = df)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0059576 -0.0012573 -0.0000752  0.0011544  0.0058857 

Coefficients:
          Estimate Std. Error t value Pr(>|t|)    
risk_free  16.4864     3.2852   5.018  9.9e-07 ***
market      0.1621     0.1138   1.425    0.156    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.001917 on 250 degrees of freedom
Multiple R-squared:  0.2211,    Adjusted R-squared:  0.2149 
F-statistic: 35.48 on 2 and 250 DF,  p-value: 2.728e-14
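
The CAPM formula above can also be rearranged to R_i - R_f = beta * (R_m - R_f), which suggests estimating beta from excess returns. The following is a minimal sketch of that alternative, not the method that produced the output above. The seed, the column names share_excess and market_excess, and the object model_2 are assumptions introduced here for illustration. The data are simulated, so the estimate carries no economic meaning.

```r
set.seed(2020)  # hypothetical seed, only so the sketch is repeatable

# Simulated daily returns with the same shape as the example above
df <- data.frame(share = rnorm(252, mean = 0.001, sd = 0.002),
                 market = rnorm(252, mean = 0.001, sd = 0.001),
                 risk_free = rnorm(252, mean = 0.00005, sd = 0.00001))

# Excess returns: R_i - R_f and R_m - R_f
df$share_excess  <- df$share  - df$risk_free
df$market_excess <- df$market - df$risk_free

# No intercept, matching R_i - R_f = beta * (R_m - R_f)
model_2 <- lm(share_excess ~ market_excess - 1, data = df)

# The single slope coefficient is the beta estimate
beta_hat <- coef(model_2)[["market_excess"]]
beta_hat
```

With one regressor and no intercept, the estimate equals sum(x * y) / sum(x^2), where x is the excess market return and y the excess share return. That gives a quick way to check the result by hand.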