Sunday, August 24, 2014

Getting started in R

Many of my programming-related posts will discuss the R statistical programming language, so I thought I would put out a short-and-sweet guide to getting started in R from a mathematical finance perspective. This guide will get you off the bench and into the game! In other words, I'll help you get everything set up properly so you can begin some analysis.

Installation

The first thing you need to do is install R from here. Those download servers will always have the latest version available for your operating system. Speaking of which, I primarily run Windows unless I'm doing HPC on the supercomputer, so this guide will be heavily Windows-biased.

That will install the R language, the core packages and libraries, and a simple GUI. I've found that RStudio is a much better interface, so the next step is to install the RStudio IDE from here.

Packages

The beauty of R is that there are thousands of packages that can be easily installed. These packages are created by other R users to add functionality that the base install of R doesn't have or to make it simpler to perform certain tasks. A number of core packages are installed along with the R language itself, and a huge number of additional packages have been created that will be useful for this type of analysis.

To install a new package, open up RStudio and look towards the bottom right of the window. You'll see a window with tabs along the top (as shown in the screenshot below). As the red arrow indicates, click on the tab titled Packages. Then, as the green arrow shows, click the button labeled Install.


A window will pop up that lets you install packages from the online repositories or from a downloaded zip file (see below).


Leave everything at the default setting unless you have some reason to change it. List packages separated by commas as it says. The box even autocompletes sometimes, which is a nice feature for when you don't know the exact name of the package.

Here I'll break down the packages that I'd recommend if you were going to start doing some mathematical and/or statistical analysis on financial data.

Finance

quantmod is a great package that can download data straight from Yahoo Finance, Google Finance, and a few other sources. [See my post on freely available data sources.] quantmod also contains a number of cool functions for the analysis of time series.

RQuantLib is the R interface to the QuantLib project. QuantLib is an open-source C++ library that aims to provide a stable set of useful quantitative finance functions, and RQuantLib makes part of it callable from R.

tseries is a package with lots of functions for creating and dealing with time series.

TTR (Technical Trading Rules) is a package of functions for computing technical indicators, such as moving averages, which are the building blocks for trading rules.

forecast adds the ability to forecast time series using a variety of methods.

To install these packages, copy and paste this line in the Packages text box in the window shown in the screenshot above: quantmod, RQuantLib, tseries, TTR, forecast
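The same installation can also be done from the R console instead of the Packages tab. The snippet below is only a minimal sketch: `install_if_missing` is a hypothetical helper name (it is not part of R or RStudio), and it assumes the default CRAN repository is reachable.

```r
# Hypothetical helper: install only the packages that are not installed yet.
# `installer` defaults to the standard install.packages(); it is a parameter
# only so the helper can be tried out without touching the network.
install_if_missing <- function(pkgs, installer = utils::install.packages) {
  installed <- rownames(utils::installed.packages())
  missing_pkgs <- setdiff(pkgs, installed)
  if (length(missing_pkgs) > 0) {
    installer(missing_pkgs)
  }
  invisible(missing_pkgs)  # names of the packages that had to be installed
}

finance_pkgs <- c("quantmod", "RQuantLib", "tseries", "TTR", "forecast")

# Uncomment to download from CRAN (needs an internet connection):
# install_if_missing(finance_pkgs)
```

Because packages that are already present are skipped, a call like this is safe to keep at the top of an analysis script.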

There are many other packages available under the finance heading at http://www.rdocumentation.org/domains/Finance.
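As a first taste of these packages, here is a rough sketch that combines quantmod and TTR. `summarise_prices` is a hypothetical helper written for this post, and the AAPL ticker in the commented-out call is just an example; the download needs an internet connection and a working Yahoo Finance source.

```r
library(quantmod)  # also attaches xts, zoo and TTR

# Hypothetical helper: closing price, daily return and an n-day simple
# moving average side by side in one time series table.
summarise_prices <- function(prices, n = 20) {
  closes <- Cl(prices)                      # pick out the "Close" column
  out <- merge(closes,
               dailyReturn(closes),         # simple daily returns (quantmod)
               SMA(closes, n = n))          # simple moving average (TTR)
  colnames(out) <- c("close", "return", "sma")
  out
}

# Uncomment when online. auto.assign = FALSE returns the data instead of
# creating a variable called AAPL in the workspace.
# aapl <- getSymbols("AAPL", src = "yahoo", auto.assign = FALSE)
# tail(summarise_prices(aapl))
```

The first n - 1 rows of the moving average column are NA, since there are not yet enough observations to average.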

Bayesian Methods

LearnBayes is a package I've found to contain some helpful Bayesian functions.

MCMCpack provides Markov chain Monte Carlo (MCMC) samplers for a range of Bayesian models. It will be very useful to empirical Bayesians.

bnlearn implements Bayesian networks.

To install these packages, copy and paste this line in the Packages text box in the window shown in the screenshot above: LearnBayes, MCMCpack, bnlearn
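To give a flavour of what these packages do, here is a small sketch of a Bayesian linear regression using MCMCpack's `MCMCregress`. The data are simulated (the intercept of 1 and slope of 2 are made-up values for illustration), and the priors are left at the function's defaults.

```r
library(MCMCpack)

# Simulate a small data set with a known intercept (1) and slope (2)
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
sim <- data.frame(x = x, y = y)

# Draw 5000 posterior samples after discarding 1000 burn-in iterations
posterior <- MCMCregress(y ~ x, data = sim,
                         burnin = 1000, mcmc = 5000, seed = 42)

# Posterior means, standard deviations and quantiles for each parameter
summary(posterior)
```

The posterior means for the intercept and slope should land close to the values used in the simulation.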

LaplacesDemon is a package full of Bayesian methods. It has been really useful in some of my recent work. This package is not available on CRAN (the online repository), so you will have to download the file from the website and install it separately. To do that, just change the dropdown from Repository to Package Archive File and navigate to where you downloaded the package.
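The console equivalent of the Package Archive File option looks roughly like the sketch below. `install_from_archive` is a hypothetical helper name, and the path in the commented-out call is a placeholder for wherever you saved the download.

```r
# Hypothetical helper: install a package from a downloaded archive file.
install_from_archive <- function(archive_path) {
  if (!file.exists(archive_path)) {
    stop("Archive not found: ", archive_path)
  }
  # .zip archives are Windows binaries; .tar.gz archives are source packages
  pkg_type <- if (grepl("\\.zip$", archive_path)) "win.binary" else "source"
  # repos = NULL tells R to treat the first argument as a local file
  utils::install.packages(archive_path, repos = NULL, type = pkg_type)
}

# Example call (placeholder path, adjust to your own download location):
# install_from_archive("C:/Users/me/Downloads/LaplacesDemon.zip")
```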

As always, there are many more packages available on http://www.rdocumentation.org/domains/Bayesian, especially for users familiar with BUGS.

High-Performance Computing

foreach is a package that adds a foreach looping construct, like the one found in many other programming languages, to R. Unlike most of those, it can also run its iterations in parallel.

Rcpp integrates the R language with C++, so that C++ programs can call R functions and R programs can call C++ functions.

doParallel is a parallel backend for foreach: it registers the cluster of worker processes that foreach loops run on.

plyr provides split-apply-combine functions that replace many standard R looping idioms, and it brings them into HPC territory by letting most of them run in parallel.

To install these packages, copy and paste this line in the Packages text box in the window shown in the screenshot above: foreach, Rcpp, doParallel, plyr
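Here is a minimal sketch of how foreach and doParallel fit together. The cluster size of 2 and the toy computation are arbitrary choices for illustration.

```r
library(foreach)
library(doParallel)

# Start a small cluster of 2 worker processes and register it with foreach
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# %dopar% sends each iteration to a worker;
# .combine = c collects the results into a single vector
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

parallel::stopCluster(cl)  # shut the workers down when finished
print(squares)
```

After the loop, `squares` holds 1, 4, 9 and 16. Swapping `%dopar%` for `%do%` runs the same loop sequentially, which is handy for debugging.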

There are many other packages available under the high-performance computing heading at http://www.rdocumentation.org/domains/HighPerformanceComputing.

General


xtable creates tables that are LaTeX-ready. I've yet to use this one, but I'm pretty excited it exists.

shiny is the package that allows you to create awesome R web apps. See this for examples. This is another package I haven't yet had the opportunity to try out, but I can't wait to get to work with it.

To install these packages, copy and paste this line in the Packages text box in the window shown in the screenshot above: pso, xtable, shiny
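Going by the package documentation, basic xtable usage looks roughly like the sketch below. The data frame is made up purely for illustration.

```r
library(xtable)

# A tiny made-up table of summary statistics
asset_stats <- data.frame(
  asset = c("A", "B"),
  mean  = c(0.012, 0.008),
  stdev = c(0.051, 0.034)
)

# xtable() builds the table object; print() writes out the LaTeX code.
# capture.output() keeps the lines so they can be saved or inspected.
latex_lines <- capture.output(
  print(xtable(asset_stats, digits = 3), include.rownames = FALSE)
)
cat(latex_lines, sep = "\n")
```

The printed text can be pasted straight into a LaTeX document.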

Tutorials

Now that you've got it all installed and ready, the only *minor* thing left is learning how to use it. There are a number of books available, as well as many online tutorials. The built-in help is really quite extensive, and that is usually what I consulted first. As with many other programming languages, reading code is the best way to learn it, so that's what I did. Go online and find examples of what you are trying to do - there are bound to be some out there. Searching "r [package name] tutorial" should bring up many results for any of the packages I've given here because they are all widely used. Another great place to start is this StackExchange question on the very same topic.
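The built-in help mentioned above can be reached straight from the console. A few standard commands to get going (the functions looked up here are arbitrary examples):

```r
help("lm")        # open the help page for a function; same as typing ?lm
example("mean")   # run the examples from the bottom of a help page

# List every attached function whose name starts with "read."
read_fns <- apropos("^read\\.")
print(read_fns)
```

Most packages also ship longer walkthroughs called vignettes; typing `vignette()` lists the ones available for the packages you have installed.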

I'll make a post from time to time detailing how to perform some type of analysis in R. Also, I'll try to include somewhat detailed instructions if I'm discussing some analysis I did in R, even if it isn't a "How To" type of post.

That's it for this guide! You should be all set to begin learning R and performing some mega-awesome data analysis.

