Monday, 21 March 2016

Moving over to R

The problem
Data comes in all shapes and sizes. Sometimes it has so many columns that it becomes difficult to make sense of. A special kind of tool is needed to work on such data, and that tool is R.

R
R is open source software used for analytics and statistical computing. It has become quite popular because of its capabilities. It can do almost everything a spreadsheet does, and more.

Structure
R is structured differently from spreadsheets. In a spreadsheet, cells stack up to make rows and columns, and data is arranged in tabular form. Not so with R: the closest thing to a tabular data representation is called a data frame. The closest thing to a column is called a vector, and the closest thing to a row is a list. For example, take a spreadsheet column named Tax that holds the values 34, 45, 60, 70, 90, 30 and 45.

In R it is:

Tax <- c(34, 45, 60, 70, 90, 30, 45)

The 'c' stands for combine (it is often read as concatenate), and CONCATENATE is also a spreadsheet function, used there for joining text. Now take a spreadsheet row that holds the values Tom, 25 and London.

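In R it becomes:

Tom <- list("Tom", 25, "London")

Text values need quotes, otherwise R looks for objects named Tom and London. The row is built with list() rather than c() because a vector holds values of a single data type, while a list can hold values of different types.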
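To make the difference between the three structures concrete, here is a minimal sketch. The Tax and Tom values come from the examples above; the column names Name, Age and City and the second person (Ann, 31, Leeds) are made up purely for illustration.

```r
# A spreadsheet-style column: every value has the same type (numeric).
Tax <- c(34, 45, 60, 70, 90, 30, 45)

# A spreadsheet-style row: values of different types, so a list is used.
Tom <- list("Tom", 25, "London")

# c() with mixed values does not fail, but it converts everything to text.
mixed <- c("Tom", 25, "London")

print(class(Tax))    # "numeric"
print(class(Tom))    # "list"
print(class(mixed))  # "character"

# A data frame is a set of equal-length columns (vectors) side by side.
# The column names and the second row are hypothetical illustration data.
people <- data.frame(
  Name = c("Tom", "Ann"),
  Age  = c(25, 31),
  City = c("London", "Leeds"),
  stringsAsFactors = FALSE
)
print(people)
```

Each column of the data frame is a vector with a single type, while each row mixes types, which is why a single row is closer to a list than to a vector.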

Interface
Most spreadsheet software has a graphical user interface (GUI), which means users interact with the program using a mouse, touchpad and so on. R is a programming language, so interaction is through the command line: users type in lines of code. A GUI is available through a package called Rattle, which has to be installed and loaded from the command line before use. Most R users write their code in an IDE or editor of some sort and have the R interpreter run it. A search on Google will list most of them, but RStudio comes highly recommended. Spreadsheets operate on data in cells that stack up to make rows and columns; R, being a programming language, operates on variables that hold vectors and lists, which stack up to make data frames.
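As a rough idea of what typing lines of code looks like in practice, the sketch below works on the Tax column from the Structure section. The spreadsheet functions named in the comments are only the approximate equivalents, and the variable names total, average and highest are chosen here for illustration.

```r
# The Tax column from the Structure section, stored as a numeric vector.
Tax <- c(34, 45, 60, 70, 90, 30, 45)

# Instead of clicking on cells, each operation is a typed command.
total   <- sum(Tax)    # roughly SUM over the column in a spreadsheet
average <- mean(Tax)   # roughly AVERAGE
highest <- max(Tax)    # roughly MAX

print(total)    # 374
print(average)
print(highest)  # 90
```

The same lines can be typed one at a time at the R prompt or saved in a script and run from an editor such as RStudio.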

Assignment operators
Anyone conversant with R will have come across the symbol '<-'. It is called the leftward assignment operator: it assigns data to a name that R can recognize and work with, as in this version of the earlier row example:
Tom <- list("Tom", 25, "London")

There is also a rightward assignment operator, '->', which is simply the reverse of the one above:

list("Tom", 25, "London") -> Tom

Finally, there is the assignment operator used in most programming languages, the '=' sign. This operator is my favourite because it takes just one keystroke, unlike the other two, which take three (the two symbols plus the Shift key):

Tom = list("Tom", 25, "London")
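A quick way to check that the three operators do the same job is to assign the same row with each one and compare the results. This is only a sketch; the names row_left, row_right, row_equal and same_result are invented for the comparison.

```r
# The same row assigned with each of the three operators.
row_left <- list("Tom", 25, "London")     # leftward assignment
list("Tom", 25, "London") -> row_right    # rightward assignment
row_equal = list("Tom", 25, "London")     # equals-sign assignment

# identical() checks that two objects are exactly the same.
same_result <- identical(row_left, row_right) &&
  identical(row_left, row_equal)

print(same_result)  # TRUE
```

One difference worth knowing: inside a function call, '=' names an argument rather than creating a variable, which is one reason many R style guides prefer '<-' for assignment.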

Let's stop here for now. There will be another post on R that discusses how to do most of the things spreadsheets do in R.

Why don't you give R a go?
 

