Wednesday, 30 March 2016

Moving to R (Part 2)

Opening a spreadsheet file in R
 
Opening a file in R is called reading the file, as R is a command-line programming language. Like any other programming language, R relies on modules (called packages in R) to get things done, and opening (reading) a spreadsheet file will only work when the right package is attached. A package is simply a portion of a computer program that can be used independently or with other packages to meet an objective.
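As a rough sketch of what "attaching" looks like, the lines below attach the 'tools' package, which ships with R, and then check whether 'gdata' is available. The file name "budget.xlsx" is made up for illustration.

```r
# 'tools' ships with every R installation, so nothing needs installing here.
library(tools)

# file_ext() comes from the attached 'tools' package.
# "budget.xlsx" is a hypothetical file name; the file does not need to exist.
ext <- file_ext("budget.xlsx")
print(ext)

# require() also attaches a package, but it returns TRUE or FALSE instead of
# stopping with an error, so it can be used to check availability.
has_gdata <- suppressWarnings(
  require("gdata", quietly = TRUE, character.only = TRUE)
)
print(has_gdata)
```

If the second result is FALSE, the package can be installed with install.packages("gdata") before trying again.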

RStudio IDE



RStudio, shown above, has got to be the gold standard for what an R IDE should be. It has four windows, namely:
1) Console: This window holds the console, where the R interpreter runs your scripts and commands.
2) Workspace and history: This window has a directory-like container where the objects connected to a task are kept. It also has a 'History' tab where lines of code run earlier are stored.
3) Files, plots, packages and help: This window has a file browser for the files on your system, a tab that shows graphs and another for packages. Packages are add-on R programs for doing lots of things.
4) Scripts and data view: This window has tabs for scripts and for viewing data. Scripts written here can be executed in the console. The data view tab is used for examining data.

Reading a spreadsheet file
There are several packages for reading a spreadsheet file into R. This web page discusses them in detail. Using RStudio, we will show how to read spreadsheet data into R. Below are screenshots of the windows mentioned above while a file is being read into R.

Scripts and Data view



The Script view above shows a script that reads spreadsheet data using two methods. Method 1 attaches the 'gdata' package with the 'require' function and uses the read.xls function to load the spreadsheet file into R. Method 2 involves saving the spreadsheet file as text or CSV (Comma Separated Values) and reading it into R with the read.csv function. Observe that each data set read in has been assigned to a name using the '=' assignment operator. Also note the two tabs showing views of the data read in. Scripts written in the script window are executed in the Console view by pressing a combination of keys on the keyboard. The keys are:
Ctrl+Shift+Enter
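Since the screenshot is not reproduced here, the two methods might look roughly like the sketch below. The file contents and the names csv_path, people and "sales.xls" are invented for illustration and are not the files from the original script.

```r
# Method 2: read a CSV export with base R (no extra package needed).
# A small CSV is written to a temporary file so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,City",
             "Tom,25,London",
             "Ada,31,Lagos"), csv_path)

people = read.csv(csv_path, stringsAsFactors = FALSE)
str(people)

# Method 1: read an .xls file with the 'gdata' package.
# read.xls() needs Perl on the machine, and "sales.xls" is a hypothetical
# file, so the call is left commented out.
# require(gdata)
# sales = read.xls("sales.xls", sheet = 1)
```

The CSV route is often the easier one to start with, since read.csv is part of base R and has no outside dependencies.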

Console



 
Pressing the keys above makes the console run the code from the script. Any operation in the IDE must be translated into code in the console for it to work. For example, selecting a data view tab is translated to 'View(the data to be viewed)'. The other two windows do not change.
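To try this by hand, the data view can be opened from the console as well. The sketch below uses a small made-up data frame called scores. View() needs a graphical session, so it is guarded, and text-only alternatives follow.

```r
# A small, hypothetical data set to look at.
scores <- data.frame(Name = c("Tom", "Ada"), Age = c(25, 31))

# Clicking a data set in the IDE corresponds to calling View() on it.
# View() opens a viewer window, so only call it in an interactive session.
if (interactive()) {
  View(scores)
}

# Text-only ways to examine the same data in any console.
print(head(scores))
str(scores)
```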

Aggregation, charting, pivot tables, filtering and so on can all be done in R. We will explore these further in the next blog.

Enjoy trying R out.

 


 

Monday, 21 March 2016

Moving over to R

The problem
Data comes in all shapes and sizes. Sometimes it comes with so many columns that it is difficult to make sense of. A special kind of tool has to be used to work on such data, and that tool is R.

R
R is open source software used for analytics and statistical computing. It has become quite popular because of its capabilities. The software is able to do almost everything a spreadsheet does and more.

Structure
R is structured differently from spreadsheets. In a spreadsheet, cells stack up to make rows and columns, and data is arranged in tabular form. Not so with R: the closest thing to a tabular data representation is called a data frame. The closest thing to a column is a vector, and the closest thing to a row is a list. For example, a named column in a spreadsheet is represented as:





In R it is: Tax <- c(34,45,60,70,90,30,45). The 'c' stands for combine (often read as concatenate), which is also the name of a spreadsheet function for joining data. A named row in a spreadsheet can be:




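In R it becomes: Tom <- list("Tom", 25, "London"). The text values need quotes, and list() is used because c() would convert the number 25 to text. So a vector contains a single data type, while a list can hold different data types.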
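The difference between the two structures is easy to check in the console. This is a small sketch using the Tax and Tom values from above, where the name mixed is introduced only for illustration.

```r
# A vector: every element has the same type.
Tax <- c(34, 45, 60, 70, 90, 30, 45)
print(class(Tax))
print(sum(Tax))

# A list: each element keeps its own type.
Tom <- list("Tom", 25, "London")
print(sapply(Tom, class))

# Mixing types inside c() does not fail, but everything is coerced to text.
mixed <- c("Tom", 25, "London")
print(class(mixed))
```

This is why a spreadsheet row with a name, a number and a city is better held in a list than in a vector.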

Interface
Most spreadsheet software has a graphical user interface, which means that users interact with the program using a mouse, touchpad and so on. R is a programming language, so interaction is through the command line and users have to type in lines of code. R does have a GUI package, called Rattle, that has to be started from the command line before use. Most R users use an IDE or editor of some sort to write code for the R interpreter to run. A search on Google will list most of them, but RStudio comes highly recommended. Spreadsheets operate on data in cells that stack up to make rows and columns; R, being a programming language, operates on variables assigned to vectors and lists that stack up to make data frames.
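As a quick illustration of vectors stacking up into a data frame, here is a minimal sketch. The names and values are invented for the example.

```r
# Hypothetical columns, each a vector of the same length.
Name <- c("Tom", "Ada", "Kemi")
Age  <- c(25, 31, 28)
City <- c("London", "Lagos", "Abuja")

# data.frame() places the vectors side by side as columns.
people <- data.frame(Name, Age, City, stringsAsFactors = FALSE)
print(people)

# A column comes back out as a vector...
print(people$Age)

# ...and a single row holds mixed types, much like a list.
first_row <- as.list(people[1, ])
print(first_row)
```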

Assignment operators
People conversant with R must have come across the symbol '<-'. It is called the leftward assignment operator, and it assigns data to names that R can recognize and work with, just like the example shown earlier, repeated below:
Tom <- list("Tom", 25, "London")

There is also a rightward assignment operator, which looks like this '->' and is just the reverse of the one above, as shown below:

list("Tom", 25, "London") -> Tom

Finally, there is the assignment operator used in most programming languages, the '=' sign. This operator is my favourite because it takes just one keystroke, unlike the other two, which take three keystrokes each (the two symbols plus a Shift key), as shown below:

Tom = list("Tom", 25, "London")
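A quick way to confirm that the three operators give the same result at the top level of a script is to compare what each one creates. The names a, b and d below are placeholders.

```r
# The same values assigned three different ways.
a <- c(1, 2, 3)
c(1, 2, 3) -> b
d = c(1, 2, 3)

# All three names now refer to identical vectors.
print(identical(a, b))
print(identical(a, d))

# One difference worth knowing: inside a function call, '=' names an
# argument rather than creating a variable.
average <- mean(x = a)
print(average)
```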

Let's stop here for now. There will be another blog on R that will discuss how to do most of the things that spreadsheets do in R.

Why don't you give R a go?