Filtering
In spreadsheets, filtering is done by clicking on 'Filters' in the Data menu. A filter selects the data that meets specified criteria. Usually a drop-down box is attached to a column, and the unique values found in that column appear in the drop-down box. Selecting a value makes the software display only the rows containing that value, hiding all others. In R, filtering is done by:
Slicing: Once data has been read into R, it can be sliced to get the bits and pieces that are needed. As discussed in the previous blog, the nearest representation of tabular data (with rows and columns) is a dataframe. Using RStudio, we read sample data into R in the previous blog. To get a sense of what the data looks like, the 'str' function is used; the name stands for structure. Running the command in the console produces:
From the results, we can see that the data has 43 observations of 7 variables, where each variable represents a column. The term 'factor' also appears in the results: in R, categorical data is called a factor, and its categories are called levels. Since columns are variables of the dataframe, the '$' sign is used to refer to them, which is why it also appears in the results above. To display the contents of the Region column, run the command below:
df$Region
The result appears in the console as shown below:
The output shows that the Region column has 3 categories of data, namely Central, East and West.
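For readers who do not have the sample file at hand, the steps so far can be tried on a small stand-in dataframe. This is only a sketch: the name 'sales' and all of its values are made up for illustration, and only the column names mirror the ones used in this post.

```r
# Hypothetical stand-in for the sample data; the values are invented.
sales <- data.frame(
  OrderDate = c("2015-01-06", "2015-01-23", "2015-02-09", "2015-02-26"),
  Region    = c("East", "Central", "Central", "West"),
  Rep       = c("Jones", "Kivell", "Jardine", "Gill"),
  Units     = c(95, 50, 36, 27),
  stringsAsFactors = TRUE   # store text columns as factors
)

str(sales)            # structure: observations, variables and their types
sales$Region          # contents of the Region column
levels(sales$Region)  # the distinct categories (levels) of the factor
```

Setting stringsAsFactors = TRUE is a choice made here so that the text columns show up as factors, as described above; newer versions of R keep them as plain character columns unless asked otherwise.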
To filter (called slicing in R), square brackets are used. The rows, columns and filter criteria are written inside the brackets. The syntax is as shown below:
df2[df2$Region=='Central',c('OrderDate','Rep','Units')]
That is, dataframe[rows (filter criteria), columns]
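The pattern can be broken into its parts with a minimal sketch. The dataframe 'sales', its values and the helper name 'keep' are assumptions made for illustration and are not part of the sample data.

```r
# Hypothetical data, invented for illustration.
sales <- data.frame(
  Region = c("East", "Central", "Central", "West"),
  Rep    = c("Jones", "Kivell", "Jardine", "Gill"),
  Units  = c(95, 50, 36, 27)
)

# The filter criteria is a logical vector: TRUE for each row to keep.
keep <- sales$Region == "Central"

central_rows      <- sales[keep, ]                   # before the comma: rows
rep_units         <- sales[, c("Rep", "Units")]      # after the comma: columns
central_rep_units <- sales[keep, c("Rep", "Units")]  # rows and columns together

central_rep_units
```

Leaving one side of the comma empty means "everything" on that side, so the first command returns all columns and the second returns all rows.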
To illustrate with our data, filtering the rows from the Central region means running the command below:
df2[df2$Region=='Central',]
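The result appears in the console as shown below:
The importance of ',' in the syntax cannot be overemphasized because it is what determines whether rows or columns are selected: whatever comes before the comma selects rows, and whatever comes after it selects columns. In the example above, nothing follows the comma, so the selected rows are returned with all their columns. If only certain columns are needed in the filter, the command will be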
df2[df2$Region=='Central',c('OrderDate','Rep','Units')]
Running it in the console produces:
The output above shows data from the 'Central' region, but with only the specified columns rather than the entire data. The next blog will discuss Pivot tables in R.
Enjoy filtering in R