Filtering Columns

This tutorial demonstrates how to filter columns in our datasets

Filtering a Column

While filtering rows can be a somewhat complicated operation, filtering columns is straightforward. In a DataFrame, we can remove any column we want to get rid of by calling RemoveColumnAt() and telling it which one we want to remove. We can also copy a data frame first before doing so, to preserve the original data, but for now, let us try to transform our data by removing the unnecessary columns. Here is a brief description of what the function we are using does.

Filtering Helper Functions
Class / ActionDescriptionUsage
DataFrame:RemoveColumnAt(integer columnNumber)This function will remove unwanted columns from our dataset starting from 0 to N, where N is the number of total columns contained in the dataset. It takes in integers, signifying the corresponding column number to remove.//This action removes various empty entries from our CSV file frame:RemoveColumnAt(3)

To accomplish this, we will be going back to our AskAManager.csv file and begin to load it back into our DataFrame, 'frame' using the Load(text fileLocation) action. After properly loading our file, we will be able to use the RemoveColumnAt(integer num) action which will remove unwanted columns with empty entries. In this case, as when looking at the dataset, there are missing chunks of information on these columns: additional context on job title, other monetary comp, currency - other, and additional context on income. We would not want to have undefined values and mostly, we have decided that these columns are not useful pieces of information so it would be okay to get rid of them.

We can click this link to download the file for the CSV.
//Create a DataFrame, which is essentially a table that understands 
//more information about the data that is being loaded.
DataFrame frame

//This loads data relative to the project, so put the AskAManager.csv file in the Data/Miscellaneous folder
frame:Load("../Data/Miscellaneous/AskAManager.csv")

We want to figure out the corresponding column number to these columns. We will be calling RemoveColumnAt() action using our 'frame' object. Reminder that with a CSV file, columns start off at 0, therefore the corresponding column numbers for these headings we are filtering are as follows:

  • Additional context on job title: 3
  • Other monetary comp: 5
  • Currency - other: 7
  • Additional context on income: 8

//This action removes various empty entries from our CSV  file
frame:RemoveColumnAt(3) // additional context on job title
frame:RemoveColumnAt(5) // other monetary comp
frame:RemoveColumnAt(7) // currency - other
frame:RemoveColumnAt(8) // additional context on income

Finally, we can output our dataset to confirm the filter using frame:ToText(). The full code is shown below.

//We need the DataFrame class to load in files for Data Science operations.
use Libraries.Compute.Statistics.DataFrame
use Libraries.Compute.Statistics.Transforms.RemoveUndefinedRowsTransform

//Create a DataFrame, which is essentially a table that understands 
//more information about the data that is being loaded.
DataFrame frame

//This loads data relative to the project, so put the AskAManager.csv file in the Data/Miscellaneous folder
frame:Load("../Data/Miscellaneous/AskAManager.csv")

//This action removes various empty entries from our CSV  file
frame:RemoveColumnAt(3) // additional context on job title
frame:RemoveColumnAt(5) // other monetary comp
frame:RemoveColumnAt(7) // currency - other
frame:RemoveColumnAt(8) // additional context on income

//We can save the frame or output it to the console, like we are doing here.
output frame:ToText()

Try it Yourself!

Press the blue run button to execute the code in the code editor. Press the red stop button to end the program. Your program will work when the console outputs "Build Successful!"

To view the program we have made, we can download the program file

Next Tutorial

In the next tutorial, we will discuss splitting columns, which describes splitting columns.