Data Preprocessing in Weka

At the most basic level, the most important part of doing data science is preparing the data we will be working on. All of machine learning and AI stands on data, so our very first duty is to prepare it. The process of preparing the data is called data preprocessing.
In daily life we deal with data that is available or created in spreadsheet (.xls) format, but Weka, like much other machine learning software, works with plain-text formats such as .csv or .arff. CSV stands for Comma-Separated Values.

In the CSV file format, tabular data is stored as plain text: each line of the file is a record consisting of one or more fields separated by commas.
You can find more on CSV and ARFF.
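
To make the "one record per line, fields separated by commas" idea concrete, here is a minimal sketch using Python's standard csv module. The sample rows and the helper name parse_csv are assumptions made for illustration; they are not taken from the actual weather.xls file.

```python
import csv
import io

# Hypothetical sample data for illustration only; the real contents of
# weather.xls may differ.
CSV_TEXT = """outlook,temperature,humidity,windy,play
sunny,85,85,FALSE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
"""


def parse_csv(text):
    """Split CSV text into a header row and a list of data records."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]


header, records = parse_csv(CSV_TEXT)
print(header)          # the field names from the first line
for record in records:
    print(record)      # each later line becomes one list of fields
```

Each line of the text becomes one list, and each comma-separated value becomes one element of that list.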

I have been doing a lot of file conversion for my project and lab work and, frankly speaking, these conversions are easy and interesting (though you may find them boring at first). The fact is that you really need to get good at file conversion, so I will demonstrate the same example we were supposed to do in our labs.

File Conversion

We assume that all of our data is stored in a Microsoft Excel spreadsheet "weather.xls".

xls data

We need the data file to be in ARFF format, so the conversion goes as follows:
.xls format ➨ .csv format ➨ .arff format
In addition to the standard ARFF data file format, Weka can also read ".csv" files. We will discuss working directly with .csv later; first we focus on ARFF.

To save your data in comma-separated format, select the 'Save As...' menu item from Excel's 'File' pull-down menu. In the ensuing dialog box, select 'CSV (Comma delimited)' from the file type pop-up menu and click the Save button. Now open the file in any text editor such as Notepad, and it will look like this.

csv data

Notice that the rows of the original spreadsheet have been converted into lines of text in which the elements are separated from each other by commas.
In these lines of text you need to change the first line, which holds the attribute names, into the header structure that makes up the beginning of an ARFF file. Add a @relation tag with the data set name, an @attribute tag for each attribute with its name and type, and a @data tag, as shown below.

ARFF file format

Now save the file with the extension .arff to indicate that it is in ARFF format.
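
If you would rather not edit the header by hand every time, the same steps can be scripted. The following is only a rough sketch, not the method used in the lab: the function names csv_to_arff and is_number and the sample rows are assumptions for illustration. It treats a column as numeric when every value parses as a number, and otherwise as nominal with the distinct values listed in braces. It does not handle missing values or attribute names containing spaces.

```python
import csv
import io


def is_number(value):
    """Return True if the text can be parsed as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Build ARFF text from CSV text whose first line holds attribute names."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, records = rows[0], rows[1:]

    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        column = [record[index] for record in records]
        if all(is_number(value) for value in column):
            lines.append("@attribute %s numeric" % name)
        else:
            # Nominal attribute: distinct values in order of first appearance.
            values = list(dict.fromkeys(column))
            lines.append("@attribute %s {%s}" % (name, ",".join(values)))

    lines.append("")
    lines.append("@data")
    lines.extend(",".join(record) for record in records)
    return "\n".join(lines) + "\n"


# Hypothetical sample rows; the real weather.xls contents may differ.
SAMPLE = (
    "outlook,temperature,humidity,windy,play\n"
    "sunny,85,85,FALSE,no\n"
    "overcast,83,86,FALSE,yes\n"
    "rainy,70,96,TRUE,yes\n"
)

arff = csv_to_arff(SAMPLE, "weather")
print(arff)
```

Writing the returned text to a file named with the .arff extension gives the same kind of file as the manual edit described above. It is still worth opening the result once to check that the inferred attribute types are what you intended.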
So bye for now, and practise the conversion on different data sets. You can download data sets from this site.

We will discuss loading and understanding the data in the next blog post. Till then, enjoy!!

Thank you!!🙂

