Pages:
2 pages/≈550 words
Sources:
Check Instructions
Style:
APA
Subject:
IT & Computer Science
Type:
Essay
Language:
English (U.S.)
Document:
MS Word
Topic:

Data Understanding and Data Preparation

Essay Instructions:

Overview: In Milestone One, you examined CRISP-DM and each of its phases. You performed Phase One, Business Understanding, by:
1) Identifying the business problem
2) Stating the research question
3) Discussing how the solution would help the business
4) Describing the analytic plan, which included the remaining phases with the steps in each phase
In this milestone, you will perform Phase Two, Data Understanding, and Phase Three, Data Preparation.
Prompt: In Milestone Two, you will begin carrying out the analytic plan. You will write the Data Understanding section and perform the Data Preparation.
If you have any questions after reading through the feedback on this milestone, reach out to your instructor. Remember that your instructor is a resource you should utilize throughout the course.
While you must reflect on your prior coursework, your submission must consist only of DAT 690 coursework to avoid self-plagiarism. Make sure to include the following critical elements in your paper

Essay Sample Content Preview:

Milestone Two: Data Understanding and Data Preparation
Student’s Name
University
Course
Professor’s Name
Date
Milestone Two: Data Understanding and Data Preparation
The data preparation phase is essential in data mining. It readies the data that will be used for modelling in a manner that ensures meaningful information is obtained after analysis. Data preparation is important for several reasons. First, real-world data usually contains a lot of unnecessary information, which introduces noise. For example, in an interview by Ferguson (2014), Philip Kim noted that when collecting data about a business problem at General Electric (GE), they usually obtain any available information. Such a process leaves the data incomplete and inconsistent, which creates problems during data modelling. Data preparation helps eliminate such issues through steps that depend on the data set being mined. In this regard, some techniques are more suitable than others depending on the situation. Thus, data preparation is an important phase of the Cross-Industry Standard Process for Data Mining (CRISP-DM).
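The incompleteness and inconsistency described above can be illustrated with a minimal Python sketch. It is not taken from the essay or from GE's process: the records, the column names (id, state, amount), the set of missing-value markers, and the helper names clean and count_missing are all hypothetical, chosen only to show one way such issues might be detected and reduced.

```python
# Hypothetical raw records: the field names and values are invented for illustration.
records = [
    {"id": "1", "state": "NY", "amount": "120.50"},
    {"id": "2", "state": " ny", "amount": ""},       # inconsistent label, empty value
    {"id": "3", "state": "CA", "amount": "n/a"},     # placeholder for a missing value
    {"id": "3", "state": "CA", "amount": "n/a"},     # exact duplicate record
]

# Assumed markers that should be treated as "missing" in this example.
MISSING = {"", "n/a", "na", "null"}


def clean(raw_records):
    """Trim whitespace, map missing markers to None, unify state labels, drop duplicates."""
    seen = set()
    cleaned = []
    for rec in raw_records:
        row = {}
        for key, value in rec.items():
            value = value.strip()
            row[key] = None if value.lower() in MISSING else value
        if row["state"] is not None:
            row["state"] = row["state"].upper()
        fingerprint = tuple(sorted(row.items()))  # keys are unique, so only keys are compared
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


def count_missing(rows):
    """Count None values per column to quantify incompleteness."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (1 if value is None else 0)
    return counts


cleaned = clean(records)
missing_counts = count_missing(cleaned)
```

Which of these steps is appropriate depends on the data set being mined; for instance, dropping duplicates is only safe when repeated rows are known to be errors rather than genuine repeated events.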
The first step in data preparation involves describing the data. It involves recognizing the various attributes of the data and their relevance to the information that will be obtained after data mining. For example, when a company collects data to learn customer behavior, demographic data contains information about personal attributes that are important to business marketing and advertising (Exenberger & Bucko, 2020). In addition, data description involves identifying the amount of data available for a given attribute, including the respective columns and column names. Since data mining involves working with large amounts of data, descriptions provide a quick way of identifying the raw data available for a specific attribute during mining. The step also involves summarizing the data using descriptive statistics and conducting statistical analyses such as correlation analysis and t-tests. The practice helps analysts identify orthogonal datasets that can be used for modelling. Therefore, data set descriptions help in recognizing and classifying the available data into meaningful information for modelling.
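As a rough illustration of the description step, the following Python sketch lists the columns of a small data set, counts the values available for each attribute, computes descriptive statistics, and measures the correlation between two attributes. The data and the column names (age, income, visits) are invented for this example, and the helpers load_rows, describe, and pearson are hypothetical rather than part of any particular tool.

```python
import csv
import io
import math
import statistics

# Hypothetical customer records; one income value is deliberately missing.
RAW = """age,income,visits
25,30000,2
35,50000,4
45,70000,6
55,,8
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def describe(rows):
    """Per column: number of non-missing values, mean, and sample standard deviation."""
    summary = {}
    for name in rows[0].keys():
        values = [float(r[name]) for r in rows if r[name] != ""]
        summary[name] = {
            "count": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


def pearson(rows, a, b):
    """Pearson correlation of columns a and b over rows where both are present.

    Assumes neither column is constant; a constant column would divide by zero.
    """
    pairs = [(float(r[a]), float(r[b])) for r in rows if r[a] != "" and r[b] != ""]
    xs, ys = zip(*pairs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rows = load_rows(RAW)
columns = list(rows[0].keys())      # the column names
summary = describe(rows)            # counts and descriptive statistics
r_age_visits = pearson(rows, "age", "visits")
```

A correlation close to zero between two attributes suggests they carry largely independent information, whereas a value near 1 or -1 suggests one of them may be redundant for modelling.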
The second step in the data preparation process is the selection. Since real-world data u...