Pages:
2 pages/≈550 words
Sources:
No Sources
Style:
APA
Subject:
IT & Computer Science
Type:
Coursework
Language:
English (U.S.)
Document:
MS Word
Topic:

CRISP-DM Data Preparation for the GE Employee Attrition

Coursework Instructions:

Use Case: GE Employee Attrition
Plan Definition: This should include the CRISP-DM Data Preparation phase:
Create a data analytic architecture pattern; include the details for full implementation. Details will need to address data quality, integrity, and protection specific to the organization, industry, and problem you are addressing.
You will create visualizations representing your solutions for various stakeholders that you need to identify. Develop a project plan detailing the involved stakeholders, the timeline, and strategies for professional and effective collaboration to be used to ensure success.

Coursework Sample Content Preview:

Employee Attrition
Author
Affiliation
Course
Instructor
Due Date
Employee Attrition
Data Understanding
Context
GE is keen to retain its key employees because it has a high churn rate. Losing an employee is estimated to cost GE 80 percent of that employee's annual income. GE invests heavily in its employees to stay competitive in the market, spending long hours and significant financial resources on training and upskilling. GE therefore seeks to build a high-accuracy model that can predict which employees are most likely to churn. The model should evaluate an employee's profile and give a real-time prediction of the likelihood that the employee will churn. A thorough and efficient model of this kind would let management intervene before an employee is lost. As part of the project lifecycle, we will undertake the understanding and preparation of the data according to the CRISP-DM methodology.
Data types
GE’s human resource data had a total of 1,270 rows and 35 columns. Of the 35 columns, 9 were categorical and 26 were numeric. The categorical columns were 'Attrition', 'BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole', 'MaritalStatus', 'Over18', and 'OverTime'. The numeric columns were 'Age', 'DailyRate', 'DistanceFromHome', 'Education', 'EmployeeCount', 'EmployeeNumber', 'EnvironmentSatisfaction', 'HourlyRate', 'JobInvolvement', 'JobLevel', 'JobSatisfaction', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StandardHours', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion', and 'YearsWithCurrManager'. 'Attrition' is the dependent variable and takes only two categorical values, “yes” and “no”, which distinguish employees who churned from those who did not. The remaining columns are the independent variables. There were no missing values in the dataset.
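The checks described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the three sample rows and their values are hypothetical, only a handful of the 35 columns are shown, and the helper names `split_column_types` and `count_missing` are our own. The standard library is used so the sketch has no dependencies.

```python
# Hypothetical sample rows for illustration only; the real dataset
# has 1,270 rows and 35 columns.
rows = [
    {"Age": 41, "Attrition": "Yes", "Department": "Sales", "MonthlyIncome": 5993},
    {"Age": 49, "Attrition": "No", "Department": "Research & Development", "MonthlyIncome": 5130},
    {"Age": 37, "Attrition": "Yes", "Department": "Research & Development", "MonthlyIncome": 2090},
]


def split_column_types(rows):
    """Return (categorical, numeric) column names based on the value types."""
    categorical, numeric = [], []
    for col in rows[0]:
        values = [r[col] for r in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        (numeric if is_numeric else categorical).append(col)
    return categorical, numeric


def count_missing(rows):
    """Count cells that are None or an empty string, per column."""
    return {col: sum(1 for r in rows if r.get(col) in (None, "")) for col in rows[0]}


categorical, numeric = split_column_types(rows)
target_values = sorted({r["Attrition"] for r in rows})  # distinct target labels
missing = count_missing(rows)

print("categorical:", categorical)
print("numeric:", numeric)
print("target values:", target_values)
print("missing per column:", missing)
```

On the full dataset, the same checks would be expected to report 9 categorical columns, 26 numeric columns, two target values, and no missing cells, in line with the counts given above.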
Descriptive Statistics
We computed descriptive statistics for the numeric variables; a snippet of the output is attached below. The descriptive statistics reveal no values that were out of range, so we found no reason to conclude that the data contain outliers. However, some variables do not appear to add value to our dataset (Smart Vision Europe, n.d.). These variables are 'EmployeeCount', 'EmployeeNumber', and 'StandardHours'. 'EmployeeCount' had a mean, minimum, and maximum of 1, which means it is constant (its standard deviation is 0) and carries no information. 'EmployeeNumber' serves as a unique identifier of employees in the dataset, so apart from that, it has no predictive value. 'StandardHours' was likewise constant, with a value of 80 for every employee. Therefore, these variables will be omitted from the study.
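The screening step above can be sketched as follows. This is a hedged illustration under stated assumptions: the sample rows are hypothetical, the helpers `describe` and `uninformative_columns` are our own names, and treating 'EmployeeNumber' as an identifier is passed in explicitly because it cannot be inferred from the statistics alone.

```python
import statistics

# Hypothetical sample rows for illustration only.
rows = [
    {"Age": 41, "EmployeeCount": 1, "EmployeeNumber": 1, "StandardHours": 80},
    {"Age": 49, "EmployeeCount": 1, "EmployeeNumber": 2, "StandardHours": 80},
    {"Age": 37, "EmployeeCount": 1, "EmployeeNumber": 4, "StandardHours": 80},
]


def describe(rows, col):
    """Mean, sample standard deviation, minimum, and maximum of one column."""
    values = [r[col] for r in rows]
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def uninformative_columns(rows, id_columns=()):
    """Columns that are constant or are declared to be pure identifiers."""
    drop = []
    for col in rows[0]:
        values = [r[col] for r in rows]
        if len(set(values)) == 1 or col in id_columns:
            drop.append(col)
    return drop


to_drop = uninformative_columns(rows, id_columns=("EmployeeNumber",))

# Keep only the columns that survive the screening.
prepared = [{k: v for k, v in r.items() if k not in to_drop} for r in rows]

print(describe(rows, "EmployeeCount"))
print("dropped:", to_drop)
print("remaining columns:", list(prepared[0]))
```

A column whose minimum equals its maximum has no spread, which is why the constant columns are flagged without any threshold having to be chosen.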
Chart 1: Descriptive statistics
Correlation analysis
We did correlation analysis on the numerica...