Essay Available:

Pages:

4 pages/≈1100 words

Sources:

No Sources

Style:

APA

Subject:

Mathematics & Economics

Type:

Statistics Project

Language:

English (U.S.)

Document:

MS Word

Date:

2020-08-29

Topic:

Project 2 Mathematics & Economics Statistics Project

Statistics Project Instructions:

Project Two: Hypothesis Testing

This notebook contains the step-by-step directions for Project Two. It is very important to run through the steps in order. Some steps depend on the outputs of earlier steps. Once you have completed the steps in this notebook, be sure to write your summary report.

You are a data analyst for a basketball team and have access to a large set of historical data that you can use to analyze performance patterns. The coach of the team and your management have requested that you perform several hypothesis tests to statistically validate claims about your team's performance. This analysis will provide evidence for these claims and help make key decisions to improve the performance of the team. You will use the Python programming language to perform the statistical analyses and then prepare a report of your findings for the team’s management. Since the managers are not data analysts, you will need to interpret your findings and describe their practical implications.

There are four important variables in the data set that you will study in Project Two.

Variable	What does it represent?
pts	Points scored by the team in a game
elo_n	A measure of relative skill level of the team in the league
year_id	Year when the team played the games
fran_id	Name of the NBA team

The ELO rating, represented by the variable elo_n, is used as a measure of the relative skill of a team. This measure is inferred based on the final score of a game, the game location, and the outcome of the game relative to the probability of that outcome. The higher the number, the higher the relative skill of a team.
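
As a rough illustration of what "the outcome of the game relative to the probability of that outcome" can mean, the sketch below uses the textbook Elo formulas. The function names and the K-factor of 20 are assumptions made for illustration. The exact procedure behind elo_n, which also accounts for final score and game location, is not described in these instructions.

```python
def expected_score(rating, opp_rating):
    """Textbook Elo win probability for a team against an opponent."""
    return 1.0 / (1.0 + 10 ** ((opp_rating - rating) / 400.0))


def updated_rating(rating, opp_rating, won, k=20.0):
    """Move the rating toward the observed result; k is an assumed K-factor."""
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opp_rating))


# Illustrative ratings only, not rows from the data set
favorite, underdog = 1600.0, 1500.0
print("Win probability of the favorite =", round(expected_score(favorite, underdog), 3))
print("Favorite's rating after a win   =", round(updated_rating(favorite, underdog, True), 2))
print("Underdog's rating after a win   =", round(updated_rating(underdog, favorite, True), 2))
```

Under these formulas an expected win moves the rating less than an upset does, which is why a higher elo_n reflects sustained results against the odds rather than a simple win count.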

In addition to studying data on your own team, your management has also assigned you a second team so that you can compare its performance with your own team's.

Team	What does it represent?
Your Team	This is the team that has hired you as an analyst. This is the team that you will pick below. See Step 2.
Assigned Team	This is the team that the management has assigned to you to compare against your team. See Step 1.

Reminder: It may be beneficial to review the summary report template for Project Two prior to starting this Python script. That will give you an idea of the questions you will need to answer with the outputs of this script.

Step 1: Data Preparation & the Assigned Team

This step uploads the data set from a CSV file. It also selects the Assigned Team for this analysis. Do not make any changes to the code block below.

The Assigned Team is the Chicago Bulls from the years 1996-1998.

In [1]:

import numpy as np
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt
from IPython.display import display, HTML
nba_orig_df = pd.read_csv('nbaallelo.csv')
nba_orig_df = nba_orig_df[(nba_orig_df['lg_id']=='NBA') & (nba_orig_df['is_playoffs']==0)]
columns_to_keep = ['game_id','year_id','fran_id','pts','opp_pts','elo_n','opp_elo_n', 'game_location', 'game_result']
nba_orig_df = nba_orig_df[columns_to_keep]
# The dataframe for the assigned team is called assigned_team_df. 
# The assigned team is the Bulls from 1996-1998.
assigned_years_league_df = nba_orig_df[(nba_orig_df['year_id'].between(1996, 1998))]
assigned_team_df = assigned_years_league_df[(assigned_years_league_df['fran_id']=='Bulls')]
assigned_team_df = assigned_team_df.reset_index(drop=True)
display(HTML(assigned_team_df.head().to_html()))
print("printed only the first five observations...")
print("Number of rows in the dataset =", len(assigned_team_df))

	game_id	year_id	fran_id	pts	opp_pts	elo_n	opp_elo_n	game_location	game_result
0	199511030CHI	1996	Bulls	105	91	1598.2924	1531.7449	H	W
1	199511040CHI	1996	Bulls	107	85	1604.3940	1458.6415	H	W
2	199511070CHI	1996	Bulls	117	108	1605.7983	1310.9349	H	W
3	199511090CLE	1996	Bulls	106	88	1618.8701	1452.8268	A	W
4	199511110CHI	1996	Bulls	110	106	1621.1591	1490.2861	H	W

printed only the first five observations...
Number of rows in the dataset = 246

Step 2: Pick Your Team

In this step, you will pick your team. The range of years that you will study for your team is 2013-2015. Make the following edits to the code block below:

Replace ??TEAM?? with your choice of team from one of the following team names.
*Bucks, Bulls, Cavaliers, Celtics, Clippers, Grizzlies, Hawks, Heat, Jazz, Kings, Knicks, Lakers, Magic, Mavericks, Nets, Nuggets, Pacers, Pelicans, Pistons, Raptors, Rockets, Sixers, Spurs, Suns, Thunder, Timberwolves, Trailblazers, Warriors, Wizards*
Remember to enter the team name within single quotes. For example, if you picked the Suns, then ??TEAM?? should be replaced with 'Suns'.

After you are done with your edits, click the block of code below and hit the Run button above.

In [3]:

# Range of years: 2013-2015. Note: the line below selects all teams within the three-year period 2013-2015. This is not your team's dataframe.
your_years_leagues_df = nba_orig_df[(nba_orig_df['year_id'].between(2013, 2015))]
# The dataframe for your team is called your_team_df.
# ---- TODO: make your edits here ----
your_team_df = your_years_leagues_df[(your_years_leagues_df['fran_id']=='Bulls')]
your_team_df = your_team_df.reset_index(drop=True)
display(HTML(your_team_df.head().to_html()))
print("printed only the first five observations...")
print("Number of rows in the dataset =", len(your_team_df))

	game_id	year_id	fran_id	pts	opp_pts	elo_n	opp_elo_n	game_location	game_result
0	201210310CHI	2013	Bulls	93	87	1598.8490	1415.1243	H	W
1	201211020CLE	2013	Bulls	115	86	1610.6219	1345.7418	A	W
2	201211030CHI	2013	Bulls	82	89	1593.3835	1471.2083	H	L
3	201211060CHI	2013	Bulls	99	93	1596.7441	1499.5527	H	W
4	201211080CHI	2013	Bulls	91	97	1587.4724	1653.8605	H	L

printed only the first five observations...
Number of rows in the dataset = 246

Step 3: Hypothesis Test for the Population Mean (I)

A relative skill level of 1420 represents a critically low skill level in the league. The management of your team has hypothesized that the average relative skill level of your team in the years 2013-2015 is greater than 1420. Test this claim using a 5% level of significance. For this test, assume that the population standard deviation for relative skill level is unknown. Make the following edits to the code block below:

Replace ??DATAFRAME_YOUR_TEAM?? with the name of your team's dataframe. See Step 2 for the name of your team's dataframe.
Replace ??RELATIVE_SKILL?? with the name of the variable for relative skill. See the table included in the Project Two instructions above to pick the variable name. Enclose this variable in single quotes. For example, if the variable name is var2 then replace ??RELATIVE_SKILL?? with 'var2'.
Replace ??NULL_HYPOTHESIS_VALUE?? with the mean value of the relative skill under the null hypothesis.

After you are done with your edits, click the block of code below and hit the Run button above.

In [4]:

import scipy.stats as st
# Mean relative skill level of your team
mean_elo_your_team = your_team_df['elo_n'].mean()
print("Mean Relative Skill of your team in the years 2013 to 2015 =", round(mean_elo_your_team,2))
# Hypothesis Test
# ---- TODO: make your edits here ----
# st.ttest_1samp returns a two-sided P-value; the claim tested here is one-sided (mean > 1420).
test_statistic, p_value = st.ttest_1samp(your_team_df['elo_n'], 1420)
print("Hypothesis Test for the Population Mean")
print("Test Statistic =", round(test_statistic,2)) 
print("P-value =", round(p_value,4))

Mean Relative Skill of your team in the years 2013 to 2015 = 1548.57
Hypothesis Test for the Population Mean
Test Statistic = 56.22
P-value = 0.0
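
Because the management's claim is directional (greater than 1420), the alternative hypothesis is right-tailed, while st.ttest_1samp reports a two-sided P-value by default. The sketch below recomputes the test statistic by hand and derives the right-tailed P-value from the t distribution. The sample is synthetic and stands in for your_team_df['elo_n']; its mean, spread, and seed are assumptions, so the numbers it prints are not the project's results.

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-in for your_team_df['elo_n']; these values are made up
rng = np.random.default_rng(0)
sample = rng.normal(loc=1550.0, scale=35.0, size=246)
mu0 = 1420.0  # mean under the null hypothesis

# t = (sample mean - mu0) / (sample standard deviation / sqrt(n))
n = sample.size
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

# SciPy's one-sample t-test (two-sided P-value by default)
t_scipy, p_two_sided = st.ttest_1samp(sample, mu0)

# Right-tailed P-value: P(T >= t) with n - 1 degrees of freedom
p_right = st.t.sf(t_scipy, df=n - 1)

print("t (by hand) =", round(float(t_manual), 2))
print("t (SciPy)   =", round(float(t_scipy), 2))
print("Two-sided P-value    =", p_two_sided)
print("Right-tailed P-value =", p_right)
```

When the test statistic is positive, the right-tailed P-value is half of the two-sided one, so a two-sided P-value that rounds to 0.0 leads to the same decision at the 5% level.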

Step 4: Hypothesis Test for the Population Mean (II)

A team averaging 110 points is likely to do very well during the regular season. The coach of your team has hypothesized that your team scored at an average of less than 110 points in the years 2013-2015. Test this claim at a 1% level of significance. For this test, assume that the population standard deviation for points scored is unknown.

You are to write this code block yourself.

Use Step 3 to help you write this code block. Here is some information that will help you write this code block. Reach out to your instructor if you need help.

The dataframe for your team is called your_team_df.

The variable 'pts' represents the points scored by your team.

Calculate and print the mean points scored by your team during the years you picked.

Identify the mean score under the null hypothesis. You only have to identify this value and do not have to print it. (Hint: this is given in the problem statement)

Assuming that the population standard deviation is unknown, use Python methods to carry out the hypothesis test.

Calculate and print the test statistic rounded to two decimal places.

Calculate and print the P-value rounded to four decimal places.

Write your code in the code block section below. After you are done, click this block of code and hit the Run button above. Reach out to your instructor if you need more help with this step.

In [7]:

from scipy.stats import ttest_1samp

# Mean points scored by your team
mean_pts = your_team_df['pts'].mean()
print("Mean Points =",mean_pts)
# One-sample t-test against the null-hypothesis mean of 110 points
tstat, pval = ttest_1samp(your_team_df['pts'], 110)
print('T Stat = %.2f, P Value = %.4f' % (tstat, pval))
# Compare the P-value with the 1% level of significance
if pval < 0.01:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

Mean Points = 95.8780487804878
T Stat = -19.05, P Value = 0.0000
Reject the null hypothesis
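
The coach's claim (an average of less than 110 points) is left-tailed, so the two-sided P-value returned by ttest_1samp needs converting before it is compared with the 1% level. The sketch below shows one way to do that. The helper name left_tailed_p and the synthetic scores are assumptions for illustration, not part of the project notebook.

```python
import numpy as np
import scipy.stats as st


def left_tailed_p(t_stat, p_two_sided):
    """Convert a two-sided t-test P-value into a left-tailed one."""
    half = p_two_sided / 2.0
    # A negative statistic points in the direction of the claim (mean < mu0)
    return half if t_stat < 0 else 1.0 - half


# Made-up scores standing in for your_team_df['pts']
rng = np.random.default_rng(1)
points = rng.normal(loc=108.0, scale=11.0, size=40)

t_stat, p_two = st.ttest_1samp(points, 110)
p_left = left_tailed_p(t_stat, p_two)

alpha = 0.01  # 1% level of significance
if p_left < alpha:
    decision = "Reject the null hypothesis"
else:
    decision = "Fail to reject the null hypothesis"

print("T Stat = %.2f, left-tailed P Value = %.4f" % (t_stat, p_left))
print(decision)
```

Recent SciPy releases (1.6 and later) also accept alternative='less' in ttest_1samp, which returns the left-tailed P-value directly. The manual conversion above works on older versions as well.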

Step 5: Hypothesis Test for the Population Proportion

Suppose the management claims that the proportion of games that your team wins when scoring more than 80 points is 0.50. Test this claim using a 5% level of significance. Make the following edits to the code block below:

Replace ??COUNT_VAR?? with the variable name that represents the number of games won when your team scores over 80 points. (Hint: this variable is in the code block below).

Replace ??NOBS_VAR?? with the variable name that represents the total number of games when your team scores over 80 points. (Hint: this variable is in the code block below).

Replace ??NULL_HYPOTHESIS_VALUE?? with the proportion under the null hypothesis.

After you are done with your edits, click the block of code below and hit the Run button above.

In [8]:

from statsmodels.stats.proportion import proportions_ztest
your_team_gt_80_df = your_team_df[(your_team_df['pts'] > 80)]
# Number of games won when your team scores over 80 points
counts = (your_team_gt_80_df['game_result'] == 'W').sum()
# Total number of games when your team scores over 80 points
nobs = len(your_team_gt_80_df['game_result'])
p = counts*1.0/nobs
print("Proportion of games won by your team when scoring more than 80 points in the years 2013 to 2015 =", round(p,4))
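# Hypothesis Test
# ---- TODO: make your edits here ----
# The null-hypothesis proportion is 0.50; 80 is the points threshold, not a proportion.
# The output recorded below came from a run that passed 80 here, hence the extreme test statistic.
test_statistic, p_value = proportions_ztest(counts, nobs, 0.50)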
print("Hypothesis Test for the Population Proportion")
print("Test Statistic =", round(test_statistic,2)) 
print("P-value =", round(p_value,4))

Proportion of games won by your team when scoring more than 80 points in the years 2013 to 2015 = 0.6413
Hypothesis Test for the Population Proportion
Test Statistic = -2470.81
P-value = 0.0
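
The third argument of proportions_ztest is the proportion under the null hypothesis, so it must lie between 0 and 1. The sketch below reproduces the one-proportion z-test by hand using only the standard library, with the sample proportion in the standard error (understood to be the default behavior of proportions_ztest). The function name and the counts of 64 wins in 100 games are hypothetical, chosen only to give a proportion near the one printed above.

```python
import math


def one_proportion_ztest(count, nobs, p0):
    """Two-sided z-test for a single proportion (sample proportion in the SE)."""
    p_hat = count / nobs
    se = math.sqrt(p_hat * (1.0 - p_hat) / nobs)
    z = (p_hat - p0) / se
    # Two-sided tail area under the standard normal curve
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts: 64 wins in 100 games
z, p_value = one_proportion_ztest(count=64, nobs=100, p0=0.50)
print("Test Statistic =", round(z, 2))
print("P-value =", round(p_value, 4))

# Passing the points threshold (80) instead of a proportion gives a
# meaningless, hugely negative statistic
z_bad, _ = one_proportion_ztest(count=64, nobs=100, p0=80)
print("Test Statistic with p0 = 80:", round(z_bad, 2))
```

A test statistic in the thousands, like the -2470.81 printed above, is a sign that the null value was not entered as a proportion.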

Step 6: Hypothesis Test for the Difference Between Two Population Means

The management of your team wants to compare the team with the assigned team (the Bulls in 1996-1998). They claim that the skill level of your team in 2013-2015 is the same as the skill level of the Bulls in 1996 to 1998. In other words, the mean relative skill level of your team in 2013 to 2015 is the same as the mean relative skill level of the Bulls in 1996-1998. Test this claim using a 1% level of significance. Assume that the population standard deviation is unknown. Make the following edits to the code block below:

Replace ??DATAFRAME_ASSIGNED_TEAM?? with the name of assigned team's dataframe. See Step 1 for the name of assigned team's dataframe.

Replace ??DATAFRAME_YOUR_TEAM?? with the name of your team's dataframe. See Step 2 for the name of your team's dataframe.

Replace ??RELATIVE_SKILL?? with the name of the variable for relative skill. See the table included in Project Two instructions above to pick the variable name. Enclose this variable in single quotes. For example, if the variable name is var2 then replace ??RELATIVE_SKILL?? with 'var2'.

After you are done with your edits, click the block of code below and hit the Run button above.

In [10]:

import scipy.stats as st
mean_elo_n_project_team = assigned_team_df['elo_n'].mean()
print("Mean Relative Skill of the assigned team in the years 1996 to 1998 =", round(mean_elo_n_project_team,2))
mean_elo_n_your_team = your_team_df['elo_n'].mean()
print("Mean Relative Skill of your team in the years 2013 to 2015  =", round(mean_elo_n_your_team,2))
# Hypothesis Test
# ---- TODO: make your edits here ----
test_statistic, p_value = st.ttest_ind(assigned_team_df['elo_n'],your_team_df['elo_n'])
print("Hypothesis Test for the Difference Between Two Population Means")
print("Test Statistic =", round(test_statistic,2)) 
print("P-value =", round(p_value,4))

Mean Relative Skill of the assigned team in the years 1996 to 1998 = 1739.8
Mean Relative Skill of your team in the years 2013 to 2015  = 1548.57
Hypothesis Test for the Difference Between Two Population Means
Test Statistic = 47.79
P-value = 0.0
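
With its default settings, st.ttest_ind assumes equal population variances and uses a pooled standard deviation. The sketch below recomputes that pooled statistic by hand and also runs Welch's version (equal_var=False), which drops the equal-variance assumption. Both samples are synthetic stand-ins for the two teams' elo_n columns; their means, spreads, and seed are assumptions, so the printed values are not the project's results.

```python
import numpy as np
import scipy.stats as st

# Synthetic stand-ins for assigned_team_df['elo_n'] and your_team_df['elo_n']
rng = np.random.default_rng(2)
group_a = rng.normal(1740.0, 30.0, size=246)
group_b = rng.normal(1550.0, 35.0, size=246)

n_a, n_b = group_a.size, group_b.size

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n_a - 1) * group_a.var(ddof=1)
              + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)

# t = (mean_a - mean_b) / sqrt(pooled_var * (1/n_a + 1/n_b))
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))

# SciPy: pooled test (default) and Welch's test
t_pooled, p_pooled = st.ttest_ind(group_a, group_b)
t_welch, p_welch = st.ttest_ind(group_a, group_b, equal_var=False)

print("t (by hand)     =", round(float(t_manual), 2))
print("t (pooled test) =", round(float(t_pooled), 2))
print("t (Welch test)  =", round(float(t_welch), 2))
```

The sign of the statistic follows the argument order: it is positive when the first sample has the larger mean, which matches the positive value printed above for the assigned team minus your team.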

End of Project Two

Download the HTML output and submit it with your summary report for Project Two. The HTML output can be downloaded by clicking File, then Download as, then HTML. Do not include the Python code within your summary report.

Statistics Project Sample Content Preview:

MAT 243 Project Two Summary Report
[Full Name]
[SNHU Email]
Southern New Hampshire University
Note: Replace the bracketed text on page one (the cover page) with your personal information.
Introduction: Problem Statement
Discuss the statement of the problem in terms of the statistical analyses that are being performed. In your response, you should address the following questions:
* What is the problem you are going to solve?
* What data set are you using?
* What statistical methods will you be using to do the analysis for this project?
The statistical analyses performed aim to investigate performance patterns in order to help make key decisions to improve the team’s performance. In particular, the statistical analyses focus on the team’s relative skill level, average points, and the proportion of games won when scoring 80 points or more.
Introduction: Your Team and the Assigned Team
In the Python script, you picked the same team and years that you picked for Project One. The assigned team and its range of years will be the same as in Project One as well.
See Steps 1 and 2 in the Python script to address the following items in the table below:
* What team did you pick and what years were picked to do the analysis?
* What team and range of years were you assigned for the comparative study? (Hint: this is called the assigned team in the Python script.) Present this information in a ...

Updated on January 26, 2024
