The Lending Club is a lending company based in San Francisco, CA. They connect borrowers with investors through an online marketplace. They have made data from 2007-2011 publicly available.
You just got an interview as an analyst for the Lending Club. The client wants you to analyze this large amount of information.
1. Start by making initial observations of the data. What types of variables are present? Is there anything that catches your eye? A good analyst checks the data carefully. See the Quartz Guide to Bad Data.
2. Use at least two ways to summarize the qualitative data present in the data set with frequency distributions and the various graphs/charts we have used in the class for Chapter 2.
3. Do the same thing with the quantitative data present. The four summaries (two qualitative and two quantitative) should cover different aspects of the data set. Interpret your results.
4. Pick two of the graphs you created above and describe the shape of those distributions.
5. Why did you choose those particular graphs? Do they have any benefits over the alternatives?
6. Now I want you to take two variables you think might be related. Create a scatterplot. Find the covariance and correlation, and interpret the results.
7. For the 2 examples you chose in Step 4, give me the measure of central tendency you feel is best for each data set. Then find their sample variances.
8. Create a box plot for me for one of the examples.
9. Depending on the distribution you get in Step 8, tell me the limits that bound the observations within 2 standard deviations of the mean. What does this mean in relation to the variable?
Finally, give me a summary of what you have discovered as a whole from this data set. You want the Lending Club to know that you are very interested in working with them. Give them something to think about.
Submit on Canvas. Send me whatever work you have done with Excel or any other tool you wish to use all in 1 document. Send me formulas/code used. DO NOT Handwrite the calculations.


Student’s Name
Professor’s Name
SCM Project
The data set contains both qualitative variables (ordinal and nominal) and quantitative variables (integer, interval and ratio). There are many fields, such as loan_amnt, max_bal_bc, inq_fi and settlement_term, among others. The data also contains numerous fields with missing values; examples include id, member_id, annual_inc_joint and dti_joint. These fields do not threaten the accuracy of the data, since the values were deliberately omitted either because they were unavailable or for confidentiality reasons. Further, the data set does not replace missing values (i.e., nothing) with a numeric placeholder (0), which significantly reduces the risks usually associated with automated data processing.
The data does not contain any duplicated fields, and the formatting is consistent throughout. For example, the number of payments on the loan (term) is always given in months, whereas employment length is always given in years. The data provided by The Lending Club is specific and suited to its purpose of connecting borrowers with investors through an online marketplace. Units such as years and months are clearly labeled to avoid confusion during data analysis. Categories are also labeled unambiguously: for example, home ownership status is labeled as RENT, OWN or MORTGAGE, and verification status is labeled as verified, not verified or income source verified. In general, the data does not have the qualities of bad data. It was downloaded from The Lending Club website, where it is presented as ‘LoanStats3a’, and it is a fair representation of the information it describes.
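The checks described above could be reproduced in code as well as by inspection. The following is a minimal sketch in Python, not the procedure actually used for this report. It counts blank values, literal zeros and repeated rows (rather than repeated fields). The field names come from the data set, but the sample rows and the audit function are invented for illustration, and the real ‘LoanStats3a’ file would have to be loaded in place of the sample text.

```python
import csv
import io

# Hypothetical sample rows: the field names appear in the data set,
# but every value here is invented for illustration only.
SAMPLE = """loan_amnt,term,home_ownership,annual_inc_joint
5000,36 months,RENT,
2500,60 months,OWN,
0,36 months,MORTGAGE,
5000,36 months,RENT,
"""

def audit(text):
    """Count blank values, literal zeros and repeated rows in CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    fields = list(rows[0].keys())
    # A blank cell is treated as a missing value.
    missing = {f: sum(1 for r in rows if r[f].strip() == "") for f in fields}
    # A literal "0" may be a real value or a placeholder for a missing one.
    zeros = {f: sum(1 for r in rows if r[f].strip() == "0") for f in fields}
    # Rows that repeat an earlier row exactly.
    duplicates = len(rows) - len({tuple(r.values()) for r in rows})
    return missing, zeros, duplicates

missing, zeros, duplicates = audit(SAMPLE)
print("Missing values per field:", missing)
print("Zeros per field:", zeros)
print("Duplicated rows:", duplicates)
```

A field that shows many zeros but no blanks would deserve a closer look, since that pattern can indicate that missing values were replaced with 0.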
2 Qualitative Data 
There are multiple ways in which data can be summarized and presented for easier understanding. Here, two methods will be used to summarize and represent qualitative data. They include the use of a pie chart and a bar graph.
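Both charts are drawn from a frequency distribution of the categories. As a minimal sketch, assuming Python is used instead of Excel, the counts and relative frequencies could be tabulated as follows. The home_ownership values listed here are invented for illustration and are not taken from the data set, and frequency_table is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical values for the home_ownership field (not real data).
home_ownership = [
    "RENT", "MORTGAGE", "RENT", "OWN",
    "MORTGAGE", "RENT", "RENT", "MORTGAGE",
]

def frequency_table(values):
    """Return (category, count, relative frequency), most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, n / total) for cat, n in counts.most_common()]

table = frequency_table(home_ownership)
for cat, n, share in table:
    # The row of '#' characters is a rough text version of a bar graph.
    print(f"{cat:<10}{n:>3}{share:>8.1%}  {'#' * n}")
```

The counts give the bar heights for a bar graph, and the relative frequencies give the slice sizes for a pie chart.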

