College Data Set Exploration Exercise


This exercise relates to the College data set, which can be found in the file College.csv. It contains a number of variables for 777 different universities and colleges in the US. The variables are

  • Private : Public/private indicator
  • Apps : Number of applications received
  • Accept : Number of applicants accepted
  • Enroll : Number of new students enrolled
  • Top10perc : New students from top 10% of high school class
  • Top25perc : New students from top 25% of high school class
  • F.Undergrad : Number of full-time undergraduates
  • P.Undergrad : Number of part-time undergraduates
  • Outstate : Out-of-state tuition
  • Room.Board : Room and board costs
  • Books : Estimated book costs
  • Personal : Estimated personal spending
  • PhD : Percent of faculty with Ph.D.’s
  • Terminal : Percent of faculty with terminal degree
  • S.F.Ratio : Student/faculty ratio
  • perc.alumni : Percent of alumni who donate
  • Expend : Instructional expenditure per student
  • Grad.Rate : Graduation rate

    Before reading the data into R, it can be viewed in Excel or a text editor.

    (a) Use the read.csv() function to read the data into R. Call the loaded data college. Make sure that you have the directory set to the correct location for the data.

    (b) Look at the data using the fix() function. You should notice that the first column is just the name of each university. We don’t really want R to treat this as data. However, it may be handy to have these names for later. Try the following commands:

        > rownames(college) = college[, 1]
        > fix(college)

    You should see that there is now a row.names column with the name of each university recorded. This means that R has given each row a name corresponding to the appropriate university. R will not try to perform calculations on the row names. However, we still need to eliminate the first column in the data where the names are stored. Try

        > college = college[, -1]
        > fix(college)

    Now you should see that the first data column is Private. Note that another column labeled row.names now appears before the Private column. However, this is not a data column but rather the name that R is giving to each row.
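A minimal sketch of parts (a) and (b), assuming R 4.0 or later. Because College.csv may not be at hand, the block writes a three-row stand-in file with made-up values and reads that instead; the names `toy_path`, `College A`, and so on are hypothetical. Swap in the path to the real file when you have it. The `stringsAsFactors = TRUE` argument is an assumption added here so that text columns such as Private load as factors, which the plotting steps in part (c) rely on.

```r
# Toy stand-in for College.csv: three made-up rows, NOT real data.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Private,Apps,Accept",
  "College A,Yes,1200,900",
  "College B,No,5400,4100",
  "College C,Yes,800,650"
), toy_path)

# Read the file; text columns become factors (not the default since R 4.0).
college <- read.csv(toy_path, stringsAsFactors = TRUE)

# Use the first column as row names, then drop it from the data.
rownames(college) <- college[, 1]
college <- college[, -1]

print(rownames(college))  # the college names
print(names(college))     # the remaining data columns
```

As an alternative, read.csv() accepts `row.names = 1`, which takes the row names from the first column at load time and so covers both steps in one call.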

    (c) i. Use the summary() function to produce a numerical summary of the variables in the data set.

    ii. Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of the data. Recall that you can reference the first ten columns of a matrix A using A[,1:10].

    iii. Use the plot() function to produce side-by-side boxplots of Outstate versus Private.

    iv. Create a new qualitative variable, called Elite, by binning the Top10perc variable. We are going to divide universities into two groups based on whether or not the proportion of students coming from the top 10% of their high school classes exceeds 50%.

        > Elite = rep("No", nrow(college))
        > Elite[college$Top10perc > 50] = "Yes"
        > Elite = as.factor(Elite)
        > college = data.frame(college, Elite)

    Use the summary() function to see how many elite universities there are. Now use the plot() function to produce side-by-side boxplots of Outstate versus Elite.

    v. Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables. You may find the command par(mfrow=c(2,2)) useful: it will divide the print window into four regions so that four plots can be made simultaneously. Modifying the arguments to this function will divide the screen in other ways.

    vi. Continue exploring the data, and provide a brief summary of what you discover.…
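The steps in part (c) can be sketched as follows. This is an illustration under stated assumptions, not a worked solution: the data frame is synthetic, filled with random values that only borrow four of the real column names, so none of the output says anything about actual colleges. The names `n` and `elite_counts` are hypothetical, and `pdf(NULL)` is used only so the block runs without opening a plot window.

```r
# Synthetic stand-in for the College data: random values, NOT real figures.
set.seed(1)
n <- 40
college <- data.frame(
  Private   = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  Apps      = round(runif(n, 500, 10000)),
  Top10perc = round(runif(n, 5, 90)),
  Outstate  = round(runif(n, 4000, 20000))
)

# Part iv: bin Top10perc into a two-level factor.
Elite <- rep("No", nrow(college))
Elite[college$Top10perc > 50] <- "Yes"
college$Elite <- as.factor(Elite)

# Part i: numerical summary; factor columns are reported as counts per level.
summary(college)
elite_counts <- table(college$Elite)

pdf(NULL)  # null graphics device; remove this line to see the plots

# Part ii: scatterplot matrix of the numeric columns.
pairs(college[, 2:4])

# Parts iii and iv: a factor on the x-axis makes plot() draw boxplots.
plot(college$Private, college$Outstate, xlab = "Private", ylab = "Outstate")
plot(college$Elite, college$Outstate, xlab = "Elite", ylab = "Outstate")

# Part v: a 2 x 2 grid of histograms with different bin counts.
par(mfrow = c(2, 2))
hist(college$Apps, breaks = 5)
hist(college$Apps, breaks = 20)
hist(college$Outstate, breaks = 10)
hist(college$Top10perc, breaks = 10)

invisible(dev.off())
```

With the real data, `pairs(college[, 1:10])` gives the ten-column matrix the exercise asks for. Note that `breaks` in hist() is treated as a suggestion, so the number of bins drawn may differ slightly from the value given.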
