Wednesday 28 October 2015

Assignment #3 - Making Data Management Decisions

The more I work on this project, the more I want to include. I love the idea of tying these pieces of data together to get a strong and telling picture. Granted, what I include has to be relevant to my hypothesis.

In this third assignment we had to clean up our data: take out the data that was not relevant, then look at how the good data left behind is distributed and what it might tell us so far.

PROGRAM

LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;
DATA new;set mydata.addhealth_pds;
label h1ee1="How Much Do You Want To Go To College?"
h1ee2="How Likely Is It That You Will Go To College?"
h1nm4="How Far Did Bio Mom Go?"
h1nf4="How Far Did Bio Dad Go?"
h1rm1="How Far Did Mom Go?"
h1rf1="How Far Did Dad Go?"
h1wp11="How Disappointed Would Mom Be If You Did Not Graduate College?"
h1wp12="How Disappointed Would Mom Be If You Did Not Graduate High School?"
h1wp15="How Disappointed Would Dad Be If You Did Not Graduate College?"
h1wp16="How Disappointed Would Dad Be If You Did Not Graduate High School?"
h1wp17H="In the past week have you talked to Mom about school work or grades?"
h1wp17I="In the past week have you worked with Mom on a project for school?"
h1wp17J="In the past week have you talked with Mom about other things to do with school?"
h1wp18h="In the past week have you talked to Dad about school work or grades?"
h1wp18I="In the past week have you worked with Dad on a project for school?"
h1wp18J="In the past week have you talked with Dad about other things to do with school?"
WANTSTOGOTOCOLLEGE="RESPONDENT INDICATES WANTING TO ATTEND COLLEGE"
THINKSWILLGOTOCOLLEGE="RESPONDENT BELIEVES THEY WILL GO TO COLLEGE";
IF h1ee1=8 THEN h1ee1=.;
IF h1ee1=6 THEN h1ee1=.;
IF h1ee2=8 THEN h1ee2=.;
IF h1ee2=6 THEN h1ee2=.;
IF h1nm4=96 THEN h1nm4=.;
IF h1nm4=97 THEN h1nm4=.;
IF h1nm4=98 THEN h1nm4=.;
IF h1nm4=12 THEN h1nm4=.;
IF h1nm4=11 THEN h1nm4=.;
IF h1nf4=96 THEN h1nf4=.;
IF h1nf4=97 THEN h1nf4=.;
IF h1nf4=98 THEN h1nf4=.;
IF h1nf4=12 THEN h1nf4=.;
IF h1nf4=11 THEN h1nf4=.;
IF h1rm1=96 THEN h1rm1=.;
IF h1rm1=97 THEN h1rm1=.;
IF h1rm1=98 THEN h1rm1=.;
IF h1rm1=12 THEN h1rm1=.;
IF h1rm1=11 THEN h1rm1=.;
IF h1rf1=96 THEN h1rf1=.;
IF h1rf1=97 THEN h1rf1=.;
IF h1rf1=98 THEN h1rf1=.;
IF h1rf1=12 THEN h1rf1=.;
IF h1rf1=11 THEN h1rf1=.;
IF h1wp11=6 then h1wp11=.; /* 6=refused*/
IF h1wp11=7 then h1wp11=.; /* 7=legit skip - no mom*/
IF h1wp11=8 then h1wp11=.; /* 8=don't know; it is odd that this question has no responses coded 9, as all other similar ones do*/
IF h1wp12=6 then h1wp12=.; /* 6=refused*/
IF h1wp12=7 then h1wp12=.; /* 7=legit skip - no mom*/
IF h1wp12=8 then h1wp12=.; /* 8=don't know*/
IF h1wp12=9 then h1wp12=.; /* 9=not applicable*/
IF h1wp15=6 then h1wp15=.; /* 6=refused*/
IF h1wp15=7 then h1wp15=.; /* 7=legit skip - no dad*/
IF h1wp15=8 then h1wp15=.; /* 8=don't know*/
IF h1wp15=9 then h1wp15=.; /* 9=not applicable*/
IF h1wp16=6 then h1wp16=.; /* 6=refused*/
IF h1wp16=7 then h1wp16=.; /* 7=legit skip - no dad*/
IF h1wp16=8 then h1wp16=.; /* 8=don't know*/
IF h1wp16=9 then h1wp16=.; /* 9=not applicable*/
IF H1WP17H=6 THEN H1WP17H=.; /* 6= REFUSED*/
IF H1WP17H=7 THEN H1WP17H=.; /* 7= LEGIT SKIP NO MOM*/
IF H1WP17H=8 THEN H1WP17H=.; /* 8= DON'T KNOW*/
IF H1WP17I=6 THEN H1WP17I=.; /* 6= REFUSED*/
IF H1WP17I=7 THEN H1WP17I=.; /* 7= LEGIT SKIP NO MOM*/
IF H1WP17I=8 THEN H1WP17I=.; /* 8= DON'T KNOW*/
IF H1WP17J=6 THEN H1WP17J=.; /* 6= REFUSED*/
IF H1WP17J=7 THEN H1WP17J=.; /* 7= LEGIT SKIP NO MOM*/
IF H1WP17J=8 THEN H1WP17J=.; /* 8= DON'T KNOW*/
IF H1WP18H=6 THEN H1WP18H=.; /* 6= REFUSED*/
IF H1WP18H=7 THEN H1WP18H=.; /* 7= LEGIT SKIP NO DAD*/
IF H1WP18H=8 THEN H1WP18H=.; /* 8= DON'T KNOW*/
IF H1WP18I=6 THEN H1WP18I=.; /* 6= REFUSED*/
IF H1WP18I=7 THEN H1WP18I=.; /* 7= LEGIT SKIP NO DAD*/
IF H1WP18I=8 THEN H1WP18I=.; /* 8= DON'T KNOW*/
IF H1WP18J=6 THEN H1WP18J=.; /* 6= REFUSED*/
IF H1WP18J=7 THEN H1WP18J=.; /* 7= LEGIT SKIP NO DAD*/
IF H1WP18J=8 THEN H1WP18J=.; /* 8= DON'T KNOW*/
WANTSTOGOTOCOLLEGE= .;
IF (H1EE1=1) OR (H1EE1=2) THEN WANTSTOGOTOCOLLEGE=1;
IF (H1EE1=3) THEN WANTSTOGOTOCOLLEGE=2;
IF (H1EE1=4) OR (H1EE1=5) THEN WANTSTOGOTOCOLLEGE=3;
THINKSWILLGOTOCOLLEGE= .;
IF (H1EE2=1) OR (H1EE2=2) THEN THINKSWILLGOTOCOLLEGE=1;
IF (H1EE2=3) THEN THINKSWILLGOTOCOLLEGE=2;
IF (H1EE2=4) OR (H1EE2=5) THEN THINKSWILLGOTOCOLLEGE=3;
run;
proc sort; by AID;
run;
proc freq; tables h1ee1 WANTSTOGOTOCOLLEGE h1ee2 THINKSWILLGOTOCOLLEGE H1nm4 h1nf4 h1rm1 h1rf1 h1wp11 h1wp12 h1wp15 h1wp16 h1wp17h h1wp17I h1wp17J h1wp18h h1wp18I h1wp18J;
run;

OVERVIEW
Running the program to display the tables was pretty straightforward. The two difficulties that I encountered were (i.) figuring out what data was not relevant and (ii.) figuring out how to recode a couple of the variables to make them easier to understand. I will explain:

Most of the data in the Adolescent Health study is categorical, and the responses follow very specific groupings. Once I figured out from the code book which codes marked non-responses (refused to give one, or did not know the answer) and non-relevant responses (there is no Mom, so how can they reply?), I was able to set them to missing. They are indicated in each chart by the Frequency Missing value.
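The same cleanup could probably be written more compactly with arrays. This is only a sketch, not what I actually ran: the dataset name sketch, the array names disappoint and talk, and the counter _i are made up for illustration, and it assumes the LIBNAME statement from the program is already in effect. It also treats 9 as a non-response code for h1wp11, which the program above does not do (that question has no 9 responses, so this should not change anything).

```sas
/* Sketch only: assumes LIBNAME mydata from the program above is active. */
DATA sketch;
  SET mydata.addhealth_pds;

  /* Disappointment questions: 6=refused, 7=legit skip, 8=don't know, 9=not applicable */
  ARRAY disappoint {*} h1wp11 h1wp12 h1wp15 h1wp16;
  DO _i = 1 TO DIM(disappoint);
    IF disappoint{_i} IN (6, 7, 8, 9) THEN disappoint{_i} = .;
  END;

  /* Talked-with-parent questions: 6=refused, 7=legit skip, 8=don't know */
  ARRAY talk {*} h1wp17h h1wp17i h1wp17j h1wp18h h1wp18i h1wp18j;
  DO _i = 1 TO DIM(talk);
    IF talk{_i} IN (6, 7, 8) THEN talk{_i} = .;
  END;

  DROP _i; /* the loop counter is not needed in the output */
RUN;
```

Each loop replaces one block of IF statements, so the list of codes for a group of questions lives in a single place.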

In other questions the responses were given on a scale, usually from 1 (being low) to 5 (being high). For my review I wanted more clear-cut answers: did they think it or not? As you can see in the first two questions, about whether the respondent wants to go to college (H1EE1) and believes they will go (H1EE2), I was able to create new variables (WANTSTOGOTOCOLLEGE and THINKSWILLGOTOCOLLEGE) that group together those responding low (responses of 1 and 2), medium (3) or high (4 and 5). I believe that organizing the data like this will allow me to more quickly show the connection between college-going beliefs and parental history.
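One way to make the grouped variables easier to read in the output might be to attach value labels to them. This is a minimal sketch, assuming the dataset new from the program above already exists; the format name collegefmt and its label text are my own choices for illustration, not something from the code book.

```sas
/* Sketch only: collegefmt is a hypothetical format name. */
PROC FORMAT;
  VALUE collegefmt 1 = "Low (responses 1-2)"
                   2 = "Medium (response 3)"
                   3 = "High (responses 4-5)";
RUN;

/* Show the two grouped variables with labels instead of 1/2/3 */
PROC FREQ DATA=new;
  TABLES WANTSTOGOTOCOLLEGE THINKSWILLGOTOCOLLEGE;
  FORMAT WANTSTOGOTOCOLLEGE THINKSWILLGOTOCOLLEGE collegefmt.;
RUN;
```

The stored values stay 1, 2 and 3; the format only changes how they are displayed in the frequency tables.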

CHARTS
