Graduation - Retention Database

University Planning and Analysis

05/07/2002

Overview

The purpose of this revised system is to make improvements to an already sound Attrition Database.  The principle improvements add more information about a student at each time point, in a more efficient storage format, at more time points per year.  There are two files that make up the database, a demographic file where there is only one record per student in a cohort, and a term file where there is one record for each time point in each student’s undergraduate career.

This discussion will assume familiarity with the source datasets and a student’s progress through the R&R system as coded in the data.

Cohorts are created from the SDFX and then merged with other information from the term process to produce demographic dataset and the term dataset.  The term information comes from the “SUPERRAT” files taken at census and at the end of each semester.  The STUSEM started in 1992 and we can go back to 1989 by using the SUPERRAT, therefore SUPERRAT is the choice.  The R&R and the ACS applications development teams assure us that the data is equivalent (the STUSEM just has other data items for more other uses).  To get the DE and Coop program information the STUCRS (back to 1992) and Revised Rolls (pre-1992) must be read for census and end-of-semester.  The MOUSE history file is used to add degrees conferred.

Specific improvements over the ‘old’ system:

  1. Changed database format from the ‘old’ array of time points contained in a single record per student to multiple term records per student, one for each time point.  “Empty” time points do not have a record in the tall stack of records and must be accounted for in report processing logic.
  2. Added official student class status from registration system at each time point.
  3. Added DE and coop flags to cohort record and specific term record as well. This is a cumulative flag that indicates the presence of at least one semester in that status.
  4. Added more time points, there are now 4 census and 4 end-of-semester time points with existing system data captured at every time point.  This does lead to more problems when interpreting data, such as the presence of an end-of-term record with no matching census record, but guarantees that the database shows all data that is present.
  5. Graduation information is recorded from degree files at each time point it occurs, and multiple degrees are all recorded at each time point if present.
  6. Degree information summary is added to cohort record.

Processing Information:

Program library:        k:\student\attrition\new

Central program:       new_atr_0.sas

                                    defines libraries, files, parameters, and calls all processing programs

Report Model:            report2.sas

Process to update database for adding new semesters:

At this time new semesters are added by using a new end date and re-running the entire build process from the beginning.  At some point the begin date can move forward to drop off old semesters.  All files used in entire process must be present on PC hard disk (or network drive) in ascii-text or SAS format.  Carefully place source data and work libraries on opposed disks to achieve maximum efficiency in processing.

The best time to build a new database will be in the middle of a semester after all graduation and end-of-term data is in for the previous semesters.  In the mid-fall run for new retention and graduation rates, the most-current- semester census-data files must also be included even though the history files will not include the end of that semester.  In the mid-spring run there should be no need to include spring census, summer run can be any time after spring graduation data is in, with no need for summer census data.

  1. Download needed base files from mainframe (see below),
  2. Assemble needed base SAS datasets (see below),
  3. Change end date parameter at top of central program,
  4. Submit, (on Lewis’ PC should take about 30 minutes),
  5. Look for problems in added semesters,
  6. Distribute.

Specific datasets required

  1. SSN change file from mainframe database by running P010501.J9550.SOURCLIB (GETSSNEX).  Download is automatic to proper dataset on K drive.  As long as file date is greater than data date, the file is good to go.
  2. Census SUPERRAT look for job in P010501.J9550.SOURCLIB to copy tape superrat to P010501.TEMP.SUPERRAT and then download from disk (SUPERRAT may already be on disk in which case download directly).
  3. History SUPERRAT look for job in P010501.J9550.SOURCLIB to copy tape superrat to P010501.TEMP.SUPERRAT and then download from disk (SUPERRAT may already be on disk in which case download directly).  Note one history SUPERRAT will have multiple semesters in it by the end of the year.
  4. MOUSE history look for job in P010501.J9550.SOURCLIB to copy tape mouse to P010501.TEMP.MOUSE and then download from disk (MOUSE may already be on disk in which case download directly).
  5. Census STUCRSlook for job in P010501.J9550.SOURCLIB to copy tape STUCRS to P010501.TEMP. STUCRS and then download from disk (STUCRS may already be on disk in which case download directly).
  6. History STUCRS and / or GRADES file these should already be accessible on PC in SAS format, if not look for job in P010501.J9550.SOURCLIB to copy tape STUCRS to P010501.TEMP. STUCRS and then download from disk (STUCRS may already be on disk in which case download directly).
  7. SDFX files census for each semester should already be in SAS format waiting.
  8. Note for all these files: the ASCII text will be read in from SUPERRATs and MOUSE files, while the SDFX, STUCRS, and GRADES files must be in SAS dataset format.

Check For Problems in Processing Outputs:

  1. Proc freq from each semester should have non-zero counts.  Check the academic performance flags with abbreviations in database to see if a new flag was added.  If so then code in a new flag IF statement and re-run database.
  2. Proc freq of each MOUSE can be checked for zeroes.
  3. The list of student Ids with no term records needs to be checked.  There are many records in the list now, which tells me the change to new ID is in error.  These are checked against the change SSN source file and put out into SSNCHECK3.TXT file.  Take changes and put correct adjustments into SSNCHECK4.TXT file which will feed back into the merge step.
  4. List of students in more than one cohort needs to be checked.  If there is an error condition, the record in error can be deleted in the MERGE_FINAL.SAS program around statement 30 in the change SSN re-processing.  There are legitimate reasons to be in multiple cohorts. See programming notes below.

Programming notes:

  1. At the current time, rebuilding the whole database is cheaper than making a routine just to add a new semester(s) on the end of the existing.  The old database will have to be processed for SSN changes and resorted anyway, and there may be other issues to address as well (such as adding data variables, or changing length of existing variables).
  2. There is no attempt to put in a DE cohort, or load term records for those in a DE cohort.  This would require a lot of assumptions in the datasets, such as how to construct old cohorts, and how to pull in census information not present in the TRACS system until after RT census.
  3. Note that SSN changes are stored as SAS datasets that are read into a format at each semester processing.  When running the build process, this is about half the time span of processing.  If re-running the process in the same SAS session, then the ‘adjust_ssns’ step may be bypassed (commented out in _0 module) and existing SSN changes will work.  Typically programming changes in RAT, Mouse, or SDFX modules are made, so rerun will be much faster when SSN processing is skipped.
  4. Adding a field to all records means going to the KEEP statement and including that field, both the ‘read_rat_cens.sas’ and ‘read_mouse.sas’ output fields must match (the mouse data is appended with FORCE specified so make sure there are no extra fields in the MOUSE process, or that numeric / character mismatches do not occur).  Note that the KEEP statements are in the SET part of the data step, and changes must be coordinated for all places (rat census, rat history, mouse history).
  5. RAT processing is done with a common set of code in a macro loaded in PREPS module.  Date of birth is hardwired to 20th century, COM college conversion is hardwired in, etc.
  6. DE and COOP process are done with a common set of code in a macro loaded in PREPS module.  Additions here must work for both census and history time points.

Next Steps:

  1. At this time there are about 6 students who seem to be legitimately placed in 2 cohorts.  The current system structure ignores this possibility and the ‘extra’ record will be removed at reporting time with a nodupkey sort option.  This will place the student in the first entry cohort and compute time accordingly (incorrectly).  The correct way to fix this is to add a record ID to the end of the student id that would identify the second and subsequent cohorts.  This record ID would be 1 for first cohorts, the second cohort would be identified with a 2, and so on.  The term file build logic would have to be smart enough to decide when a second cohort entry was started and to increment the record ID for all term records after that new cohort membership started.
  2. Add data field = GPA in major at history time point in fall and spring from SUPEREXT files.

Report Printing:

  1. Program named ‘report2.sas’ in home directory produces a report equivalent to the old upside-down stairstep report.
  2. Comparison with attrition database is part of process. The status of each student is compared semester by semester and exceptions are saved to a dataset.
  3. The report process needs to do more work than the old report, such as taking care of dirty data conditions that exist in the data. An example is where a student registers for the semester after the census and so will be missing an 'C' record, but will have an 'E' record for the history of that semester. Other dirty data conditions are where a student takes 5-week and 10-week courses in the same summer season.

Output from SAS PROC CONTENTS of variables, attributres, and labels

Cohort Dataset

Cohort and Term Dataset Contents   10:08 Tuesday, April 30, 2002   1

The CONTENTS Procedure

Data Set Name: NEWATR.COHORT                          Observations:         67603

Member Type:   DATA                                   Variables:            42  

Engine:        V8                                     Indexes:              0   

Created:       10:26 Tuesday, April 30, 2002          Observation Length:   168 

Last Modified: 10:26 Tuesday, April 30, 2002          Deleted Observations: 0   

Protection:                                           Compressed:           NO  

Data Set Type:                                        Sorted:               NO  

Label:                                                                          

         -----Engine/Host Dependent Information-----

Data Set Page Size:         16384                            

Number of Data Set Pages:   698                             

First Data Page:            1                               

Max Obs per Page:           97                              

Obs in First Data Page:     64                              

Number of Data Set Repairs: 0                               

File Name:                  d:\student\newatr\cohort.sas7bdat

Release Created:            8.0202M0                        

Host Created:               WIN_PRO                         


Cohort Variables

Cohort and Term Dataset Contents                                                        10:08 Tuesday, April 30, 2002   2

The CONTENTS Procedure

         -----Variables Ordered by Position-----

 

 #    Variable    Type    Len    Label

 1    SSN         Char      9    Student ID            

 2    SENRSTAT    Char      1    Enroll Status         

 3    SRESID      Char      3    GA Res Code           

 4    SRACE       Char      1    Race                  

 5    SSEX        Char      1    Sex                    

 6    SCITIZ      Char      1    Citizenship           

 7    SDEGI       Char      1    Degree Intent         

 8    STRINST     Char      6    Transfer Inst         

 9    SSATV       Char      3    SAT Verbal            

10    SSATM       Char      3    SAT Math              

11    SHSPCTL     Char      2    Class Rank            

12    SHSGPA      Char      3    Hign School GPA       

13    SUPGPA      Char      3    Predicted GPA         

14    SHSFLAG     Char      1    HS Grad Yr Flag        

15    SCEEBA      Char      6    HS CEEB Code          

16    SCOLL       Char      2    College               

17    SCLASS      Char      2    Class                 

18    SCUR        Char      5    Curriculum            

19    SNAME       Char     23    Student Name          

20    SRES        Char      3    NCSU Res Code         

21    SHSRANK     Num       8    HS Rank               

22    SHSIZE      Num       8    HS Class Size         

23    dobym       Char      6    Date of birth, YYYYMM 

24    yrsem       Char      5    Cohort semester       

25    matdate     Char      5    SDFX Matriculation Date

26    scoop       Char      1    Cooperative Ed student?

27    sde         Char      1    Distance Ed student?  

28    dcoll01     Char      2    Degree 1 college      

29    dcoll02     Char      2    Degree 2 college      

30    dcoll03     Char      2                          

31    dcoll04     Char      2                          

32    dcoll05     Char      2                           

33    dcurr01     Char      3    Degree 1 curriculum   

34    dcurr02     Char      3    Degree 2 curriculum   

35    dcurr03     Char      3                          

36    dcurr04     Char      3                          

37    dcurr05     Char      3                          

38    dsem01      Char      5    Degree 1 semester     

39    dsem02      Char      5    Degree 2 semester     

40    dsem03      Char      5                          

41    dsem04      Char      5                           

42    dsem05      Char      5                          


Term Dataset

Cohort and Term Dataset Contents                                                        10:08 Tuesday, April 30, 2002   3

The CONTENTS Procedure

Data Set Name: NEWATR.TERM                            Observations:         1047295

Member Type:   DATA                                   Variables:            23    

Engine:        V8                                     Indexes:              0     

Created:       10:26 Tuesday, April 30, 2002          Observation Length:   88    

Last Modified: 10:26 Tuesday, April 30, 2002          Deleted Observations: 0     

Protection:                                           Compressed:           NO    

Data Set Type:                                        Sorted:               NO    

Label:                                                                            

        -----Engine/Host Dependent Information-----

Data Set Page Size:         8192                          

Number of Data Set Pages:   11385                         

First Data Page:            1                             

Max Obs per Page:           92                            

Obs in First Data Page:     51                            

Number of Data Set Repairs: 0                              

File Name:                  d:\student\newatr\term.sas7bdat

Release Created:            8.0202M0                      

Host Created:               WIN_PRO                       

Term Variables

           -----Variables Ordered by Position-----

 

 #    Variable      Type    Len    Label

 1    SSN           Char      9    Student ID              

 2    rectype       Char      1    Time Point, E,C,G       

 3    ssemhr        Num       4    Semester hours          

 4    scumhrsp      Num       4    Cumulative hours passed 

 5    ssemhrsp      Num       4    Semester hours passed   

 6    ssemhrsf      Num       4    Semester hours failed   

 7    stranshr      Num       4    Transfer hours          

 8    senrst        Char      1    Enroll status           

 9    sschool       Char      2    College                 

10    sclass        Char      2    Class                   

11    scurr         Char      3    Curriculum              

12    stuires       Char      3    Tuition residence (NCSU)

13    scumgpa       Num       8    Cumulative GPA          

14    ssemgpa       Num       8    Semester GPA            

15    shouse        Char      1    Housing code            

16    sregst        Char      1    Registration status     

17    sdblmaj       Char      5    Double major            

18    acaperf       Char      3    Academic performance flag

19    yrsem         Char      5    Cohort semester         

20    scoop         Char      1    Cooperative Ed student?  

21    sde           Char      1    Distance Ed student?    

22    sdehrs        Num       8    DE hours this term      

23    sdecourses    Num       4    DE courses this term