Monday, April 23, 2012

Custom date formats in SAS

Let's suppose you have a dataset with a range of dates spread out over a 10-year period.  This data step generates a fake dataset with 1000 obs spread out over 01JAN2000 to near the end of 2009 (a spread of 3,650 days).

data fakedata;
    format date mmddyy10.;
    do i = 1 to 1000;
        date = int(365 * 10 * ranuni(12345)) + '01jan2000'd;
        output;
    end;  drop i;
run;

To view the distribution of values, you would likely do a proc freq like this:

proc freq data=fakedata;
    tables date;
    format date year4.;
run;

                                 Cumulative    Cumulative
DATE    Frequency     Percent     Frequency      Percent
---------------------------------------------------------
2000          86        8.60            86         8.60
2001          89        8.90           175        17.50
2002          83        8.30           258        25.80
2003         103       10.30           361        36.10
2004         118       11.80           479        47.90
2005         102       10.20           581        58.10
2006         115       11.50           696        69.60
2007          94        9.40           790        79.00
2008         108       10.80           898        89.80
2009         102       10.20          1000       100.00

But what if you want a more detailed breakdown of more recent dates?  Let's get a fancy and say you are fine with yearly counts for 2000-2005, but you want to see quarterly counts for 2006-2008 and monthly counts for 2009.  Without futzing with the data itself (e.g., creating a new string variable with formatted values of the DATE variable), you can get the job done easily by creating a custom format and applying that in a proc freq statement.  The proc format statement looks like this:

proc format;
    value mydtfmt low          - '31dec2005'd = [year4]
                  '01jan2006'd - '31dec2008'd = [yyq6]
                  '01jan2009'd - high         = [yymms7];
run;

This is saying:
  • for dates on or before 12/31/05, format as 4-digit years (e.g., "2003"), 
  • for dates from 1/1/06 to 12/31/08 format as 4-digit years plus quarter number separated by a letter Q (e.g., "2007Q3"), and 
  • for dates on or after 1/1/09 format as 4-digit years plus 2-digit month separated by a slash (e.g., "2009/07").  
A proc freq statement that uses this format (plus its output) looks like this:

proc freq data=fakedata;
    tables date;
    format date mydtfmt.;
 run;

                                    Cumulative    Cumulative
   DATE    Frequency     Percent     Frequency      Percent
------------------------------------------------------------
2000             86        8.60            86         8.60
2001             89        8.90           175        17.50
2002             83        8.30           258        25.80
2003            103       10.30           361        36.10
2004            118       11.80           479        47.90
2005            102       10.20           581        58.10
2006Q1           24        2.40           605        60.50
2006Q2           34        3.40           639        63.90
2006Q3           21        2.10           660        66.00
2006Q4           36        3.60           696        69.60
2007Q1           21        2.10           717        71.70
2007Q2           24        2.40           741        74.10
2007Q3           18        1.80           759        75.90
2007Q4           31        3.10           790        79.00
2008Q1           28        2.80           818        81.80
2008Q2           40        4.00           858        85.80
2008Q3           26        2.60           884        88.40
2008Q4           14        1.40           898        89.80
2009/01           6        0.60           904        90.40
2009/02          10        1.00           914        91.40
2009/03           5        0.50           919        91.90
2009/04          12        1.20           931        93.10
2009/05          11        1.10           942        94.20
2009/06          11        1.10           953        95.30
2009/07           2        0.20           955        95.50
2009/08           8        0.80           963        96.30
2009/09           8        0.80           971        97.10
2009/10           5        0.50           976        97.60
2009/11          12        1.20           988        98.80
2009/12          12        1.20          1000       100.00

No comments:

Post a Comment