Monday, March 19, 2012

SAS Proc SQL Pass-through Connection Strings

I work in the health care research department of a managed health care organization in the Southeastern U.S.  SAS is a mainstay in the industry, and this is certainly true for my current work environment.  For the analysts in my group, SAS is the data manipulation and analysis tool.  The primary data repository that I manage exists in the form of SAS datasets.

However, much of the source system data from which the research data warehouse is built lives on other platforms, so I do a lot of ETL'ing from Teradata, Oracle, Sybase, and MSSQL into SAS.  For small stuff, like dimension/lookup tables, using libref access to move data into SAS is acceptable, performance-wise.  For example...

libname o oracle user=&oracle_uid password="&oracle_pwd"
                 path='oracle_path' schema='oracle_schema';
proc sql;

    create table sas_copy_of_table as
        select * from o.table;
quit;


But for larger pulls, especially having a where clause, libref access can be slow since SAS may choose to pull the entire data table over the network before applying the where clause.

(Aside: use the SASTRACE and SASTRACELOC system options to see if SAS is optimizing cross-platform database queries behind the scenes.)

The alternative is to use SAS pass-through queries.  These are queries written in the database's flavor of SQL (instead of the SAS flavor of SQL) that are passed off from SAS to the other database engine.  Only the final where-filtered results are returned to SAS.
 
There are two areas in which I've seen lots of heartache writing and debugging SQL pass-through queries in SAS.  First, getting the connection string just right, given how finicky SAS is, can be painful.  That is the focus of the rest of this post.  The other curse-inducing area is getting the right database-compliant (NOT SAS-compliant!) syntax into the query itself (e.g., employing Oracle--not SAS--functions and operators for date comparisons).  I plan to discuss this issue in a future post.

Without further ado, here are examples for Oracle, Sybase, Teradata, MSSQL (via OLEDB), and MSSQL (via ODBC).  Assume the *_uid and *_pwd macro variables (e.g., oracle_uid and oracle_pwd) contain database credentials and were declared in an include file prior to the code shown.  Each proc sql block creates a new work dataset called sample by selecting the top 10 observations from a database table called dbtable.

proc sql;
    connect to oracle (user=&oracle_uid

                       password="&oracle_pwd"
                       path='oracle_path');
    create table sample as select * from connection to oracle (
        select * from the_schema.dbtable where rownum<11);

quit;

proc sql;
    connect to sybase (user=&sybase_uid

                       password="&sybase_pwd"
                       server=syb_server
                       database=syb_db);
    create table sample as select * from connection to sybase (
        select top 10 * from dbtable);
quit;

proc sql;
    connect to teradata (user=&teradata_uid

                         password="&teradata_pwd"
                         tdpid='teradata_server');
    create table sample as select * from connection to teradata (
        select top 10 * from dbtable);
quit;

proc sql;
    connect to oledb (init_string="User ID=&mssql_uid;Password=&mssql_pwd;Provider=SQLOLEDB.1;Data Source=mssql_server;Initial Catalog=mssql_db");    

    create table sample as select * from connection to oledb (
        select top 10 * from the_schema.dbtable); 

quit;

proc sql; *SQL 2008;
    connect to odbc (noprompt="Uid=&mssql_uid;Pwd=&mssql_pwd;Driver={SQL Server Native Client 10.0};Server=mssql_server;Database=mssql_database");    

    create table sample as select * from connection to odbc (
        select top 10 * from the_schema.dbtable);

quit;

The website connectionstrings.com is a great help in figuring out driver names, etc.  Also, if you are unsure what SAS database engines you have installed, run proc setinit; run; and look in the log for entries like "SAS/ACCESS Interface to DB2," "SAS/ACCESS Interface to ORACLE" and so on.

2 comments:

  1. This technical post helps me to improve my skills set, thanks for this wonder article I expect your upcoming blog, so keep sharing..
    Regards,
    sas training institute in Chennai|sas training chennai

    ReplyDelete

  2. The information you have given here is truly helpful to me. CCNA- It’s a certification program based on routing & switching for starting level network engineers that helps improve your investment in knowledge of networking & increase the value of employer’s network
    Regards,
    ccna courses in Chennai|ccna institutes in Chennai

    ReplyDelete