SAS® Program Efficiency, Large Datasets and System Options     

          Training Videos:

Lost without SAS Indexes?  


Basic simple program efficiency steps as well as more complex strategies and techniques are included in this section.  Before implementing an upgrade, make sure to benchmark measurements.  In addition, SAS® system option papers are also included.  See also SAS Business Intelligence and ETL, SAS Version 9.2 for hashing, SAS Certification exam and Proc SQL. See Compare and Conquer SAS Programming Techniques.  

 

Before deciding to use SAS Indexes, it makes sense to evaluate the advantages and disadvantages. In general, over 80% of SAS users do not use SAS indexes.  Keep in mind that SAS Indexes are for very specialized purpose and come with some conditions and at a price.  You must first know your data very well and the types and quantity of daily data transactions to make a decision. 

 
USE SAS Indexes if .. (See Basic paper, FAQ paper)


 DO NOT Use SAS Indexes if ...

1. You want to reduce processing time by accessing one dataset using the direct access method instead of the normal sequential access method.  

Indexes save time by allowing SAS to read starting from the middle of the dataset instead of starting from the first record and reading the entire dataset when returning matching conditions.
 

1. You do not want to maintain SAS indexes since using indexes creates a separate file which gets deleted if the dataset is recreated without recreating the index.


2. You plan to index on few key variables in one dataset which are used for subsetting records, for example.  

Simple index
is indexing one variable, which is the most popular, and complex index is two or more variables, where more than two variables is rare.  Datasets can have multiple simple and complex indexes.  Simple index files are the same name as the key variable and complex index file names need to be specified. 


2. You do not have at least 10% of input dataset size as extra storage to save the separate index file which is the dataset name with .SAS7BNDX extension for simple indexes.

3. You expect your data, or indexed variables, to be stable and not change often since this requires recreating the index file.
3. You expect to select more than 25% of all records when subsetting since this makes indexes very inefficient.  Ideally, indexed values will be less than 10% of the records. GENDER, for example, is a poor index variable.


4. You are processing VERY LARGE datasets, such as with over 10,000 records, on repetitive or transactional type processing, grouping, sorting for merging or subsetting datasets.  

Generally indexed variables are using these statements: BY, WHERE, SET/MODIFY and KEY option in DATA steps or SAS Procedures.

4. Your dataset is small since without index will be faster.  Using indexes on small datasets is overkill.
5. You continue to use PROC SORT to run up to 50% faster.  Using SAS Index does not prevent you from using PROC SORT. 
5. If you are using Hash objects since using Hash objects is about 10% faster than indexes.

 6. You are fine with the required ascending sort order of the indexed variables.  Using DESCENDING and NOTSORTED options in PROC SORT will not use indexes. 
6. You plan to use selected SAS Procedures such as PROC APPEND since using indexes is not compatible with these procedures.



 SAS Procedure examples for creating SAS Indexes


Description 

data sales.bighits (index=(cdnumber / unique));

  set olddata.oldhits;

  < additional SAS statements >

run;



1. Using the INDEX DATA set option in a DATA step to create a simple index of CDNUMBER variable with same name file on the SALES.BIGHITS dataset.  

One or more options for creating index include: UNIQUE and NOMISS.  Note that the INDEX DATA set option can also be applied in PROC SORT, for example.  This makes it easier for recreating indexes whenever the dataset sets updated.

data sales.bighits (index=(numname=(cdnumber artistname) / nomiss));

  set olddata.oldhits;

  < additional SAS statements >

run;

2. Using the INDEX DATA set option in a DATA step to create a complex index named NUMNAME which is based two variables CDNUMBER ARTISTNAME on the SALES.BIGHITS dataset.  

One or more options for creating index include: UNIQUE and NOMISS.  Note that the INDEX DATA set option can also be applied in PROC SORT, for example.  This makes it easier for recreating indexes whenever the dataset sets updated.

proc sql;

  create unique index cdnumber on sales.bighits;

quit;
3. Using PROC SQL to create a simple index with the same name file CDNUMBER on the SALES.BIGHITS dataset.  One or more options for creating index include: UNIQUE and NOMISS.

proc sql;

  create index numname on sales.bighits(cdnumber artistname);

quit;

4. Using PROC SQL to create a complex index named NUMNAME which is two variables CDNUMBER ARTISTNAME on the SALES.BIGHITS dataset.  One or more options for creating index include: UNIQUE and NOMISS.

proc datasets library=sales;

  modify bighits;

  index create cdnumber / unique;

run;

5. Using PROC DATASETS to create a simple index with the same name file CDNUMBER on the SALES.BIGHITS dataset.  One or more options for creating index include: UNIQUE and NOMISS.

proc datasets library=sales;

  modify bighits;

  index create numname=(cdnumber artistname) / nomiss;

run;
6. Using PROC DATASETS to create a complex index named NUMNAME which is two variables CDNUMBER ARTISTNAME on the SALES.BIGHITS dataset.  One or more options for creating index include: UNIQUE and NOMISS.

data work.direct;

 do age=13,14; 

  do until (eof);

   set class key=age end=eof;

    if _IORC_=0 then do;  /* 0 indicates a match was found */

     put _all_;

     output;

    end;

    else _ERROR_=0;     /* if no match, reset the error flag and continue */

   end;

 end;

 stop;   /* stops future searching on ages 13 and 14 have been found */

run;

 7. Using the DATA Step to use a simple index with the KEY= option for direct access method of the AGE key variable.  SAS automatically knows an simple index exits on AGE.  For composite index, such as COUNTRY_STATE, SAS can select records based on two or more variables.  This is similar to using the POINT= option with record #.

data work.driver;

 age=13; output;

 age=14; output;

run;

data work.direct;

 set work.driver; /* <- sequential access & implicit loop */

 do until (eof); /* <- explicit loop */

 set work.class key=age end=eof; /* <- direct access */

 if _IORC_=0 then do; /* 0 indicates a match was found */

 put _all_;

 output;

end;

else _ERROR_=0; /* if no match, reset the error flag and continue */

end;

run;

 8. Using DATA Step to use a simple index with the KEY= option with values in another dataset using direct access method of the AGE key variable.
 data illinois;
  set indexlib.prodindx (idxname=state);
where state eq 'Illinois' and product in ('BED' 'DESK'); run; 
 9. User has option to override SAS in using the default index using the IDXNAME= DATA set option.  In WHERE statements, users also have the option to force or turn off using SAS index with the IDXWHERE= option.



Balance of Efficiency Resources 

Reducing CPU often results in reducing Data Storage and Input/Output but may increase Memory usage

80% improvements can be expected with simple programming techniques

20% improvements can be expected with fine tuning, optimal combination of resources, index, system options, hardware upgrades, datawarehouse strategies such as summary-level datasets

 
Programming Techniques - using Proc SQL for multi-tasking or using PROC DATASETS to modify data set structure

CPU Time - using KEEP or DROP to limit number of variables or PROC SORT options such as SORTEDBY= and PRESORTED options

Data Storage - using LENGTH to reduce variable size or SAS View instead of a SAS Dataset

Input/Output - using WHERE data set option to limit number of records

Memory - reading only the records, variables or datasets required


DATA Step vs. Proc SQL Differences

In general, simple merges are faster with DATA Steps and complex merges are faster with Proc SQL.  This is because DATA Step uses sequential reads while Proc SQL first creates a Cartesian Product and then subsets the dataset.

Dataset Factors - Size, Static, Relationships between datasets, Density of Matches, and Outside SAS Access.

Operating System - Windows and Unix (generally similar), MVS and VMS (DATA step is generally faster).

One-Stop Reporting - Proc SQL is faster than 2 Proc Sorts, DATA merge, 1 Proc Sort and 1 Proc Print. 

 Improve Proc SQL by up to 2X - When joining tables, process pre-sorted tables as compared to processing unsorted tables.

Animated Guide Speed Merges: resource use by common procedures, Russ Lavery

______________________________________________

Proc SQL Advantages 

Multi-tasking and sorting with one pass instead of multiple passes

SELECT name, sex                 (Memory) FROM sashelp.class           (CPU with SORTEDBY= option when joining tables)      
WHERE sex = 'F'                    (Input/Output)
GROUP BY sex                       (CPU)
HAVING weight > avg(weight)  (Input/Output)
ORDER BY name;                    (CPU)

SQL Pass-Through for native database processing 

Validate syntax before executing

Better programming style with concise and self-documenting code

(SORTEDBY = ) dataset option to take prevent re-sorting dataset

Utilize dictionary tables to access summary level data

______________________________________________

SAS® Paper Dear Miss SASAnswers: A Guide to Efficient PROC SQL Coding 

SAS® Savvy Resource SAS Programming Efficiency Factors Summary Sheet

Compare Program Efficiency with DATA Step

SAS System Options

Working the System: Our Best SAS Options, Patrick Thornton, Luliana Barbalau

Save/Retrieve SAS Systems Options Are Your Friends, Edward Heaton


 Beginner SAS Programmer


Beginner Paper and Survey SAS® Performance Tuning Strategies and Techniques, Kirk Lafler

The How and When of Efficient Programming, Frank DiIorio [Presentation]

SAS® Index Paper The Basics of Using SAS Indexes, Michael Raithel [Presentation]


 Advanced SAS Programmer


Advanced Parallel Process Techniques Multi-Process Connect Paper, John Cohen

______________________________________________

System AutoExec, Setup.SAS file, Config file and Batch processing

Changing SAS System Option Settings

1. USING AUTOEXEC.SAS TO CUSTOMIZE SAS SESSIONS, Jennifer Price

2. %SETUP – A Utility Macro to Build Directory Structures and a SAS Initialization Program, Guowei Wu [Macro]

3. A SASautos Companion: Reusing Macros, Ronald Fehd

4. Run All Your SAS® Programs in One Program: Automatically, Ling Y. Chen, Steven A. Gilbert

5. Batch Processing with SAS®: Beyond Running Programs Overnight, Lisa Sanbonmatsu

6. More Than Batch – A Production SAS® Framework, Denis Cogswell

7. Interactive SAS® Mode vs. Batch Mode, Irina Walsh

8. Double-Clicking a SAS® File: What Happens Next?, Sandy Gibbs, Michele Mosca, and Donna Bennett

9. Using SASv9.cfg, autoexec.sas, SAS Registry, and Options to Set Up Base SAS®, Peter Eberhardt, Mengting Wang

10. How Do I …. ? Some Beginners FAQs, Peter Eberhardt, Audrey Yeo

11. SAS® System Options: The True Heroes of Macro Debugging Kevin Russell and Russ Tyndall

12. An Autoexec Companion, Allocating Location Names during Startup, Ronald Fehd

______________________________________________

General Papers

1. SAS Program Efficiency for Beginners, Bruce Gilsen

2. Summarizing Data with Base SAS® PROCs, Debbie Buck

3. TIPS - Techniques and Information for Programming in SAS, Kathleen Harkins, Carolyn Maass, and Mary Rutkowski

4. The Most Important Efficiency Techniques, Bob Virgile

5. Efficiency Considerations Using the SAS® System, Rick Langston

6. Efficiency Techniques for Accessing Large Data Files, By AndrewWilcox

7. Efficiency Techniques for Beginning PROC SQL Users, Kirk Lafler

8.The DATA Statement: Efficiency Techniques, S. David Riba

9. Efficient Techniques and Tips in Handling Large Datasets, Jason Shilong Kuang

10. SAS Macros for Large Scale Data Analysis and Quality Management of Corporate Actuarial Data Mart, Dennis Tang, Don Cooper

11. Make Your Life a Little Easier: A Collection of SAS Macro Utilities, Pete Lund [NONOTES]

12. Efficient SAS Coding Techniques: Measuring the Effectiveness of Intuitive Approaches, Frank C. Dilorio

13. Keeping Your Data in Step - Utilizing Efficiencies, Michael Sadof

14. SAS® Macro Autocall and %Include, Jie Huang and Tracy Lin

15. Think FAST! Use Memory Tables (Hashing) for Faster Merging, Gregg Snell [Index]

16. Don't Be a Litterbug: Best Practices for Using Temporary Files in SAS, Richard Langston

17. WHATlTDO: DOCUMENTATION AUTOMATICALLY PRODUCED BY YOUR SAS PROGRAM, Karen Crandall

18. Run All Your SAS® Programs in One Program: Automatically, Ling Y. Chen, Steven A. Gilbert

19. Efficient Statistical Programming? – These Small Steps Make the Difference, Keiko Powers

20. Building Effective SAS® Indexes using Short and Distinct Keys, Covered Queries and Clustered Indexes, Kirk Paul Lafler

21. What’s In A Name; Describing SAS File Types, Randall Cates [Index]

22. How Many Observations Are In My Data Set?, Jack Hamilton

23. Keep it Clean, Cecilia Mauldin

24. Merge vs. Join vs. Hash Objects: A Comparison using "Big" Medical Device Data, James Johnson

25. Secrets of Efficient SAS® Coding Techniques, Andrew Kuligowski, Swati Agarwal

26. GAINING EFFICIENCY WITH SAS® SOFTWARE, Kirk Lafler

27. Efficient SAS Coding Techniques: Measuring the Effectiveness of Intuitive Approaches, Frank Dilorio

28. Proper Housekeeping – Developing the Perfect “Maid” to Clean your SAS® Environment, Chuck Bininger

29. An Overview of Syntax Check Mode and Why it is Important, Thomas Billings

30. SAS Statements Used Anywhere, Karin LaPann [Duplicate Libnames]

31. SAS® and Open Source: Two Integrated Worlds, Jesse Luebbert and Radhikha Myneni

32. Copy That! Using SAS® to Create Directories and Duplicate Files, Nicole Ciaccia

33. Simple Techniques to Improve the Performance of your SAS® Programs When Processing Large Data Sets, Steve Cavill

34. Proper Housekeeping – Developing the Perfect “Maid” to Clean your SAS® Environment, Chuck Bininger

35. Solving SAS Performance Problems: Our Methodology, Jim Kuell

36. Using SAS® Job Monitor to Analyze SAS Data Integration Studio Job Performance, Jeff Dyson

37. Checking and Tracking SAS Programs Using SAS Software, Keith Gregg, Yefim Gershteyn

38. Slim Down Your Data, Mickael Borne

SAS Indexes

1. Frequently Asked Questions about SAS® Indexes, Billy Clifford

2. Creating and Exploiting SAS Indexes, Michael Raithel

3. A Faster Index for Sorted SAS® Datasets, Mark Keintz

4. Using SAS Indexes with Large Databases, Alex Vinokurov, Lawrence Helbers

5. CRAFTING YOUR OWN INDEX: WHY, WHEN, HOW, Paul M. Dorfman, Lessia S. Shajenko

6. Add a Little Magic to Your Joins, Kirk Paul Lafler

7. Power Indexing: A Guide to Using Indexes Effectively in Nashville Releases, Diane Olson [IDXWHERE]


SAS® Savvy Class:Maximizing Productivity and Efficiency using Proc SQL

Powered by Wild Apricot Membership Software