data:image/s3,"s3://crabby-images/953d6/953d6e8ff1663f8ff8cbc7f28ab63529d3752050" alt=""
46 Autopartitioning Techniques in SAS/ACCESS Chapter 6
Override autopartitioning with DBSLICE= if you can manually provide
substantially better partitioning. The best partitioning equally distributes the
result set across the threads.
Consult the DBMS-specific section of this documentation for information and tips
concerning your specific DBMS.
Threaded reads are most effective on new, faster computer hardware running SAS,
and with a powerful parallel edition of the DBMS. For example, if SAS runs on a fast
uniprocessor or on a multiprocessor machine and your DBMS runs on a high-end SMP
server, you will receive substantial performance gains. However you receive minimal
gains or even performance degradation when running SAS on an old desktop model
with a non-parallel DBMS edition running on old hardware.
Autopartitioning Techniques in SAS/ACCESS
SAS/ACCESS products share an autopartitioning scheme based on the MOD
function. Some products support additional techniques. For example, if your Oracle
tables are physically partitioned in the DBMS, the SAS/ACCESS interface to Oracle
automatically partitions in accordance with Oracle physical partitions rather than
using MOD. The SAS/ACCESS interface to Teradata uses FastExporting, if available,
which enables the FastExport utility to direct the partitioning.
MOD is a mathematical function that produces the remainder of a division operation.
Your DBMS table must contain a column to which SAS can apply the MOD function —
a numeric column constrained to integral values. DBMS integer and small integer
columns suit this purpose. Integral decimal (numeric) type columns can work as well.
On each thread, SAS appends a WHERE clause to your SQL that uses the MOD
function with the numeric column to create a subset of the result set. Combined, these
subsets add up to exactly the result set for your original single SQL statement.
For example, assume your original SAS-produced SQL is
SELECT CHR1, CHR2 FROM
DBTAB
and that table Dbtab contains integer column IntCol. SAS creates two threads
and issues:
SELECT CHR1, CHR2 FROM DBTAB WHERE (MOD(INTCOL,2)=0)
and
SELECT CHR1, CHR2 FROM DBTAB WHERE (MOD(INTCOL,2)=1)
Rows with an even value for IntCol are retrieved by the first thread. Rows with an odd
value for IntCol are retrieved by the second thread. Distribution of rows across the two
threads is optimal if IntCol has a 50/50 distribution of even and odd values.
SAS modifies the SQL for columns containing negative integers, for nullable
columns, and to combine SAS WHERE clauses with the partitioning WHERE clauses.
SAS can also run more than two threads. You use the second parameter of the
DBSLICEPARM= option to increase the number of threads.
The success of this technique depends on the distribution of the values in the chosen
integral column. Without knowledge of the distribution, your SAS/ACCESS product
attempts to pick the best possible column. For example, indexed columns are given
preference for some DBMSs. However, column selection is more or less a guess, and the
SAS guess might cause poor distribution of the result set across the threads. If no
suitable numeric column is found, MOD cannot be used at all, and threaded reads will
not occur if your SAS/ACCESS product has no other partitioning technique. For these
reasons, you should explore autopartitioning particulars for your DBMS and judiciously