PERSPECTIVES | Auditing

Myths and Inconvenient Truths about Audit Sampling

An Audit Partner's Perspective

By Howard Sibelman

Audit sampling, like any powerful tool, is of great value when used properly but of great potential harm when used improperly. The auditing standards provided by the AICPA, the PCAOB, and the International Auditing and Assurance Standards Board (IAASB) provide scant guidance about audit sampling. The AICPA's audit sampling guide is useful, but I believe that more guidance is needed.

The following discussion will not dwell upon "attributes" sampling (used principally to test the effectiveness of internal controls) or the details of how to use the various sampling tools available for testing amounts or balances. Instead, it will focus on which sampling tools to use in a variety of situations and why I think each is the proper tool for the situation described.

Setting the Stage

Auditors audit financial statements

to provide users with assurance that the financial statements are not materially misstated. They employ a variety of testing techniques to accumulate sufficient appropriate audit evidence in order to support an opinion on the financial statements: inquiry, observation, reperformance, confirmation, analytical procedures, tests of internal control, and tests of details. Some of these techniques are nonsampling procedures and others are sampling procedures, which include statistical sampling and nonstatistical sampling.

Audit sampling figures in many popular myths that are not quite true, as shown in the sidebar, Myths and Truths About Audit Sampling.

Of these techniques and sampling procedures, how do we decide which tools to use?

Nonsampling

Many tests are not samples. One of

the most common is to "select all items > x" for testing and either ignore the remainder of the items or test "a few" of the items < x. This approach can be effective for testing highly skewed populations, because the > x items make up a very large percentage of the population, rendering the < x items immaterial in the aggregate; however, such an approach is not necessarily the most efficient, because the number of items that must be tested to reduce the aggregate untested population to an immaterial amount might be more than would be tested in a statistical sample.
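The arithmetic behind the "select all items > x" approach can be sketched as follows. The population amounts, threshold, and materiality figure below are hypothetical, chosen only to illustrate the test of whether the untested remainder is immaterial in the aggregate:

```python
def split_on_threshold(amounts, x):
    """Split a population into the 100%-tested layer (> x) and the remainder."""
    tested = [a for a in amounts if a > x]
    untested = [a for a in amounts if a <= x]
    return tested, untested

# Hypothetical, highly skewed population: three large items dominate.
population = [950_000, 410_000, 120_000, 9_500, 7_200, 4_800, 3_100, 1_250]
threshold = 100_000        # "x": every item above this is examined in full
materiality = 50_000       # hypothetical materiality for the account

tested, untested = split_on_threshold(population, threshold)
untested_total = sum(untested)

# The approach works only if the untested remainder is immaterial in aggregate.
print(f"items tested 100%: {len(tested)}; untested aggregate: {untested_total:,}")
print("remainder immaterial:", untested_total < materiality)
```

With a less skewed population, the number of items above the threshold grows quickly, which is exactly the inefficiency described above.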

Another nonsampling procedure is a substantive analytical procedure. Such procedures can be both effective and efficient in many scenarios where the item being tested has a known relationship to some other item that has already been tested (e.g., commission expenses have a predictable relationship to sales).

Nonstatistical Sampling

Any sample where the sample items are not selected according to the laws of chance (that is, by probability sampling) is a nonstatistical sample.

Selecting an arbitrary number of items for testing is an example of nonstatistical sampling. The problems with nonstatistical sampling are:
■ while the test provides evidence about the items tested, the results cannot be projected to the untested population, and
■ it is easy to inadvertently bias the sample selection.

APRIL 2014 / THE CPA JOURNAL

The auditing literature specifies that nonstatistical sampling is acceptable, but one might ask why. The literature also specifies that nonstatistical sample sizes should approximate statistical sample sizes. If one needs to do the same amount of work in either scenario, why choose a tool (nonstatistical sampling) that does not permit conclusions about the untested population? From my point of view, unless the number of items in the population is very small (fewer than 200), statistical sampling should always be preferred over nonstatistical sampling.

Statistical Sampling

Statistical samples include those selected

and evaluated using proper statistical methodology, based upon either an equal probability of selection or a probability proportional to size. Such samples include the following.

Monetary unit sampling (MUS). This approach selects sample items with a probability proportional to their size. This is the tool to use when the concern is overstatement. The most common uses, in my experience, are to test for the existence of accounts receivable, inventory, and fixed asset additions, and to search for unrecorded liabilities (completeness test) by testing the population of subsequent disbursements for existence (overstatement), that is, expenditures that should have been recorded in the period under audit.

As noted in the sidebar, I believe MUS is a relatively easy and highly efficient tool to use when testing for overstatement, and this is often the principal audit concern; however, "principal" does not mean "exclusive," and so auditors must think about assertions where understatement may be as significant a concern as overstatement.
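Fixed-interval (systematic) selection is one common way MUS samples are drawn. The following is a minimal sketch, with hypothetical amounts, of how probability-proportional-to-size selection works; it is not the algorithm of any particular audit software:

```python
import random

def mus_select(amounts, sample_size, seed=1):
    """Fixed-interval monetary unit sampling: every dollar is a sampling
    unit, so an item's chance of selection is proportional to its recorded
    amount, and items at least as large as the interval are always hit."""
    total = sum(amounts)
    interval = total / sample_size
    rng = random.Random(seed)              # seed recorded for reproducibility
    start = rng.uniform(0, interval)
    hits = [start + i * interval for i in range(sample_size)]

    selected = set()                       # an item hit twice is still tested once
    cumulative = 0.0
    hit_iter = iter(hits)
    hit = next(hit_iter, None)
    for idx, amount in enumerate(amounts):
        cumulative += amount
        while hit is not None and hit <= cumulative:
            selected.add(idx)
            hit = next(hit_iter, None)
    return sorted(selected)

# Hypothetical receivable balances; the 500,000 item exceeds the interval
# and is therefore certain to be selected.
amounts = [500_000, 120_000, 45_000, 30_000, 8_000, 2_500]
picked = mus_select(amounts, sample_size=3)
print("selected item indices:", picked)
```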

Equal probability sampling. This tool is frequently referred to as classical variables sampling (CVS). When should auditors be as concerned about understatement as overstatement? There is no correct answer to this question other than "always," because material misstatement is a bidirectional concern. Nevertheless, there are certain realities to consider when evaluating the risk of material understatement:
■ Many entities are motivated to understate results. Family-owned entities may understate to reduce taxes, but not so much so as to inhibit lenders from extending credit. Not-for-profit entities may understate so as to encourage donations.
■ Some accounts are prone to both inadvertent overstatement and understatement (e.g., inventory valuation, where pricing misstatements can run in either direction).

In the first scenario, how hard can the auditor look for understatement? In the real world, client relations come into the picture. Nevertheless, auditors generally perform a variety of nonsampling procedures that address the potential for understatement in one way or another, such as a revenue cutoff test, inventory observation (assuming that all of the inventory locations are known), and substantive analytical procedures that are just as likely to indicate understatement as overstatement (e.g., the reasonableness of depreciation, inventory turnover, changes in cost of goods sold).

The second scenario is another matter. Putting aside deliberate understatement, client personnel and systems might be just as prone to understating amounts as overstating amounts. Inventory valuation is a classic example because of the possibility of inputting errors (e.g., a misplaced decimal point) or of mistakes made while doing an inventory count (e.g., units of measure, incorrect arithmetic, overlooked quantities). Once auditors conclude that they must test for understatement because of the assessed possibility of material understatement, MUS goes out the window as a sampling tool. The reason is probably evident: because MUS selection is weighted toward high-value items, the selection of understated items would be mere happenstance.
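The point can be made numerically. Under fixed-interval MUS, a single item's chance of selection is approximately min(1, n × amount / population total), computed on the item's *recorded* amount; the figures below are hypothetical:

```python
def mus_selection_probability(recorded_amount, population_total, sample_size):
    """Approximate chance that one item is hit by an n-item fixed-interval
    MUS selection; proportional to the item's *recorded* amount."""
    return min(1.0, sample_size * recorded_amount / population_total)

POPULATION_TOTAL = 5_000_000   # hypothetical recorded population total
N = 50                         # hypothetical sample size

# An inventory line correctly worth 200,000, versus the same line
# understated to 2,000 by a misplaced decimal point:
p_correct = mus_selection_probability(200_000, POPULATION_TOTAL, N)
p_understated = mus_selection_probability(2_000, POPULATION_TOTAL, N)

print(f"recorded at 200,000 -> P(select) = {p_correct:.0%}")
print(f"recorded at   2,000 -> P(select) = {p_understated:.0%}")
```

The understatement shrinks the item's footprint in the population, so the very misstatement the auditor is looking for makes the item less likely to be drawn.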

Sidebar: Myths and Truths About Audit Sampling

Myth: This is a 100% audit.
Truth: Virtually all audits involve sampling.

Myth: Errors in data are nonexistent, or so rare that it is a waste of time trying to detect any errors.
Truth: Stuff happens. Humans are not perfect, and computer systems are designed by humans. Risk assessment drives the scope, but one cannot audit based on an assumption that errors do not exist.

Myth: Understated = conservative; therefore, auditors don't need to worry about understatement.
Truth: In many scenarios, auditors are mostly concerned about overstatement and little concerned with understatement.

Myth: MUS is an appropriate tool when concerned about both over- and understatement.
Truth: Testing for overstatement is relatively easy (MUS); testing for understatement is relatively difficult (CVS).

Myth: Extrapolating the error rate from a nonstatistical sample is a valid means of estimating error in the untested population.
Truth: Statistical sampling results take into account sampling risk, and can be used as the basis for a proposed adjusting journal entry.

The only valid method to address the understatement concern is to use CVS. In CVS tests, items in the population have an equal probability of selection (i.e., selection is random), so an understated item has as much chance of being selected for testing as an overstated item (assuming understatements and overstatements occur at the same rate in the population). The problem is that CVS is a much more complicated tool to use than MUS and requires larger sample sizes. One of the reasons for larger sample sizes is that, as opposed to MUS (which looks at high-value items that are easy to see for overstatement), CVS must consider all items, because the concern is understatement and understated

items do not "stick out."

Two Practical Approaches

The following are what I consider to be

two practical approaches that balance a concern for understatement with the need to be efficient. The context is testing inventory valuation, and so the right tool is CVS. Readers will immediately note that I propose using only two strata. Most of the CVS literature talks about many strata, which are better suited for estimating population value than for detecting the presence of material misstatement. When using CVS to estimate a population's total value, or to come to a statistically valid conclusion that a population is within an acceptable range of values, many strata may be needed to "efficiently" achieve the required precision. When attempting to detect material misstatement, the preferable approach is to use fewer strata. My objective in these practical approaches is not to calculate population values, but rather to assess the likelihood of material misstatement in the population.

Practical Approach 1

Step 1. Stratify the population into two

strata: all items > x (the 100% layer) and all other items. The 100% layer will include all items greater than or equal to the tolerable misstatement (performance materiality) for the test, but x will likely be lower if there are few or no individual items greater than or equal to performance materiality. In other words, x can be no greater than performance materiality for the test and will likely be lower; this is more or less the same as what the software would do for an MUS test.
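Step 1 can be sketched as follows. The inventory amounts and performance materiality are hypothetical, and `two_strata` is an illustrative helper, not any particular software's function:

```python
def two_strata(items, performance_materiality, x=None):
    """Split a population into the 100% layer (items >= x) and the
    sampling stratum (items < x). The cutoff x defaults to performance
    materiality and is never allowed to exceed it."""
    if x is None:
        x = performance_materiality
    x = min(x, performance_materiality)   # x can be no greater than PM
    layer_100 = [a for a in items if a >= x]
    sampling_stratum = [a for a in items if a < x]
    return layer_100, sampling_stratum

# Hypothetical inventory item values and performance materiality:
inventory = [180_000, 95_000, 60_000, 22_000, 9_000, 4_500, 1_200]
layer, stratum = two_strata(inventory, performance_materiality=75_000)
print("100% layer:", layer)           # items examined in full
print("sampling stratum:", stratum)   # items subject to sampling
```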

Step 2. Determine the sampling parameters as in an "attributes" test, with regard to the number of items in the population, the confidence level (the flip side of risk), and the tolerable error rate. The expected error rate should be set at one-half of the tolerable error rate. The attribute in this test is whether the inventory item is properly valued. The resulting sample size will be three to four times larger than a sample for discovery sampling. The benefit of this larger sample size is that, if the population is error-prone, the sample will include enough errors to provide the basis for the calculation of a confidence interval that can serve as a reasonable basis for an audit decision.
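The binomial arithmetic behind an "attributes" sample size calculator can be sketched as follows. Real calculators and published tables differ in details (rounding conventions, Poisson approximations), so this is illustrative only; the 95% confidence and 10% tolerable rate are hypothetical parameters:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def attribute_sample_size(confidence, tolerable_rate, expected_rate, max_n=5000):
    """Smallest n such that, if the true error rate equaled the tolerable
    rate, observing no more than the expected number of errors would be
    no more likely than the acceptable risk (1 - confidence)."""
    risk = 1 - confidence
    for n in range(1, max_n + 1):
        expected_errors = int(n * expected_rate)
        if binom_cdf(expected_errors, n, tolerable_rate) <= risk:
            return n
    raise ValueError("no sample size found within max_n")

# Expected rate set at one-half the tolerable rate, per Step 2:
n = attribute_sample_size(confidence=0.95, tolerable_rate=0.10, expected_rate=0.05)
print("required sample size:", n)
```

Setting the expected rate to zero instead reproduces the much smaller discovery-style sample, which is why allowing for expected errors inflates the sample size as described above.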

Step 3. Input the sampling parameters into an "attributes" sample size calculator. The result will be a sample size with more items than MUS would require, but fewer than CVS (which considers the variability of the entire population).

Step 4. Perform the test on the 100% layer and the sample items (selected randomly).
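Random selection of the sampling-stratum items might look like this; a sketch only, with the seed recorded so the selection can be reproduced in the workpapers:

```python
import random

def select_random_sample(stratum_indices, sample_size, seed=2014):
    """Draw sample_size items from the stratum without replacement,
    using a recorded seed so the selection is reproducible."""
    rng = random.Random(seed)
    return sorted(rng.sample(stratum_indices, sample_size))

stratum = list(range(1_000))            # hypothetical item indices
sample = select_random_sample(stratum, 60)
print("first few selections:", sample[:5])
```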

Step 5. Input the results, including the 100% layer, into a CVS evaluation program.

What happens next depends on the sample results. The CVS evaluation program will produce several numbers that are used in the following evaluation rules:
■ Precision
■ Lower confidence limit/lower error limit (LEL)
■ Projected error
■ Upper confidence limit/upper error limit (UEL).

EXHIBIT 1
Results for Practical Approach 1

Sample   Number of      Rate of       Precision   Lower        Estimated     Upper        Proposed
Number   Misstatements  Misstatement              Confidence   Misstatement  Confidence   Correction
  1          13             7.4%      286,692     (258,141)        28,552       315,244   —
  2          17             9.7%      478,327     (257,115)       221,212       699,540   —
  3          15             8.6%      315,483     (550,994)      (235,510)       79,973   —
  4          15             8.6%      237,220     (399,814)      (162,594)       74,626   —
  5           8             4.6%      252,794     (324,703)       (71,908)      180,886   —
  6          17             9.7%      800,424     (368,253)       432,171     1,232,594   Reject
  7          21            12.0%      368,268     (493,441)      (125,174)      243,094   —
  8          10             5.7%      334,009     (329,553)         4,457       338,466   —
  9          17             9.7%      330,763     (772,299)      (441,536)     (110,774)  —
 10          14             8.0%      346,298     (697,857)      (351,558)       (5,260)  —
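A mean-per-unit style evaluation producing the quantities used in the rules and in Exhibit 1 (projected error, precision, and the lower and upper limits) can be sketched as follows. This is a simplified illustration, not the algorithm of any particular CVS package: it evaluates only the sampling stratum, and z = 1.96 assumes a two-sided 95% confidence level. All figures are hypothetical:

```python
from math import sqrt
from statistics import mean, stdev

def cvs_evaluate(differences, population_count, z=1.96):
    """Project sampled audit differences (recorded - audited) to the
    stratum and compute precision and the two confidence limits."""
    n = len(differences)
    projected = population_count * mean(differences)
    standard_error = stdev(differences) / sqrt(n)
    fpc = sqrt(1 - n / population_count)       # finite population correction
    precision = z * population_count * standard_error * fpc
    return {
        "projected_error": projected,
        "precision": precision,
        "lower_limit": projected - precision,
        "upper_limit": projected + precision,
    }

# Hypothetical 100-item sample from a 2,000-item sampling stratum;
# most sampled items are correctly stated (difference of zero):
diffs = [0] * 90 + [150, -300, 75, 420, -60, 90, -180, 35, 260, -45]
result = cvs_evaluate(diffs, population_count=2_000)

# Rule 1 below: if precision exceeds performance materiality, the test fails.
performance_materiality = 500_000
rule_1_fails = result["precision"] > performance_materiality
print(result, "fails Rule 1:", rule_1_fails)
```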

Rule 1. If precision is greater than performance materiality (PM), the test fails; that is, because the results are not sufficiently precise, there is no basis upon which to propose any correction. This leaves one with three alternatives:
■ Increase the sample size and test enough items in a single additional step to achieve a precision that is less than PM; this might be a lot of work.
■ Ask the client to rework the population to reduce the error rate and then retest the population from scratch.
Neither of these two alternatives will endear an auditor to a client, but it is not an auditor's fault that the population has so many errors in it.
■ Finally, see if lowering the confidence level produces an acceptable precision.

It is with some trepidation that I even mention this third alternative. While theoretically acceptable, this choice is too prone, in my opinion, to rationalization. The confidence level at which the test was originally designed and performed should not be second-guessed because of an unfavorable result.

"Many auditors may not like the idea of proposing a correction where none is needed, but in the real world..."
