Choose 3 articles to research the most popular platforms in use
today. After reviewing the articles, compare and contrast the
strengths and weaknesses of two of the platforms and identify
commonalities. Use the articles to support your arguments. You will
need to use the Gartner Magic Quadrant to find leading platforms.
Just search the web for Gartner Magic Quadrant Analytics and choose
two leaders or visionaries to begin your research. You may use the
two articles below, along with one of your own choosing, or you can
choose three sources on your own.

ŢĂRANU, I. (2015). Big Data Analytics Platforms analyze from startups
to traditional database players. Database Systems Journal, 6(1),
23-32. ***Article attached.

Chan, D. Y., & Kogan, A. (2016). Data Analytics: Introduction to Using
Analytics in Auditing. Journal Of Emerging Technologies In
Accounting, 13(1), 121-140. doi:10.2308/jeta-51463. ***Article
attached.

Please follow APA guidelines for the paper: 2 pages, double-spaced,
12-point Times New Roman font, one-inch margins. Remember to include
a References page that includes all references to material used in
your paper. In-text citations are also required in the body of your
paper.

JOURNAL OF EMERGING TECHNOLOGIES IN ACCOUNTING
Vol. 13, No. 1
Spring 2016
pp. 121–140
American Accounting Association
DOI: 10.2308/jeta-51463
Data Analytics: Introduction to Using Analytics in Auditing
David Y. Chan
St. John’s University
Alexander Kogan
Rutgers, The State University of New Jersey, Newark
ABSTRACT: This is a hands-on introductory practical data
analytics teaching case that can be used in an auditing or
related course. Students will learn about data attributes,
data creation, structured query language (SQL), basic
statistics, and performing basic audit procedures using
analytics by utilizing the open source software R. Instructors
can use this case for an in-class discussion or an
independent out-of-class assignment. A solutions guide is available
in the Teaching Notes. Multimedia files are available for
download, see Appendix B.
Keywords: auditing; data analytics; R; teaching case study.
INTRODUCTION
We are in the era of Big Data, and accounting and auditing
professionals with data analytical skills are in high
demand. Accounting firms today expect graduates to have an
appreciation for analytics and understand where
analytics may be used (Ernst & Young Academic Resource
Center [EYARC] Colloquium on Analytics in the
Accounting Curriculum [EY 2015]). However, many accounting
programs do not expose their students to data analytics in their
curricula. The AACSB issued Standard A7 that provides
guidance for accounting programs including learning experiences
that
develop skills and knowledge relating to data creation, data
sharing, data analytics, data mining, data reporting, and data
storing
in an organization (AACSB 2014). The purpose of this data
analytics case is to gently introduce and familiarize students with
the use of data analytics in general and its use in the
accounting and auditing contexts. Furthermore, students will learn
about
data attributes, data creation, structured query language
(SQL), basic statistics, and performing basic audit procedures
using
analytics. The case utilizes the free open source software R
(The R Foundation 2015). Instructors can use this case for an
in-classroom, instructor-led discussion or an independent
out-of-classroom student assignment. The case provides all the
necessary instructions, from setting up R to performing
common audit-related procedures. The instructions are all-inclusive,
and students will not need resources outside of this case
and its related materials.
The scripts used in this case can be downloaded as a text
file; see the link to ‘‘Scripts’’ in Appendix B.
Background Information
As a new staff auditor for a public accounting firm, you have
been assigned to the XYZ Inc. audit engagement. XYZ is a
public company. The engagement partner has suggested the use
of computer-assisted audit tools and techniques (CAATTs)
whenever possible in order to ensure that an effective and
efficient audit is performed. A CAATT is computer software that
allows auditors to perform data analytics. Furthermore,
CAATTs may aid the auditor in testing 100 percent of the
transaction
population and automating or semi-automating the performance
of audit procedures. While many audit procedures can be
performed manually or with the use of Microsoft Excel, manual
methods are not always effective or efficient and the use of
spreadsheet software has data limitations. For example, Excel
2013 allows only 1,048,576 rows (observations) and
16,384 columns (variables) (Microsoft 2015). The limitation
on the number of columns may not be an issue, but the limitation
on the number of rows may become a constraint. Furthermore,
computational performance issues in Excel will result when
analyzing large datasets.
We acknowledge and thank Miklos A. Vasarhelyi, Hui Du, the
reviewers, and participants of the 2015 AIS Educator Conference and
the 2015 AAA Annual Meeting for the insightful comments and
suggestions to advance and improve our paper.
Supplemental materials can be accessed by clicking the links
in Appendix B.
Editor’s note: Accepted by Miklos A. Vasarhelyi.
Submitted: March 2015
Accepted: March 2016
Published Online: April 2016
The audit manager on the engagement has assigned to you
specific audit procedures to be performed within the revenue
cycle of the audit program. The audit procedures assigned
include footing, re-computing, scanning, sample selection, and an
analytical procedure. There are two popular CAATTs in the
audit software market: ACL and IDEA. However, the firm prefers
the use of the data analytical software R because of its open
source nature and also the versatility of the analysis in the
software.
The use of open source software is becoming more mainstream
with many different organizations (Deloitte 2015).
Furthermore, many students are familiar with R from their
statistics courses. Interfacing or communicating with R is done
through coding or scripting, unlike in Excel, where there is a
graphical user interface (GUI). Compared with CAATTs that utilize
GUIs, a scripting interface has the benefit of an inherent ability
to document, review, and reproduce the path of analysis. Scripting
allows firms and regulators to perform reviews of the audit
work. In R, users write the script and then execute the script.
While
there is initially a steeper learning curve compared with GUI
software, scripting software allows the auditor to reuse scripts in
subsequent analysis and therefore makes analysis more
efficient.
Before proceeding with any audit procedures or analysis, you
will have to install the R software on your computer. R is
compatible with Windows, Apple, and Linux computers. In this
case, we will use the Windows version for demonstration
purposes. Please refer to Appendix A for detailed
instructions on installing R.
Installing and Loading Packages
The R software comes with pre-installed add-on packages for
basic analysis. However, R has a long-standing open source
community that develops packages for more advanced data
manipulation and analysis. R has an extensive number of free
packages that are contributed by the open source community.
‘‘Free’’ does not in any way mean that the packages are inferior.
Users in the community may have created a specific feature
that was not available in the basic software and wanted to share
the
developed feature with the community. Some packages are
developed for bleeding-edge analytics. The contributions from the
open source community and the variety of available analytics
differentiate R from other analytical software. You will need to
download, install, and load two packages (‘‘sqldf’’ and
‘‘forecast’’) for the exercises in this case. You can copy and
paste the
scripts from the ‘‘Scripts’’ text document (and as shown
below) in the R Editor window to download and install the two
packages. A video that shows how scripts are executed in R is
available for download; see Appendix B.
Script 1:
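The script itself is not reproduced in this text. A minimal sketch of what it plausibly contains, assuming the standard ‘‘install.packages’’ function and the two package names given in the paragraph above:

```r
# Assumed reconstruction of Script 1 (the original listing is missing).
# Downloads and installs the two packages used in this case. This needs
# an internet connection, and R asks for a CRAN mirror on first use.
install.packages(c("sqldf", "forecast"))
```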
After copying and pasting the code into the R Editor window,
highlight the pasted script, press and hold ‘‘Ctrl,’’ and then
press ‘‘R’’ on the keyboard. This will send the highlighted
code into the R Console window and execute the code. Two
‘‘Question’’ dialog boxes may pop up. You should click
‘‘Yes’’ in both dialog boxes. A new window ‘‘HTTPS CRAN mirror’’
will pop up. Select the CRAN mirror closest to your area and
click ‘‘OK.’’ For example, you can select USA (CA 1) if you are
located near California, United States. The software will
automatically begin to download and install the two packages from
the
respective selected mirror. Packages will only need to be
installed once.
After installing the two packages, the ‘‘sqldf’’ and
‘‘forecast’’ packages will need to be loaded. These packages have
to be loaded each time the R software is started. To load the
two packages, enter the following two-line script into the R
Editor window, highlight the two lines, press and hold ‘‘Ctrl,’’
and then press ‘‘R’’ on the keyboard. The script will
execute and the packages will be loaded into the R software.
Notice the code passes on to the R Console window and executes.
Script 2:
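The two-line script is not reproduced in this text. A plausible form, assuming the standard ‘‘library’’ function and that both packages have already been installed:

```r
# Assumed reconstruction of Script 2 (the original listing is missing).
# Attaches both packages for the current R session; repeat these two
# lines after every restart of R.
library(sqldf)
library(forecast)
```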
Setting Format Output Options
R is designed for statistical analysis and occasionally displays
some financial or nonfinancial values in the exponential format
‘‘1e+’’ (for example, 1e+08 for 100,000,000). You will need to
instruct R to use the fixed-point format instead of the exponential
format, since fixed-point notation is the norm when dealing with
monetary/dollar values. With the fixed-point format set, R will
output number values in full numeric form by default. The various
data attributes will be discussed in the ‘‘Data Cleaning’’ section.
Enter the following script into the R Editor window, highlight the
script, press and hold ‘‘Ctrl,’’ and then press ‘‘R’’ on the keyboard.
This option command will have to be executed each time
the R software is first started.
Script 3:
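The script is not reproduced in this text. One common way to get this behavior, offered here as an assumption rather than as the authors' exact command, is to raise the ‘‘scipen’’ option. The value 999 and the sample balance are illustrative choices:

```r
# Assumed reconstruction of Script 3 (the original listing is missing).
# A large "scipen" penalty makes R prefer fixed-point notation over
# exponential notation when printing numbers.
options(scipen = 999)

# Quick check with a made-up balance of one hundred million.
balance <- 100000000
print(balance)  # shown in full rather than in exponential notation
```

The option lasts only for the current session, which is consistent with the instruction to run it each time R is started.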
The R software is now set up and ready for use in this case.
Importing Data
Obtaining data is the first step in data analytics. In an
audit, the client is responsible for providing the auditors with
financial data to audit. The client has provided the
engagement team data from their sales journal, shipping journal,
cash
journal, and customer master file. The client has also
provided the aggregated revenue account balance for the last ten
years.
The data are provided to the auditors in five
comma-separated value (CSV) files. CSV is an open format that is
commonly used to transport data and is easily read by analytic
software.
Use the following R Scripts below to import these five CSV
files into R. The R Scripts will pull the respective CSV files from
a hosted server (also see the links to Sales1, Ship1, Cash1,
Customer1, and Revenue in Appendix B). Once imported, R will store
the imported data in dataframes. Dataframes are similar to a
database table or an Excel sheet. Enter the following lines of
scripts
below into the R Editor window and then highlight the scripts
and click and hold ‘‘Ctrl’’ and then click ‘‘R’’ on the keyboard.
Script 4:
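The import script is not reproduced in this text. The sketch below uses the same ‘‘read.csv’’ arguments listed in the Script Definition, but reads a small made-up file so that it runs without the hosted server. The sample rows and the exact set of columns are assumptions; only the variable names mentioned elsewhere in this case are used.

```r
# Hypothetical stand-in for the hosted file: three made-up rows.
sample_file <- tempfile(fileext = ".csv")
writeLines(c(
  'Sales_Order_No,Invoice_No,Invoice_Date,Invoice_Amount,Invoice_Description',
  '1001,5001,2014-12-09,"1,500.00",Product sale',
  '1002,5002,2014-12-10,"820.50",Service',
  '1003,5003,2014-12-11,"12,340.75",Product sale'
), sample_file)

# Same arguments as in the Script Definition: the first row holds the
# variable names, and values are separated by commas.
Sales1 <- read.csv(sample_file, header = TRUE, sep = ",")

# In the case itself, the file location would be the hosted URL instead:
# Sales1 <- read.csv("http://davidchan.net/data/Sales1.csv",
#                    header = TRUE, sep = ",")
```

The other four files would presumably be imported the same way, each into its own dataframe; their exact URLs are not given in this text.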
Script Definition
Code                                      Definition
read.csv                                  Reading a comma separated value ‘‘csv’’ file
"http://davidchan.net/data/Sales1.csv"    File location
header = TRUE                             Variable names are included at the top of columns
sep = ","                                 Data is separated by commas
After importing the data into R, the ‘‘View’’ command can be
used to open a dataframe. For example, to view the imported
‘‘Sales1’’ dataframe, enter the script shown below into the R
Editor window and then highlight the script and click and hold
‘‘Ctrl’’ and then click ‘‘R’’ on the keyboard:
Script 5:
Data Diagnostics
Once the CSV data files have been imported into R, you should
run some preliminary data diagnostics/checks. The purpose
of running diagnostic checks is to determine whether the
files have been imported correctly and whether the data is what you
will need for performing the procedures. You can use an array
of diagnostic techniques to check the number of rows
(observations) and columns (variables), verify the variable
names, and inspect the first six or last six rows of data.
Data diagnostics are important because you want
to confirm that you are analyzing the data that you expect
to analyze.
Enter the scripts below into the R Editor window and then
highlight the scripts and click and hold ‘‘Ctrl’’ and then click
‘‘R’’ on the keyboard to show the number of rows (nrow) or
columns (ncol) in the ‘‘Sales1’’ dataframe:
Script 6:
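The script is not reproduced in this text. A minimal sketch using the ‘‘nrow’’ and ‘‘ncol’’ functions named above, run on a made-up three-row stand-in for the imported dataframe (the values are hypothetical, so the counts will differ from the answers to the checkpoint):

```r
# Hypothetical stand-in for the imported Sales1 dataframe.
Sales1 <- data.frame(
  Sales_Order_No = c(1001, 1002, 1003),
  Invoice_Amount = c("1,500.00", "820.50", "12,340.75")
)

nrow(Sales1)  # number of rows (observations)
ncol(Sales1)  # number of columns (variables)
```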
Checkpoint 1
(1) How many rows are in the ‘‘Sales1’’ dataframe?
(2) How many columns are in the ‘‘Sales1’’ dataframe?
Enter and execute the script below into R to show the
variable (column) names in the ‘‘Sales1’’ dataframe:
Script 7:
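The script is not reproduced in this text. One way to list the variable names is the base R function ‘‘names’’; whether the authors used it or an equivalent such as ‘‘colnames’’ is not known. The dataframe below is a made-up stand-in:

```r
# Hypothetical stand-in for the imported Sales1 dataframe.
Sales1 <- data.frame(
  Sales_Order_No = c(1001, 1002),
  Invoice_Amount = c("1,500.00", "820.50")
)

# Returns the variable (column) names as a character vector.
variable_names <- names(Sales1)
print(variable_names)
```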
Checkpoint 2
(1) List the variable names in the ‘‘Sales1’’ dataframe.
Enter and execute the scripts below into R to show the first
six rows (head) and last six rows (tail) of the ‘‘Sales1’’
dataframe:
Script 8:
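The script is not reproduced in this text. A minimal sketch of the ‘‘head’’ and ‘‘tail’’ calls named above, run on a made-up ten-row stand-in so that the first six and last six rows differ:

```r
# Hypothetical stand-in: ten made-up sales order numbers.
Sales1 <- data.frame(Sales_Order_No = 1001:1010)

first_rows <- head(Sales1)  # first six rows by default
last_rows <- tail(Sales1)   # last six rows by default

print(first_rows)
print(last_rows)
```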
Checkpoint 3
(1) What is the third ‘‘Sales_Order_No’’ in the ‘‘Sales1’’
dataframe?
(2) What is the last ‘‘Sales_Order_No’’ in the ‘‘Sales1’’
dataframe?
Data Cleaning
The client may provide the auditors with the data needed for
auditing but the data may not be in a format that is usable by
R for analysis. Data cleaning is necessary to get the data in
a format that R can use and make computations from. For example,
the client may provide monetary accounting data with commas
(‘‘1,500’’). Numeric data with commas are recognized as text or
as character format in R. R cannot do calculations on
variables identified as character variables and thus they will need
to be
converted into the numeric format. There are five basic types
of data formats: (1) numeric, (2) integer, (3) character, (4)
factor,
and (5) date. Below are examples of each:
Format Type    Example
Numeric        837223, 123.23, 2320840.98 (can have decimals)
Integer        235, 8372, 23208 (no decimals)
Character      data, auditing, Main Street
Factor         0, 1
Date           2014-12-11
If you open the ‘‘Sales1,’’ ‘‘Cash1,’’ and ‘‘Customer1’’
dataframes, you will notice that some variables hold numeric
values that contain commas. Therefore,
R may consider these numeric variables as either Character or
Factor variables. For subsequent analysis on these
variables to occur in R, the commas have to be removed and the
variables have to be converted into the numeric format.
First, consider the Sales Journal Dataframe (Sales1):
R has a function called ‘‘str’’ that can be used to show the
data format for each variable in a dataframe. Enter the script
below into the R Editor window and then highlight the script
and click and hold ‘‘Ctrl’’ and then click ‘‘R’’ on the keyboard:
Script 9:
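The script is not reproduced in this text. A minimal sketch of the ‘‘str’’ call on a made-up stand-in dataframe follows. The stand-in sets ‘‘stringsAsFactors = TRUE’’ to mimic the Factor type described in this case; in R 4.0 and later, text columns are read as Character by default, so current output may show ‘‘chr’’ where the case shows ‘‘Factor.’’

```r
# Hypothetical stand-in for the imported Sales1 dataframe.
Sales1 <- data.frame(
  Invoice_No = c(5001, 5002, 5003),
  Invoice_Amount = c("1,500.00", "820.50", "12,340.75"),
  stringsAsFactors = TRUE  # assumption: mimics the older R default
)

# Shows the data format of every variable in the dataframe.
str(Sales1)
```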
Notice the ‘‘Invoice_Amount’’ variable in the ‘‘Sales1’’
dataframe is categorized as a Factor type variable (Figure 1).
A numeric variable cannot have commas between the numbers.
You will need to strip the commas from the numeric values
in the ‘‘Invoice_Amount’’ variable and convert the variable
into a numeric variable using the following code:
FIGURE 1
Sales1 Dataframe
Script 10:
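The script is not reproduced in this text, but the Script Definition lists its parts. Assembled in the usual way, they give the line sketched here on a made-up stand-in dataframe (the sample values are hypothetical):

```r
# Hypothetical stand-in for the imported Sales1 dataframe.
Sales1 <- data.frame(
  Invoice_No = c(5001, 5002, 5003),
  Invoice_Amount = c("1,500.00", "820.50", "12,340.75"),
  stringsAsFactors = TRUE  # assumption: mimics the Factor type in Figure 1
)

# gsub() replaces every comma with an empty string; as.numeric() then
# converts the cleaned text into numbers.
Sales1$Invoice_Amount <- as.numeric(gsub(",", "", Sales1$Invoice_Amount))

# The variable can now be used in calculations such as a column total.
invoice_total <- sum(Sales1$Invoice_Amount)
```

The same line works whether R imported the variable as a Factor or as a Character variable, because ‘‘gsub’’ treats its input as text in both cases.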
Script Definition
Code                     Definition
Sales1$Invoice_Amount    Variable ‘‘Invoice_Amount’’ in the ‘‘Sales1’’ dataframe
as.numeric               Convert variable to numeric data type format
gsub                     Remove comma function
",", ""                  Replace comma with no space
Next, let’s consider the Cash Receipts Journal Dataframe:
In the Cash Receipts dataframe, the ‘‘Invoice_Amount’’ and
‘‘Payment_Received’’ variables are both categorized as
Character variables but should be Numeric variables.
Again, this is due to the commas between the numbers. You will
need to strip the commas from the variables and convert the
variables into Numeric variables using the following scripts:
Script 11:
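The scripts are not reproduced in this text. A sketch that applies the same comma-stripping pattern to both variables, assuming the dataframe is named ‘‘Cash1’’ and using made-up values:

```r
# Hypothetical stand-in for the imported cash receipts dataframe.
Cash1 <- data.frame(
  Invoice_Amount = c("1,500.00", "820.50"),
  Payment_Received = c("1,500.00", "800.00"),
  stringsAsFactors = FALSE  # Character type, as described above
)

# Strip the commas and convert each variable to the numeric format.
Cash1$Invoice_Amount <- as.numeric(gsub(",", "", Cash1$Invoice_Amount))
Cash1$Payment_Received <- as.numeric(gsub(",", "", Cash1$Payment_Received))

# Once numeric, the two variables can be compared row by row.
unpaid <- Cash1$Invoice_Amount - Cash1$Payment_Received
```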
Checkpoint 4
(1) Remove the commas from the following two variables in the
Customer Master File dataframe ‘‘CUS1’’ and convert them into
numeric variables: ‘‘Customer_Balance’’ and
‘‘Customer_Max_Credit_Amount.’’
Next, determine whether the other variables in the dataframes
are categorized in the correct data type. As you recall, a
variable can be a Numeric, Integer, Character, Date, or
Factor data type. If a variable is not categorized correctly, then
you will
need to convert the variable into the correct type.
Enter and execute the scripts below in R to determine the
data type for each of the variables in each of the respective
dataframes:
Script 12:
You will notice that the dataframe ‘‘Sales1’’ has two
variables in an incorrect format: (1) ‘‘Invoice_Date’’ and (2)
‘‘Invoice_Description’’ (Figure 2). R is categorizing both
as Factor variables. The ‘‘Invoice_Date’’ variable holds the
invoice date and should be in the Date format, so you will need
to convert it from the Factor format to the Date format.
Similarly, the ‘‘Invoice_Description’’ variable describes the
type of sale and should be in the Text or Character format, so
you will need to convert it from the Factor format to the
Character format. Here are the scripts to do so:
FIGURE 2
Sales1 Dataframe
Script 13:
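The scripts are not reproduced in this text. A sketch of the two conversions on a made-up stand-in follows. It assumes the dates are stored as year-month-day, as in the Date example given earlier; a different layout in the real file would need a different ‘‘format’’ string.

```r
# Hypothetical stand-in for the two Sales1 variables in question.
Sales1 <- data.frame(
  Invoice_Date = c("2014-12-09", "2014-12-10", "2014-12-11"),
  Invoice_Description = c("Product sale", "Service", "Product sale"),
  stringsAsFactors = TRUE  # assumption: mimics the Factor type in Figure 2
)

# Factor -> text -> Date, using the assumed year-month-day layout.
Sales1$Invoice_Date <- as.Date(as.character(Sales1$Invoice_Date),
                               format = "%Y-%m-%d")

# Factor -> Character.
Sales1$Invoice_Description <- as.character(Sales1$Invoice_Description)
```

Once the variable is in the Date format, date arithmetic works, for example the number of days between two invoices.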
In the Shipping dataset, you will need to convert the
variable ‘‘Shipping_Number’’ to the Character format and the
variable
‘‘Shipping_Date’’ to the Date format.
Script 14:
In the Customer dataset, you need to convert the variables
‘‘Customer_No’’ into the Integer format, ‘‘Customer_Name’’
into the Character format, ‘‘Customer_Address’’ into the
Character format, ‘‘Customer_City’’ into the Character format,
‘‘Customer_State’’ into the Character format,
‘‘Customer_Credit_Rating’’ into the Factor format, and
‘‘Customer_Max_Credit_Amount’’ into the Numeric format.
Script 15:
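The scripts are not reproduced in this text. A sketch of the seven conversions on a made-up stand-in follows. The dataframe name ‘‘Customer1’’ is an assumption (the case also refers to ‘‘CUS1’’), and all values are hypothetical.

```r
# Hypothetical stand-in for the customer master file.
Customer1 <- data.frame(
  Customer_No = c("101", "102", "103"),
  Customer_Name = c("Acme Co", "Beta LLC", "Gamma Inc"),
  Customer_Address = c("1 Main Street", "2 Oak Avenue", "3 Elm Road"),
  Customer_City = c("Newark", "Queens", "Trenton"),
  Customer_State = c("NJ", "NY", "NJ"),
  Customer_Credit_Rating = c("A", "B", "A"),
  Customer_Max_Credit_Amount = c("10,000", "5,000", "25,000"),
  stringsAsFactors = TRUE
)

# Go through as.character() first: as.integer() applied directly to a
# factor returns its internal level codes, not the printed values.
Customer1$Customer_No <- as.integer(as.character(Customer1$Customer_No))

# Convert the four text variables to the Character format in one step.
text_vars <- c("Customer_Name", "Customer_Address",
               "Customer_City", "Customer_State")
Customer1[text_vars] <- lapply(Customer1[text_vars], as.character)

# The credit rating is a category, so it belongs in the Factor format.
Customer1$Customer_Credit_Rating <- as.factor(Customer1$Customer_Credit_Rating)

# Strip the commas and convert the credit limit to the Numeric format.
Customer1$Customer_Max_Credit_Amount <-
  as.numeric(gsub(",", "", Customer1$Customer_Max_Credit_Amount))
```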
Checkpoint 5
(1) In the Cash dataset, convert the variable
‘‘Payment_Date’’ to the Date format.
Finally, you should re-examine all the dataframes and
determine whether all the variables are in the correct data type
before
you continue. Enter the following scripts and verify:
Script 16:
Structured Query Language
Structured Query Language (SQL) is a standard language of
relational database management systems (DBMS). Auditors
can use the language to access, make queries, create new
tables (dataframes), and manipulate data in a database. For the
purpose of this case study and its use with R, focus on the
latter three. You will need to understand how to use the SQL
SELECT Statement. This statement starts with the SELECT
keyword followed by a comma-separated list of variables that will
be displayed in the results set generated by the statement.
This is followed by the FROM clause, which lists the dataframes
required to construct the result set. Then, the statement may
have the WHERE clause that provides the conditions the result set
satisfies. For example, suppose you want to SELECT the variables
‘‘Invoice_No’’ and ‘‘Invoice_Amount’’ FROM the ‘‘Sales1’’
dataframe WHERE ‘‘Invoice_Amount’’ is greater than 1500. This
SQL command would return the variables ‘‘Invoice_No’’ and
‘‘Invoice_Amount’’ for the rows where the ‘‘Invoice_Amount’’ is
greater than 1500. Further examples will be articulated below.
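The SELECT statement described above can be sketched as follows. This is an illustration under stated assumptions rather than a script from the case: the dataframe is a made-up stand-in, and the query is run with the ‘‘sqldf’’ function from the package of the same name only if that package is installed. A base R version of the same filter is included for comparison.

```r
# Hypothetical stand-in for the cleaned Sales1 dataframe.
Sales1 <- data.frame(
  Invoice_No = c(5001, 5002, 5003),
  Invoice_Amount = c(1500.00, 820.50, 12340.75)
)

# SELECT lists the variables, FROM names the dataframe, and WHERE
# states the condition that returned rows must satisfy.
query <- "SELECT Invoice_No, Invoice_Amount
          FROM Sales1
          WHERE Invoice_Amount > 1500"

# Run the query with sqldf when the package is available.
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  large_invoices <- sqldf(query)
  print(large_invoices)
}

# The same filter written in base R, for comparison.
large_invoices_base <- Sales1[Sales1$Invoice_Amount > 1500,
                              c("Invoice_No", "Invoice_Amount")]
print(large_invoices_base)
```

Because the condition is strictly ‘‘greater than,’’ the made-up invoice of exactly 1500 is excluded from the result.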
Create New Dataframes
Prior to analyzing data, create a new dataframe by extracting
the relevant variables from pre-existing dataframes. The
purpose of creating a new dataframe is twofold: (1) to preserve
the original data, and (2) to speed up subsequent
analysis, since the new dataframe will be smaller and consist
only of the relevant variables.
In auditing accounting information, you need to select
variables in various data types …