Here's
a tough truth: If you can't get control of your test data, you can
forget test automation. The core value proposition of automation rests
on reusability, and if you can't reuse the data you can't reuse the
tests. Even manual testing requires test data, of course, but a manual
tester can try to find or create the data they need on the fly.
My
experience shows that testers spend 80% of their time or more just
trying to locate data they can use or entering the conditions they
need. For more on this, see "Automated or Not, It's All About the Data"
(http://www.stickyminds.com/s.asp?F=S9033_COL_2). That means having reusable data even for manual testing would improve productivity by orders of magnitude.
So
what's the big deal? Can't you just copy production data? Aren't there
tools out there that let you extract snapshots from databases, scramble
existing data or generate fake values?
The bad news is that these traditional techniques usually don't work. The good news is that the solution may be easier than you think.
The Problem with Production
Most
companies use production data in some form for their testing. For one
thing, it's realistic - after all, it's from production. For another,
it's easy - it's data you already have. But there are problems.
The
first problem is that it is probably so much data that it's costly to
copy and store. Second, all that volume makes it hard to find the exact
conditions you need for a particular test scenario, and it may not even
exist in the form you need. Third, it's dynamic - a constantly moving
picture that is too unpredictable to yield any reuse.
And lately
there is a fourth reason: privacy. It's highly likely that some of the
data is confidential and cannot be legally disclosed to others,
including testers. There are the obvious fields like social security
numbers, but others may include personal data like addresses, account
numbers, and any transactions related to financial or health matters.
See "Keeping Secrets: How Data Privacy Affects Testing" (http://www.stickyminds.com/s.asp?F=S8327_COL_2) for more detail on this issue.
And yes, there are tools that can help with these problems. But it's not that easy.
The Trick with Tools
Database
tools have been around for a long time and are widely available. Most
of them were developed to test databases by generating high volumes of
data or extracting selected subsets, but some are specifically targeted
at testing and can locate, obfuscate or create required data values.
The
trick is that it's hard to selectively extract or create a
meaningful, coherent set of data. By meaningful I mean that it contains
the test conditions you are interested in, and by coherent I mean that
all of the related data is included. It is not as easy as taking every
Nth record or some flat percentage of the data: complex
interrelationships must be maintained between data elements.
For
example, testing customer orders might require the customer master
record, any related contracts on file, all transactions for that
customer, plus all of the warehouse locations, product inventories,
shipping codes, commission records, and myriad other data elements that
touch the customer or the transactions - or anything they touch.
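To make the difference between a coherent subset and a flat sample concrete, here is a minimal sketch in Python. The tables, column names, and both helper functions are hypothetical, invented only for illustration; a real application would have far more relationships to follow.

```python
# Hypothetical in-memory tables; names and columns are illustrative only.
CUSTOMERS = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
ORDERS = {
    10: {"customer_id": 1, "product_id": 100},
    11: {"customer_id": 2, "product_id": 101},
    12: {"customer_id": 1, "product_id": 101},
}
PRODUCTS = {100: {"warehouse_id": 7}, 101: {"warehouse_id": 8}}
WAREHOUSES = {7: {"city": "Dallas"}, 8: {"city": "Reno"}}


def extract_coherent_subset(customer_id):
    """Collect one customer plus every row reachable from its orders."""
    orders = {k: v for k, v in ORDERS.items() if v["customer_id"] == customer_id}
    product_ids = {o["product_id"] for o in orders.values()}
    products = {k: PRODUCTS[k] for k in product_ids}
    warehouse_ids = {p["warehouse_id"] for p in products.values()}
    warehouses = {k: WAREHOUSES[k] for k in warehouse_ids}
    return {
        "customers": {customer_id: CUSTOMERS[customer_id]},
        "orders": orders,
        "products": products,
        "warehouses": warehouses,
    }


def every_nth(table, n):
    """Naive sampling: keep every nth row and ignore relationships."""
    return {k: v for i, (k, v) in enumerate(sorted(table.items())) if i % n == 0}
```

In this toy data, taking every second row of each table keeps an order whose product was dropped, while following the relationships from one customer brings along everything that customer touches.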
Furthermore,
few applications operate on a single source of data. Many have
interfaces to other applications that take the form of even more files
in a wide array of formats. Some of these interfaces are real-time,
some are batch, but all must be coordinated. While database tools may
help you trace the relationships between tables and fields, they may
break down when external files and formats are in the mix.
And
finally there is the question of dates. Dates abound in most
applications and are often central to calculations and event triggers.
The date of an order may affect its pricing based on contractual terms.
Posting a payment in one period versus another may create a late fee or
interest charge. The dates of shipments or receipt of goods may trigger
automated inventory orders, and so on. Database tools may know
about table relationships but they don't know about date dependencies.
My
experience shows that despite their availability and relatively
sophisticated capabilities, few of these tools are actually
successfully deployed for testing because the effort and skills
required to make them work are too high.
The Advantage of Automation
So
what does work? Ironically, test automation presents both the challenge
and the solution. It is the challenge because, as pointed out,
automation won't work without reusable test data. It is also the
solution, though, because automation can create the data it needs.
Think
about it. If you need a customer account with particular conditions for
testing, you can either try to find it or you can create it. Finding it
may take a lot of time and it may not even exist in the form you need.
The advantage of creating it is that you know it exists and that it has
the right conditions because you put it there. Another plus is that the
very act of creating the customer is, in itself, a test.
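Here is a rough sketch of what "create it instead of finding it" might look like in an automated script. FakeOrderSystem is a hypothetical stand-in for the application under test, and the method names and credit limit are assumptions made up for this example, not any real tool's API.

```python
class FakeOrderSystem:
    """Hypothetical stand-in for the application under test."""

    def __init__(self):
        self._customers = {}
        self._next_id = 1

    def create_customer(self, name, credit_limit):
        if credit_limit < 0:
            raise ValueError("credit limit cannot be negative")
        customer_id = self._next_id
        self._next_id += 1
        self._customers[customer_id] = {"name": name, "credit_limit": credit_limit}
        return customer_id

    def get_customer(self, customer_id):
        return self._customers[customer_id]


def setup_low_credit_customer(app):
    """Create the exact customer later tests need, and verify it was stored."""
    customer_id = app.create_customer("Test Customer A", credit_limit=500)
    record = app.get_customer(customer_id)
    # The setup step doubles as a test of customer creation.
    assert record["credit_limit"] == 500
    return customer_id
```

Every later test that needs a low-credit customer can call the setup function and know the condition is there, because the script put it there and checked it.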
Manually
creating all that data is not practical, but with automation it makes
sense. Test automation tools can type in data quickly and accurately.
By simply planning out your test cycle carefully, you can be sure that
all the data you need is there how and when you need it, and expand your
test coverage along the way. See "The Test Automation Timetable:
Altered States" (http://itmanagement.earthweb.com/entdev/article.php/622301) for more on automating data states.
Now
this doesn't mean you won't need a starting point that includes basic
master data. Believe it or not, trying to start with a blank database
is almost impossible: there are far too many pointers, stored
procedures, and other arcane contents that you could not understand,
let alone create, in your lifetime. So you will still find yourself
starting with some production or sandbox data just as a backdrop, but
you won't use most of it.
Instead, you will use the data you
design. And that's another benefit of automated data: You can design
exactly the conditions you need, and because you are in control you can
be sure that dates have the right relationship to each other and to the
system date.
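As an illustration of keeping dates in the right relationship to the system date, the sketch below builds invoices whose due dates are offsets from "today" rather than fixed calendar values. The flat late fee, the field names, and both functions are assumptions chosen for the example.

```python
from datetime import date, timedelta

LATE_FEE = 25.0  # hypothetical flat fee, for illustration only


def build_invoice(today, days_until_due, amount):
    """Create an invoice whose due date is relative to the given date."""
    return {"amount": amount, "due_date": today + timedelta(days=days_until_due)}


def amount_owed(invoice, payment_date):
    """Add the late fee when the payment posts after the due date."""
    if payment_date > invoice["due_date"]:
        return invoice["amount"] + LATE_FEE
    return invoice["amount"]
```

Because the overdue invoice is always built as, say, ten days past due relative to the run date, the late fee condition holds whether the test runs this week or next year. A fixed due date copied from production would drift out of the condition you wanted.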
You will still need a strategy for interfaces that
will probably include maintaining copies of files and massaging the
values, but since you are in control of the core database contents it
will be easier to know what data you need.
If this sounds too
simplistic, I can tell you that some of the largest companies in the
world, with massive, complex IT landscapes that span the globe, have
successfully automated both their testing and their data using this
approach.
Posted 11 Feb 2009 3:39 PM by Linda Hayes