Chapter 3: About Memory Analysis
The Basic Objective
Memory analysis is the practice of monitoring the behavior of a program
during its execution to determine if there are errors in the program's
use of memory. Understanding the basic principles of memory analysis
is very useful in interpreting ZeroFault's output.
There are two broad categories of errors that memory analysis detects:
the first is reading or writing memory that is not available for use
by the program, and the second is reading memory that is available
to the program but has not been initialized. Either type of error
can be benign or catastrophic depending on the circumstances of the
program's execution. Errors involving allocated but uninitialized
memory are more likely to be benign than use of memory not allocated
to the process, and writing to unallocated memory is more likely to
be harmful than reading from it. Any of these errors can have
catastrophic results, and they often manifest themselves only
intermittently.
These two types of errors account for many of the more serious defects
that can plague the software development process. They can be very
difficult to detect and even more difficult to resolve, because the
symptom often seems completely unrelated to the cause. For instance,
a memory overwrite (use of more memory than was actually allocated)
may overwrite non-essential data or, when the data is critical, that
data may not be accessed until much later. As a result the program
appears to work, or does not begin to misbehave (or crash) until much
later.
In order to detect errors, ZeroFault monitors every memory allocation,
reallocation, and free that a process performs, and also monitors
every instruction and system call that the process executes. The
data that ZeroFault collects when a memory management function is
called (typically malloc, realloc, or free) is used to verify the
legality of each instruction as it is executed. For instance, when a
process allocates 10 bytes of memory and then uses 11 or 12 bytes
instead of just the 10 that were allocated, ZeroFault detects this and
generates an error message indicating the instruction and line of code
where the error occurred. This is an example of the basic function of
memory analysis: examining the execution of the program to detect
instances where the program's actions are not legal within the context
of the current memory allocation and initialization state.
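The 10-byte example above can be sketched as a small table of live allocations that each access is checked against. This is only an illustration under stated assumptions: the names alloc_record, track_alloc, track_free, and access_is_legal are hypothetical and do not describe ZeroFault's internal data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one live allocation (illustration only). */
struct alloc_record {
    uintptr_t base;   /* first valid address */
    size_t    size;   /* number of valid bytes */
};

#define MAX_RECORDS 64
static struct alloc_record records[MAX_RECORDS];
static size_t record_count;

/* Record an allocation; returns 0 on success, -1 if the table is full. */
int track_alloc(uintptr_t base, size_t size)
{
    if (record_count == MAX_RECORDS)
        return -1;
    records[record_count].base = base;
    records[record_count].size = size;
    record_count++;
    return 0;
}

/* Forget an allocation; returns 0 if base was tracked, -1 otherwise. */
int track_free(uintptr_t base)
{
    for (size_t i = 0; i < record_count; i++) {
        if (records[i].base == base) {
            records[i] = records[record_count - 1];
            record_count--;
            return 0;
        }
    }
    return -1;
}

/* Returns 1 if [addr, addr + len) lies entirely inside one tracked
   allocation, 0 otherwise. */
int access_is_legal(uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < record_count; i++) {
        uintptr_t base = records[i].base;
        size_t size = records[i].size;
        if (addr >= base && len <= size && addr - base <= size - len)
            return 1;
    }
    return 0;
}
```

With a 10-byte allocation tracked at some address p, access_is_legal(p + 9, 1) succeeds while access_is_legal(p + 10, 1) fails, which corresponds to the 11th byte in the example above.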
In order for ZeroFault or any other memory analysis tool to detect
illegal use of memory, it must understand the semantics of the memory
allocation scheme in use. ZeroFault understands the semantics of
the standard C library functions malloc, realloc, and free. Since
the C++ operators new and delete resolve to the standard C library
functions, they are also covered. Fortran and Pascal programs on AIX
also manage memory using the standard C library functions, so they
are covered by default. Any other form of memory allocation that
eventually resolves to the standard C library functions is also
fully covered.
If a program uses non-standard memory management, ZeroFault will not
be able to detect as many errors because it has no way of knowing
when a region of memory is or is not available.
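As an illustration of why non-standard memory management hides errors, consider a simple arena that obtains one large block from malloc and hands out pieces of it itself. The arena type and the arena_init, arena_alloc, and arena_destroy functions below are hypothetical names for this sketch, not part of any library discussed here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump-pointer arena. It performs no alignment, so it is
   only suitable for byte buffers in this sketch. */
struct arena {
    unsigned char *base;  /* single block obtained from malloc */
    size_t capacity;      /* total bytes in that block */
    size_t used;          /* bytes already handed out */
};

/* Returns 0 on success, -1 if malloc fails. */
int arena_init(struct arena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->capacity = a->base != NULL ? capacity : 0;
    a->used = 0;
    return a->base != NULL ? 0 : -1;
}

/* Hands out the next n bytes of the block, or NULL if they do not fit. */
void *arena_alloc(struct arena *a, size_t n)
{
    if (a->base == NULL || n > a->capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Releases the whole block with a single free. */
void arena_destroy(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

Two consecutive 10-byte requests from such an arena are adjacent in memory. Writing 11 bytes through the first pointer corrupts the second object, yet every byte touched lies inside the single block that malloc returned, so a tool that only understands malloc, realloc, and free has no basis for flagging the access.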
How Memory is Organized on AIX
There are four basic memory regions in an AIX program: Stack, Data, BSS,
and Heap. Sometimes the Data, BSS, and Heap areas are collectively
referred to as the "data segment".
For a normal (non-large data model) AIX program, the Data segment
begins at the bottom of segment 2 (0x20000000) and ends at the
beginning of the BSS segment. The Data segment contains global and
static data whose initial values are not zero. For instance, the
string defined at file scope by char s[] = "hello world"; would
exist in the Data segment.
The BSS segment starts at the end of the Data segment and contains all
global and static variables that are initialized to zero. For instance,
a variable declared static int i; would be contained in the BSS segment.
The Data and BSS segments are fixed in size at link time and do not
grow or change during program execution. All data items in both
segments are considered readable and writable because the data is
initialized and available for program use. Each shared object module
or shared library has its own Data and BSS segment. Shared library
data segments are stored in segment 2 (0x20000000) on AIX 3.2 and in
segment F (0xF0000000) on AIX 4.x.
The Heap area begins at the end of the BSS segment and grows toward
larger addresses from there. The Heap area is managed by malloc,
realloc, and free, which use the brk and sbrk system calls to adjust
its size. The Heap area is shared by all shared libraries and dynamic
load modules in a process.
The Heap area is the primary source of memory management problems and
is the main focus of the analysis that ZeroFault performs on a process.
When a program allocates memory via malloc, it is returned a pointer
to a region of memory of the appropriate size. Just before that region
there is bookkeeping information that malloc, realloc, and free use
to manage the Heap. If that bookkeeping information gets overwritten,
it will often cause a segmentation violation in a function called
malloc_y, free_y, realloc_y, or one of their child functions. If a
program does have a segmentation violation in one of those functions,
it is highly probable that the problem is a memory overwrite error.
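The idea of management information stored just before each allocated region can be sketched with a toy allocator that works inside a fixed array. The header layout, the magic value, and the names toy_alloc and toy_header_intact are assumptions made for this illustration; the real AIX malloc keeps different bookkeeping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical header stored immediately before each payload. */
struct toy_header {
    size_t size;          /* payload size in bytes */
    unsigned long magic;  /* sanity marker checked before use */
};

#define TOY_MAGIC 0xA110CA7EUL
#define TOY_HEAP_SIZE 256

static unsigned char toy_heap[TOY_HEAP_SIZE];
static size_t toy_used;

/* Returns a pointer to `size` payload bytes with a header stored just
   before it, or NULL if the toy heap is exhausted. */
void *toy_alloc(size_t size)
{
    struct toy_header h = { size, TOY_MAGIC };
    if (size > TOY_HEAP_SIZE - sizeof h ||
        toy_used > TOY_HEAP_SIZE - sizeof h - size)
        return NULL;
    /* memcpy avoids alignment assumptions about the byte array. */
    memcpy(toy_heap + toy_used, &h, sizeof h);
    unsigned char *payload = toy_heap + toy_used + sizeof h;
    toy_used += sizeof h + size;
    return payload;
}

/* Returns 1 if the header in front of `payload` still looks valid. */
int toy_header_intact(const void *payload)
{
    struct toy_header h;
    memcpy(&h, (const unsigned char *)payload - sizeof h, sizeof h);
    return h.magic == TOY_MAGIC;
}
```

If two blocks are allocated back to back and the program writes past the end of the first, the bytes it destroys are the header of the second. The damage goes unnoticed until the allocator next consults that header, which is why the failure tends to surface inside the allocation functions rather than at the faulty write.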
In large data model programs the data segment begins at the beginning
of segment 3 (0x30000000) and grows upward from there. It is followed
by the BSS segment for the primary executable and then by the Heap
area shared by all load modules.
The fourth region of memory in AIX is the stack. In AIX the stack starts
at the top of segment 2 (0x2FFFFFFC) and grows to lower memory addresses
from there. The stack pointer (register 1) points at the lowest
point on the stack that is valid for access. The stack region contains
automatic, or local, variables (for instance a variable declared
as int i; within a function). Except in certain special
circumstances a stack frame is created each time a function is called.
Information such as saved register values, parameters, and the return
address is stored in the stack frame in addition to local variables.
When a function returns to its caller the stack is popped
and the region of stack memory that was used by that function is
no longer available to the process.
Stack memory has no known initial state. The initial value of a stack
variable is not defined unless it is explicitly initialized. ZeroFault
checks every operation on a stack variable to ensure that all stack
memory is properly initialized before it is read. Reading the value
of a stack variable before it has been written will generate a USTKR
error (Uninitialized Stack Read).
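The four regions described above can be illustrated with a few C declarations. The variable and function names are hypothetical, and the placement noted in each comment is the typical one described in this chapter rather than something the C language itself guarantees.

```c
#include <stdlib.h>
#include <string.h>

char data_greeting[] = "hello world";  /* Data: nonzero initial value */
static int bss_counter;                /* BSS: implicitly zero */

/* Reads one value from each region and returns their sum,
   or -1 if the heap allocation fails. */
int touch_all_regions(void)
{
    int stack_local = 1;               /* Stack: explicitly initialized */

    /* Heap: contents are undefined until the program writes to them. */
    int *heap_value = malloc(sizeof *heap_value);
    if (heap_value == NULL)
        return -1;
    *heap_value = 2;                   /* now initialized, safe to read */

    int total = stack_local + *heap_value + bss_counter
              + (int)strlen(data_greeting);
    free(heap_value);
    return total;
}
```

Removing the initializer from stack_local, or the assignment to *heap_value, would leave the corresponding read without a defined value, which is the kind of access the uninitialized read checks are meant to catch.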
Loads and Stores
A compiled program consists of a sequence of machine instructions.
These instructions can be classified into three groups on Power and
PowerPC CPUs:
- instructions that load memory into CPU registers (read or load instructions)
- instructions that store CPU registers into memory (write or store instructions)
- instructions that operate on the contents of CPU registers
Monitoring each load and store is the core of advanced memory analysis.
When a store operation is performed, ZeroFault checks that the region
of memory to be written to is available to the process for writing.
The region that was just stored to is then marked as having a known
state, which makes it available for reading if it was not already.
When a load instruction is performed, ZeroFault checks the address
that is being read from to ensure that it is allocated to the process
and that it is available to be read from. In short, the memory must
be both allocated and initialized. Initialized means that it must
have come from a source that has a known initial state (sbrk, Data,
or BSS) or that it must have been written to since it was made
available to the process (malloc, realloc, or stack memory).
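The load and store rules above amount to a small state machine kept for every byte of memory. The sketch below tracks a 128-byte range with one state per byte; the state names, result codes, and the shadow_allocate, shadow_store, and shadow_load functions are hypothetical and are not ZeroFault's actual implementation.

```c
#include <stddef.h>

enum byte_state { UNALLOCATED = 0, ALLOCATED = 1, INITIALIZED = 2 };
enum check_result { ACCESS_OK = 0, ERR_UNALLOCATED = 1, ERR_UNINITIALIZED = 2 };

#define SHADOW_SIZE 128
/* One state per monitored byte; static storage starts as UNALLOCATED. */
static unsigned char shadow[SHADOW_SIZE];

static int in_range(size_t off, size_t len)
{
    return off <= SHADOW_SIZE && len <= SHADOW_SIZE - off;
}

/* Mark [off, off + len) as allocated but not yet initialized, as after
   malloc. Returns 0 on success, -1 if the range is outside the shadow. */
int shadow_allocate(size_t off, size_t len)
{
    if (!in_range(off, len))
        return -1;
    for (size_t i = 0; i < len; i++)
        shadow[off + i] = ALLOCATED;
    return 0;
}

/* A store is legal if every byte is allocated; afterwards the bytes
   have a known state. Nothing is changed if the store is rejected. */
enum check_result shadow_store(size_t off, size_t len)
{
    if (!in_range(off, len))
        return ERR_UNALLOCATED;
    for (size_t i = 0; i < len; i++)
        if (shadow[off + i] == UNALLOCATED)
            return ERR_UNALLOCATED;
    for (size_t i = 0; i < len; i++)
        shadow[off + i] = INITIALIZED;
    return ACCESS_OK;
}

/* A load is legal only if every byte is both allocated and initialized. */
enum check_result shadow_load(size_t off, size_t len)
{
    enum check_result result = ACCESS_OK;
    if (!in_range(off, len))
        return ERR_UNALLOCATED;
    for (size_t i = 0; i < len; i++) {
        if (shadow[off + i] == UNALLOCATED)
            return ERR_UNALLOCATED;
        if (shadow[off + i] == ALLOCATED)
            result = ERR_UNINITIALIZED;
    }
    return result;
}
```

After shadow_allocate(0, 10), a load of bytes 0 through 3 reports ERR_UNINITIALIZED until a store covers them, and any access that reaches byte 10 reports ERR_UNALLOCATED.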
System Calls
On AIX (and most operating systems) the kernel is a "black box", and while
the interfaces to that black box are well-defined, what goes on inside
is hidden from the application program. Many system calls operate
on application program memory; for instance read will read
some data from a file descriptor and write that data into a buffer
in the application program's memory. Similarly write will
read some data from an application memory buffer and write that data
to a file descriptor.
When doing memory analysis it is necessary to validate the parameters
to each system call. For instance, when the read system call is
invoked, it is necessary to ensure that the buffer passed to read is
available to the application program and is large enough to contain
the amount of data that read may fill in. Further, if that region of
memory was not previously marked as initialized, the bytes written
to by read must now be marked as initialized.
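The two steps described for read, validating the buffer and then marking the bytes that were filled in, might look like the following wrapper. It is a sketch under stated assumptions: tracked_buffer, checked_read, and buffer_initialized are hypothetical names, the -2 return value is an arbitrary choice for a rejected request, and the buffer is limited to 64 bytes to keep the example short.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical description of one buffer known to the checker. */
struct tracked_buffer {
    unsigned char *base;     /* start of the buffer */
    size_t size;             /* bytes available to the program (max 64) */
    unsigned char init[64];  /* init[i] != 0 once byte i has a known value */
};

/* Wraps read(): rejects requests that would overrun the buffer, then
   marks only the bytes the kernel actually filled in as initialized.
   Returns read()'s result, or -2 if the request is rejected. */
ssize_t checked_read(int fd, struct tracked_buffer *tb,
                     size_t offset, size_t count)
{
    if (tb->size > sizeof tb->init ||
        offset > tb->size || count > tb->size - offset)
        return -2;
    ssize_t n = read(fd, tb->base + offset, count);
    if (n > 0)
        memset(tb->init + offset, 1, (size_t)n);
    return n;
}

/* Returns 1 if every byte in [offset, offset + len) is initialized. */
int buffer_initialized(const struct tracked_buffer *tb,
                       size_t offset, size_t len)
{
    if (offset > tb->size || len > tb->size - offset)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!tb->init[offset + i])
            return 0;
    return 1;
}
```

Note that only the number of bytes returned by read is marked, not the number requested: a short read leaves the tail of the buffer uninitialized.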
ZeroFault knows the semantics of all system calls defined on AIX and validates
the parameters passed to each of them to make sure that they are
available to the process and initialized if necessary. When a system
call updates application memory ZeroFault marks that memory as initialized.
If system calls have been added to the system that ZeroFault does not
know about, ZeroFault will not be able to validate their parameters
or mark the regions that they write to as initialized. As a result,
when non-standard system calls are in use, some errors may be missed
and some uninitialized memory read errors may be reported falsely.
A Word About Signal Handlers
A signal handler is a function that is invoked when a signal is delivered to
a process. Signal handlers are often used to process I/O (SIGIO), handle
messages from other processes (SIGUSR1, SIGUSR2), catch timeout conditions
(SIGALRM), etc. When a signal is delivered the thread that is currently
executing stops and the signal handler is invoked on that thread's
stack. The signal handler must run to completion before that thread
can continue. In a single-threaded process this means that the process
stops all execution until the signal handler completes. In a multi-threaded
process other threads can be executed while the signal handler is
running, but the thread that received the signal cannot resume execution
until the signal handler completes.
Signal handlers can be invoked at any time. This poses a problem when
a signal handler tries to update a shared resource. For example, if
a thread receives a signal while it is in the midst of updating a
linked list, and the signal handler examines or updates that same
list, the result could be serious (such as a segmentation violation).
Because of this potential problem, only a very limited number of
operations can be performed safely from a signal handler.
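One common way to stay within those limits is to have the handler do nothing except set a flag, and to do the real work later from normal program code. The sketch below uses the POSIX sigaction interface; the names got_usr1, on_usr1, install_usr1_handler, and consume_usr1 are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() in strict ISO C modes */
#endif
#include <signal.h>
#include <string.h>

/* Set by the handler and examined by normal code. A volatile
   sig_atomic_t object can be written safely from a signal handler. */
static volatile sig_atomic_t got_usr1;

/* The handler touches no heap memory, no stdio, and no shared lists. */
static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Installs the handler; returns 0 on success, -1 on failure. */
int install_usr1_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from the main loop. Returns 1 if SIGUSR1 arrived since the
   last call and clears the flag; several signals arriving between two
   calls are reported as one. */
int consume_usr1(void)
{
    if (!got_usr1)
        return 0;
    got_usr1 = 0;
    return 1;
}
```

The main loop then calls consume_usr1 at a point where its data structures are in a consistent state, so an update such as the linked list change in the example above is never interrupted by code that examines the same list.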