Contents

Legal Notices

Chapter 1:  Quick Start

Chapter 2:  Introduction and Installation

Chapter 3:  About Memory Analysis

Chapter 4:  Finding Memory Leaks

Chapter 5:  Finding Memory Errors

Chapter 6:  Startup Options

Chapter 7:  Viewing Error Messages

Chapter 8:  Viewing Source Code

Chapter 9:  Tips and Techniques

Chapter 10:  Troubleshooting

Chapter 11:  Obtaining Support


Chapter 3:  About Memory Analysis

The Basic Objective

Memory analysis is the practice of monitoring the behavior of a program during its execution to determine if there are errors in the program's use of memory. Understanding the basic principles of memory analysis is very useful in interpreting ZeroFault's output.

There are two broad categories of errors that memory analysis detects: the first is reading or writing memory that is not available for use by the program, and the second is reading memory that is available to the program but has not been initialized. Either type of error can be benign or catastrophic depending on the circumstances of the program's execution. Errors involving allocated but uninitialized memory are more likely to be benign than use of memory not allocated to the process, and writing to unallocated memory is more likely to be harmful than reading from it. Even so, any of these errors can have catastrophic results, and they often manifest themselves only intermittently.

These two types of errors cause many of the more serious defects that can plague the software development process. They can be very difficult to detect and even more difficult to resolve, because the symptom often seems completely unrelated to the cause. For instance, a memory overwrite (writing more data than was actually allocated) may overwrite non-essential data or, when the data is critical, that data may not be accessed until much later. As a result, the program appears to work, or does not begin to misbehave (or crash) until much later.

To detect errors, ZeroFault monitors every memory allocation, reallocation, and free that a process performs, as well as every instruction and system call that the process executes. The data that ZeroFault collects when a memory management function is called (typically malloc, realloc, or free) is used to verify the legality of each instruction as it is executed. For instance, when a process allocates 10 bytes of memory and then uses 11 or 12 bytes instead of just the 10 that were allocated, ZeroFault detects this and generates an error message indicating the instruction and line of code where the error occurred. This is an example of the basic function of memory analysis: examining the execution of the program to detect instances where the program's actions are not legal within the context of the current memory allocation and initialization state.
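The 10-byte example can be pictured as a simple bookkeeping check. The following is only a minimal sketch of the idea, not ZeroFault's implementation; the names tracked_block, tracked_alloc, access_is_legal, and tracked_free are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record of one live allocation: start address and size. */
struct tracked_block {
    char *base;
    size_t size;
};

/* Allocate n bytes and remember the size the program actually asked for. */
static struct tracked_block tracked_alloc(size_t n)
{
    struct tracked_block b;

    b.base = malloc(n);
    b.size = (b.base != NULL) ? n : 0;
    return b;
}

/* Return 1 if an access of len bytes at offset stays inside the block. */
static int access_is_legal(const struct tracked_block *b, size_t offset, size_t len)
{
    assert(b != NULL);
    return len <= b->size && offset <= b->size - len;
}

/* Release the block; afterwards no access to it is legal. */
static void tracked_free(struct tracked_block *b)
{
    assert(b != NULL);
    free(b->base);
    b->base = NULL;
    b->size = 0;
}
```

With a block obtained from tracked_alloc(10), access_is_legal reports offsets 0 through 9 as legal, while a one-byte access at offset 10 (the eleventh byte) is rejected, which mirrors the overrun described above.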

For ZeroFault or any other memory analysis tool to detect illegal use of memory, it must understand the semantics of the memory allocation scheme in use. ZeroFault understands the semantics of the standard C library functions malloc, realloc, and free. Since the C++ operators new and delete resolve to the standard C library functions, they are also covered. Fortran and Pascal programs on AIX also manage memory using the standard C library functions, so they are covered by default. Any other form of memory allocation that eventually resolves to the standard C library functions is fully covered as well.

If a program uses non-standard memory management, ZeroFault will not be able to detect as many errors, because it has no way of knowing when a region of memory is or is not available.
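As an illustration of what non-standard memory management can look like, here is a minimal sketch of a pool (bump) allocator. The names pool, pool_init, pool_alloc, and pool_destroy are hypothetical. The whole arena comes from a single malloc call, so from the point of view of malloc-based bookkeeping there is only one allocated block, and the pieces handed out by pool_alloc are invisible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump allocator: one malloc'd arena carved into pieces. */
struct pool {
    char *arena;
    size_t capacity;
    size_t used;
};

/* Obtain the whole arena with a single malloc call; returns 0 on success. */
static int pool_init(struct pool *p, size_t capacity)
{
    p->arena = malloc(capacity);
    p->capacity = (p->arena != NULL) ? capacity : 0;
    p->used = 0;
    return (p->arena != NULL) ? 0 : -1;
}

/* Hand out n bytes from the arena without calling malloc again.
   Pieces are byte-granular, so this sketch suits character data only. */
static void *pool_alloc(struct pool *p, size_t n)
{
    void *piece;

    assert(p->used <= p->capacity);
    if (n > p->capacity - p->used)
        return NULL;
    piece = p->arena + p->used;
    p->used += n;
    return piece;
}

/* Return the arena to the C library in one call. */
static void pool_destroy(struct pool *p)
{
    free(p->arena);
    p->arena = NULL;
    p->capacity = 0;
    p->used = 0;
}
```

Two consecutive 16-byte pieces from this pool are adjacent in memory. Writing a seventeenth byte through the first pointer lands in the second piece, yet it is still inside the single block that malloc returned, so a tool that only knows about malloc has no basis for flagging it.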

How Memory is Organized on AIX

There are four basic memory regions in an AIX program: Stack, Data, BSS, and Heap. Sometimes the Data, BSS, and Heap areas are collectively referred to as the "data segment".

For a normal (non-large data model) AIX program, the Data segment begins at the bottom of segment 2 (0x20000000) and ends at the beginning of the BSS segment. The Data segment contains global and static data that the program initializes to values other than zero. For instance, the string defined by char s[] = "hello world"; would exist in the Data segment.

The BSS segment starts at the end of the Data segment and contains all global and static variables that are initialized to zero. For instance, a variable declared static int i; would be contained in the BSS segment.

The Data and BSS segments are fixed in size at link time and do not grow or change during program execution. All data items in both segments are considered readable and writable because the data is initialized and available for program use. Each shared object module or shared library will have its own Data and BSS segment. Shared library data segments are stored in segment 2 (0x20000000) on AIX 3.2 and in segment F (0xF0000000) on AIX 4.x.
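The distinction between the two segments can be seen in ordinary declarations. The sketch below is illustrative only: the variable names are hypothetical, and the exact placement of each object is decided by the compiler and linker. It shows the property that matters for memory analysis, namely that both kinds of object already have a known value when the program starts.

```c
#include <assert.h>
#include <stddef.h>

/* Explicit non-zero initial values: typically placed in the Data segment. */
char greeting[] = "hello world";
int retry_limit = 3;

/* No initializer, so the values start as zero: typically placed in BSS. */
static int call_count;
static char scratch[256];

/* Return 1 if every one of the n bytes in buf is zero. */
static int all_zero(const char *buf, size_t n)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}
```

Reading call_count or scratch before the program has ever stored to them is legal, because the C language guarantees that objects with static storage duration start out as zero.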

The Heap area begins at the end of the BSS segment and grows toward larger addresses from there. The Heap area is managed by malloc, realloc, and free, which use the brk and sbrk system calls to adjust its size. The Heap area is shared by all shared libraries and dynamic load modules in a process.

The Heap area is the primary source of memory management problems and is the main focus of the analysis that ZeroFault performs on a process. When a program allocates memory via malloc, it receives a pointer to a region of memory of the appropriate size. Just before that region there is bookkeeping information that malloc, realloc, and free use to manage the Heap. If that bookkeeping information gets overwritten, it will often cause a segmentation violation in a function called malloc_y, free_y, realloc_y, or one of their child functions. If a program does have a segmentation violation in one of those functions, it is highly probable that the problem is a memory overwrite error.
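To make the role of that bookkeeping information concrete, here is a toy allocator that stores a small header immediately before each region it hands out. This is a sketch under stated assumptions: the header layout, the magic value, and the names toy_alloc and toy_header_intact are hypothetical and do not describe the actual AIX malloc data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 256
#define HEADER_MAGIC 0x5AFEB10Cu

/* Hypothetical bookkeeping stored immediately before each payload. */
struct toy_header {
    size_t size;
    unsigned magic;
};

static unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Hand out n payload bytes preceded by a header; NULL when the arena is full. */
static unsigned char *toy_alloc(size_t n)
{
    struct toy_header h;
    unsigned char *payload;

    if (n > ARENA_SIZE - sizeof h || arena_used > ARENA_SIZE - sizeof h - n)
        return NULL;
    h.size = n;
    h.magic = HEADER_MAGIC;
    memcpy(arena + arena_used, &h, sizeof h);
    payload = arena + arena_used + sizeof h;
    arena_used += sizeof h + n;
    return payload;
}

/* Return 1 if the header in front of payload still carries its magic value. */
static int toy_header_intact(const unsigned char *payload)
{
    struct toy_header h;

    assert(payload != NULL);
    memcpy(&h, payload - sizeof h, sizeof h);
    return h.magic == HEADER_MAGIC;
}
```

If a program obtains two 8-byte regions from toy_alloc and then writes past the end of the first one, the extra bytes land on the header of the second. Nothing fails at the moment of the write; the damage only shows up later, when the allocator next consults that header, which is why crashes inside the allocation functions so often point to an earlier overwrite.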

In large data model programs, the Data segment begins at the beginning of segment 3 (0x30000000) and grows upward from there. It is followed by the BSS segment for the primary executable and then by the Heap area shared by all load modules.

The fourth region of memory in AIX is the stack. In AIX, the stack starts at the top of segment 2 (0x2FFFFFFC) and grows toward lower memory addresses from there. The stack pointer (register 1) points at the lowest point on the stack that is valid for access. The stack region contains automatic, or local, variables (for instance, a variable declared as int i; within a function). Except in certain special circumstances, a stack frame is created each time a function is called. Information such as saved register values, parameters, and the return address is stored in the stack frame in addition to local variables. When a function returns to its caller, the stack is popped and the region of stack memory that was used by that function is no longer available to the process.

Stack memory has no known initial state. The initial value of a stack variable is not defined unless it is explicitly initialized. ZeroFault checks every operation on a stack variable to ensure that all stack memory is properly initialized. Examining the value of a stack variable before it has been initialized will generate a USTKR error (Uninitialized Stack Read).
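A typical way for an uninitialized stack read to arise is a local variable that is assigned on only some paths through a function. The sketch below uses hypothetical function names; the first version contains the defect and the second shows one possible correction. It is meant to illustrate the pattern only and does not reproduce any particular ZeroFault report.

```c
#include <assert.h>
#include <stddef.h>

/* Defective: found is assigned only if some element is positive. */
static int last_positive_buggy(const int *v, size_t n)
{
    int found;                  /* stack variable with no initial value */
    size_t i;

    assert(v != NULL);
    for (i = 0; i < n; i++)
        if (v[i] > 0)
            found = v[i];
    return found;               /* uninitialized read when nothing was positive */
}

/* Corrected: found has a defined value on every path. */
static int last_positive_fixed(const int *v, size_t n)
{
    int found = 0;
    size_t i;

    assert(v != NULL);
    for (i = 0; i < n; i++)
        if (v[i] > 0)
            found = v[i];
    return found;
}
```

The defective version behaves correctly whenever the input contains a positive value, so ordinary testing can easily miss it. Only an input with no positive elements reaches the read of the uninitialized variable.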

Loads and Stores

A compiled program consists of a sequence of machine instructions. These instructions can be classified into three groups on Power and PowerPC CPUs:
  • instructions that load memory into CPU registers (read or load instructions)
  • instructions that store CPU registers into memory (write or store instructions)
  • instructions that operate on the contents of CPU registers

Monitoring each load and store is the core of advanced memory analysis. When a store operation is performed, ZeroFault checks that the region of memory to be written to is available to the process for writing. Further, the region of memory that was just stored to is marked as having a known state, which makes it available for reading if it was not already.

When a load instruction is performed, ZeroFault checks the address that is being read from to ensure that it is allocated to the process and that it is available to be read from. In short, the memory must be both allocated and initialized. Initialized means that it must have come from a source that has a known initial state (sbrk, Data, or BSS) or that it must have been written to since it was made available to the process (malloc, realloc, or stack memory).
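The load and store rules can be summarized as a small state machine over each byte of memory. The following is a minimal sketch assuming a tiny 64-byte address space and one state per byte; the names shadow, shadow_mark, check_store, and check_load are hypothetical, and the sketch is not a description of ZeroFault's internal data structures.

```c
#include <assert.h>
#include <stddef.h>

#define SHADOW_SIZE 64

enum byte_state { UNALLOCATED = 0, ALLOCATED, INITIALIZED };
enum access_result { ACCESS_OK = 0, ERR_UNALLOCATED, ERR_UNINITIALIZED };

/* One state per byte of a toy 64-byte address space. */
static enum byte_state shadow[SHADOW_SIZE];

static int in_range(size_t addr, size_t len)
{
    return len <= SHADOW_SIZE && addr <= SHADOW_SIZE - len;
}

/* Record an allocation (ALLOCATED) or a release (UNALLOCATED). */
static void shadow_mark(size_t addr, size_t len, enum byte_state s)
{
    size_t i;

    assert(in_range(addr, len));
    for (i = 0; i < len; i++)
        shadow[addr + i] = s;
}

/* A store is legal if every byte is allocated; it then becomes initialized. */
static enum access_result check_store(size_t addr, size_t len)
{
    size_t i;

    if (!in_range(addr, len))
        return ERR_UNALLOCATED;
    for (i = 0; i < len; i++)
        if (shadow[addr + i] == UNALLOCATED)
            return ERR_UNALLOCATED;
    for (i = 0; i < len; i++)
        shadow[addr + i] = INITIALIZED;
    return ACCESS_OK;
}

/* A load is legal only if every byte is both allocated and initialized. */
static enum access_result check_load(size_t addr, size_t len)
{
    size_t i;

    if (!in_range(addr, len))
        return ERR_UNALLOCATED;
    for (i = 0; i < len; i++) {
        if (shadow[addr + i] == UNALLOCATED)
            return ERR_UNALLOCATED;
        if (shadow[addr + i] == ALLOCATED)
            return ERR_UNINITIALIZED;
    }
    return ACCESS_OK;
}
```

In this model a malloc of 10 bytes corresponds to shadow_mark(addr, 10, ALLOCATED). A load from that region fails until a store has covered the same bytes, and a store that extends past the tenth byte is rejected without changing any state.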

System Calls

On AIX (and most operating systems) the kernel is a "black box", and while the interfaces to that black box are well-defined, what goes on inside is hidden from the application program. Many system calls operate on application program memory; for instance read will read some data from a file descriptor and write that data into a buffer in the application program's memory. Similarly write will read some data from an application memory buffer and write that data to a file descriptor.

When doing memory analysis, it is necessary to validate the parameters to each system call. For instance, when the read system call is invoked, it is necessary to ensure that the buffer passed to read is available to the application program and is large enough to contain the amount of data that read may fill in. Further, if that region of memory was not previously marked as initialized, the bytes written to by read must now be marked as initialized.
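The read case can be sketched as a wrapper that performs both steps: it validates the buffer size before the call and records how many bytes became initialized after it. This is an illustration under stated assumptions; tracked_buf, checked_read, and ERR_BUFFER_TOO_SMALL are hypothetical names, and only the POSIX read call itself is real.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_CAPACITY 16
#define ERR_BUFFER_TOO_SMALL (-2)

/* Hypothetical buffer that remembers how much of it has a known value. */
struct tracked_buf {
    char data[BUF_CAPACITY];
    size_t initialized;         /* bytes [0, initialized) have been written */
};

/* Validate the request, call read, then mark the bytes read as initialized. */
static ssize_t checked_read(int fd, struct tracked_buf *b, size_t count)
{
    ssize_t n;

    assert(b != NULL);
    if (count > sizeof b->data)
        return ERR_BUFFER_TOO_SMALL;    /* reported before the kernel is entered */
    n = read(fd, b->data, count);
    if (n > 0 && (size_t)n > b->initialized)
        b->initialized = (size_t)n;     /* only the bytes actually filled in */
    return n;
}
```

Note that the number of bytes marked as initialized is taken from the return value of read and not from the count argument, since read may fill in fewer bytes than were requested.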

ZeroFault knows the semantics of all system calls defined on AIX and validates the parameters passed to each of them to make sure that they are available to the process and initialized if necessary. When a system call updates application memory ZeroFault marks that memory as initialized.

If system calls that ZeroFault does not know about have been added to the system, ZeroFault cannot validate their parameters or mark the regions they write to as initialized. As a result, when non-standard system calls are in use, some errors may be missed and some uninitialized memory reads may be falsely reported.

A Word About Signal Handlers

A signal handler is a function that is invoked when a signal is delivered to a process. Signal handlers are often used to process I/O (SIGIO), handle messages from other processes (SIGUSR1, SIGUSR2), catch timeout conditions (SIGALRM), and so on. When a signal is delivered, the thread that is currently executing stops and the signal handler is invoked on that thread's stack. The signal handler must run to completion before that thread can continue. In a single-threaded process this means that the process stops all execution until the signal handler completes. In a multi-threaded process other threads can be executed while the signal handler is running, but the thread that received the signal cannot resume execution until the signal handler completes.

Signal handlers can be invoked at any time. This poses a problem when a signal handler tries to update a shared resource. For example, if a thread receives a signal while it is in the midst of updating a linked list, and the signal handler examines or updates that same list, the consequences can be serious (such as a segmentation violation). Because of this potential problem, only a very limited number of things can be done safely from a signal handler.
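One common way to stay within those limits is to have the handler do nothing except set a flag, and to perform the real work later from ordinary program code. The sketch below uses the standard signal function and the sig_atomic_t type; the names on_signal, install_handler, and take_pending are hypothetical, and this is one possible pattern, not the only safe one.

```c
#include <signal.h>

/* Set by the handler, examined and cleared by ordinary code. */
static volatile sig_atomic_t signal_pending;

/* The handler touches nothing except the flag. */
static void on_signal(int signo)
{
    (void)signo;
    signal_pending = 1;
}

/* Install the handler for signo; returns 0 on success, -1 on failure. */
static int install_handler(int signo)
{
    return (signal(signo, on_signal) == SIG_ERR) ? -1 : 0;
}

/* Called from the main loop: returns 1 once for each observed flag. */
static int take_pending(void)
{
    if (signal_pending) {
        signal_pending = 0;
        return 1;
    }
    return 0;
}
```

A program would call install_handler(SIGUSR1) during startup and test take_pending() at a point in its main loop where shared structures such as the linked list are known to be consistent. The list is then updated there, never inside the handler.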

© Copyright 2013 The ZeroFault Group, LLC. All rights reserved. All logos and trademarks are property of their respective owners.