Chapter 10: Troubleshooting
MAL_FAIL or "Unable to malloc" message
This indicates that the program is running out of available heap
memory. Since ZeroFault uses memory for its own purposes, a program
may run out of memory under ZeroFault even though it runs fine without
it. The first thing to check is that your ulimit for data is set high
enough. You can check the ulimit by running:
$ ulimit -d
(Note that ulimit displays and accepts values in kilobytes.)
The default data ulimit is often set to 128 MB, which may be too
small. For example, to increase it to 250 MB (250 × 1024 = 256000 KB), run:
$ ulimit -d 256000
If increasing your ulimit doesn't solve the problem, you can build
or patch the target program to use a large address space.
Normal AIX executables use only one segment, or 256 megabytes, of
address space for data. With the large address space feature, a
program can use up to 8 segments, or 2 gigabytes, for data.
To build your program with the large address space, pass the
-bmaxdata option to the compiler or linker.
For instance, the executable created by the following command can use
up to 4 segments (1 gigabyte) for its data:
$ cc -o foo foo.o -bmaxdata:0x40000000
You can also patch an already linked executable to use the large
address space with a simple command.
The AIX General Programming Concepts guide gives more details about
the large address space model and how to patch executables to use it.
If you have the AIX documentation search engine installed, search for
"Large Program Support."
Other possible causes of running out of memory are:
- Running out of paging space. Use the lsps -a command to check paging
  space while the process is running.
- Corruption of the heap data structures, caused by program errors.
  Resolving the errors indicated by ZeroFault, especially the Bad Memory
  Writes, should fix this.
A program that runs fine without ZeroFault fails when run under ZeroFault
ZeroFault is designed not to affect a program's behavior, so your
program should act the same as when it is run from the command line
without ZeroFault. However, timing-dependent programs may behave
differently under ZeroFault, because ZeroFault makes the program run
slower. Since normal execution speed already varies with factors such
as system load, a program should not be this timing-dependent in any
case.
Programs that have memory errors may also behave differently under
ZeroFault, because ZeroFault changes where allocated memory is located,
and the contents of uninitialized memory may differ. Such programs may
fail in a different way under ZeroFault, or may even fail under
ZeroFault but not when run normally. Resolving all the errors reported
by ZeroFault should eliminate these problems.
A setuid program doesn't run under ZeroFault
ZeroFault will run setuid/setgid programs, but it must run them under
the uid/gid of the user invoking ZeroFault. This is due to a security
restriction in the AIX facility that ZeroFault uses to launch the
program. If the program needs to run under the uid/gid it is set to,
use the su command to set your uid/gid to those of the program before
you run ZeroFault.