
Image via Wikipedia
Over the weekend I’ve been trying to track down a memory leakage in a piece of our software. I learned a lot about state-of-the-art in python debugging, so I am sharing it here.
So, the problem: Approximately every two months one of our daemon processes goes bezerk, allocating memory left and right. Usually the machine has to be rebooted or kernel’s out-of-memory-killer does his job. This time Jure (our beloved admin) caught it. Now we had a 30GB core dump file.
The question becomes: so what’s allocating all that memory? Or more precisely how do we get something out of that core dump.
A few more details: Zemanta on production servers uses Python 2.5, a stock Debian Lenny version. We don’t use python-dbg builds to run the production.
After opening the dump with gdb the first thing I saw was that there are no debugging symbols. By installing python-gdb this can be solved – gdb knows how to load symbols from a separate libraries /usr/lib/debug even when loading core dumps that were generated without symbols. Naturally this works only if the distribution gives you the exact right symbols for the binary at hand.
When symbols are loaded after the fact, this is what you should see when loading a core dump into gdb:
Reading symbols from /lib/libc.so.6…Reading symbols from /usr/lib/debug/lib/libc-2.7.so…done.
Ok, so now we have symbols, let’s look at the backtrace first (this is thread #2, luckily we only have two threads in this daemon):
#0 update_refs (generation=2) at ../Modules/gcmodule.c:242
#1 collect (generation=2) at ../Modules/gcmodule.c:789
#2 0x00000000004be35b in collect_generations (basicsize=) at ../Modules/gcmodule.c:897
#3 _PyObject_GC_Malloc (basicsize=) at ../Modules/gcmodule.c:1336
#4 0x000000000045f569 in PyType_GenericAlloc (type=0x728a20, nitems=0) at ../Objects/typeobject.c:454
#5 0x0000000000426ddc in BaseException_new (type=0×2, args=, kwds=0x7f1505978010) at ../Objects/exceptions.c:35
#6 0x000000000046ddf3 in type_call (type=0×2, args=0x7f18d379d050, kwds=0×0) at ../Objects/typeobject.c:422
#7 0x00000000004187b3 in PyObject_Call (func=0×2, arg=0×740060, kw=0x7f1505978010) at ../Objects/abstract.c:1861
#8 0x000000000048b262 in PyEval_CallObjectWithKeywords (func=0x728a20, arg=0x7f18d379d050, kw=0×0) at ../Python/ceval.c:3444
#9 0x00000000004a011e in PyErr_NormalizeException (exc=0x7fffffff93d8, val=0x7fffffff93d0, tb=0x7fffffff93c8) at ../Python/errors.c:175
#10 0×0000000000491160 in do_raise (f=0x2b58190, throwflag=) at ../Python/ceval.c:3077
#11 PyEval_EvalFrameEx (f=0x2b58190, throwflag=) at ../Python/ceval.c:1632
#12 0x000000000049313d in PyEval_EvalCodeEx (co=0x1f46468, globals=, locals=, args=0×2939698, argcount=2, kws=0×0, kwcount=0, defs=0×0, defcount=0, closure=0×0) at ../Python/ceval.c:2838
#13 0x00000000004dd612 in function_call (func=0x1e8ec08, arg=0×2939680, kw=0×0) at ../Objects/funcobject.c:517
#14 0x000000000041e0eb in PyObject_Call (callable=0x1e8ec08) at ../Objects/abstract.c:1861
#15 PyObject_CallFunctionObjArgs (callable=0x1e8ec08) at ../Objects/abstract.c:2091
#16 0x000000000048920d in builtin_getattr (self=, args=) at ../Python/bltinmodule.c:737
#17 0x00000000004919c2 in call_function (f=0x2b09dd0, throwflag=) at ../Python/ceval.c:3575
#18 PyEval_EvalFrameEx (f=0x2b09dd0, throwflag=) at ../Python/ceval.c:2274
#19 0x000000000049313d in PyEval_EvalCodeEx (co=0xa34cd8, globals=, locals=, args=0×3, argcount=1, kws=0x7f18c809bf98, kwcount=0, defs=0x9c28d8, defcount=2, closure=0×0) at ../Python/ceval.c:2838
#20 0×0000000000491663 in fast_function (f=0x7f18c809bdd0, throwflag=) at ../Python/ceval.c:3671
#21 call_function (f=0x7f18c809bdd0, throwflag=) at ../Python/ceval.c:3596
#22 PyEval_EvalFrameEx (f=0x7f18c809bdd0, throwflag=) at ../Python/ceval.c:2274
#23 0x000000000049313d in PyEval_EvalCodeEx (co=0x1f4d210, globals=, locals=, args=0×3, argcount=2, kws=0x7f18c809bd90, kwcount=0, defs=0×2038228, defcount=1, closure=0×0) at ../Python/ceval.c:2838
#24 0×0000000000491663 in fast_function (f=0x7f18c809bbf0, throwflag=) at ../Python/ceval.c:3671
#25 call_function (f=0x7f18c809bbf0, throwflag=) at ../Python/ceval.c:3596
#26 PyEval_EvalFrameEx (f=0x7f18c809bbf0, throwflag=) at ../Python/ceval.c:2274
#27 0x0000000000492a52 in fast_function (f=0x29a9670, throwflag=) at ../Python/ceval.c:3661
#28 call_function (f=0x29a9670, throwflag=) at ../Python/ceval.c:3596
#29 PyEval_EvalFrameEx (f=0x29a9670, throwflag=) at ../Python/ceval.c:2274
#30 0x0000000000492a52 in fast_function (f=0x7f18c809b9e0, throwflag=) at ../Python/ceval.c:3661
#31 call_function (f=0x7f18c809b9e0, throwflag=) at ../Python/ceval.c:3596
#32 PyEval_EvalFrameEx (f=0x7f18c809b9e0, throwflag=) at ../Python/ceval.c:2274
#33 0x0000000000492a52 in fast_function (f=0x2ebbce0, throwflag=) at ../Python/ceval.c:3661
#34 call_function (f=0x2ebbce0, throwflag=) at ../Python/ceval.c:3596
#35 PyEval_EvalFrameEx (f=0x2ebbce0, throwflag=) at ../Python/ceval.c:2274
#36 0x0000000000492a52 in fast_function (f=0x2e90db0, throwflag=) at ../Python/ceval.c:3661
#37 call_function (f=0x2e90db0, throwflag=) at ../Python/ceval.c:3596
#38 PyEval_EvalFrameEx (f=0x2e90db0, throwflag=) at ../Python/ceval.c:2274
#39 0x0000000000492a52 in fast_function (f=0x3ca8090, throwflag=) at ../Python/ceval.c:3661
#40 call_function (f=0x3ca8090, throwflag=) at ../Python/ceval.c:3596
#41 PyEval_EvalFrameEx (f=0x3ca8090, throwflag=) at ../Python/ceval.c:2274
#42 0x0000000000492a52 in fast_function (f=0×4217550, throwflag=) at ../Python/ceval.c:3661
#43 call_function (f=0×4217550, throwflag=) at ../Python/ceval.c:3596
#44 PyEval_EvalFrameEx (f=0×4217550, throwflag=) at ../Python/ceval.c:2274
#45 0x000000000049313d in PyEval_EvalCodeEx (co=0x1dc7288, globals=, locals=, args=0x2f04f50, argcount=2, kws=0×0, kwcount=0, defs=0×0, defcount=0, closure=0×0) at ../Python/ceval.c:2838
#46 0x00000000004dd612 in function_call (func=0x1dd48c0, arg=0x2f04f38, kw=0×0) at ../Objects/funcobject.c:517
#47 0x00000000004187b3 in PyObject_Call (func=0×2, arg=0×740060, kw=0x7f1505978010) at ../Objects/abstract.c:1861
#48 0x000000000041f658 in instancemethod_call (func=0x1dd48c0, arg=0x2f04f38, kw=0×0) at ../Objects/classobject.c:2519
#49 0x00000000004187b3 in PyObject_Call (func=0×2, arg=0×740060, kw=0x7f1505978010) at ../Objects/abstract.c:1861
#50 0x000000000048fef2 in do_call (f=0x2df6a80, throwflag=) at ../Python/ceval.c:3786
#51 call_function (f=0x2df6a80, throwflag=) at ../Python/ceval.c:3598
#52 PyEval_EvalFrameEx (f=0x2df6a80, throwflag=) at ../Python/ceval.c:2274
#53 0x0000000000492a52 in fast_function (f=0x257d8c0, throwflag=) at ../Python/ceval.c:3661
#54 call_function (f=0x257d8c0, throwflag=) at ../Python/ceval.c:3596
#55 PyEval_EvalFrameEx (f=0x257d8c0, throwflag=) at ../Python/ceval.c:2274
#56 0x000000000049313d in PyEval_EvalCodeEx (co=0x1dbfd50, globals=, locals=, args=0×6, argcount=1, kws=0x28fe240,
kwcount=0, defs=0x1e29a28, defcount=5, closure=0×0) at ../Python/ceval.c:2838
#57 0×0000000000491663 in fast_function (f=0x28fe090, throwflag=) at ../Python/ceval.c:3671
#58 call_function (f=0x28fe090, throwflag=) at ../Python/ceval.c:3596
#59 PyEval_EvalFrameEx (f=0x28fe090, throwflag=) at ../Python/ceval.c:2274
#60 0x000000000049313d in PyEval_EvalCodeEx (co=0x1f32990, globals=, locals=, args=0×3, argcount=1, kws=0×2578738,
kwcount=2, defs=0x173cec0, defcount=2, closure=0×0) at ../Python/ceval.c:2838
#61 0×0000000000491663 in fast_function (f=0×2578580, throwflag=) at ../Python/ceval.c:3671
#62 call_function (f=0×2578580, throwflag=) at ../Python/ceval.c:3596
#63 PyEval_EvalFrameEx (f=0×2578580, throwflag=) at ../Python/ceval.c:2274
#64 0x000000000049313d in PyEval_EvalCodeEx (co=0x7f18d2870198, globals=, locals=, args=0x187b0f8, argcount=2,
kws=0x1e54310, kwcount=0, defs=0×0, defcount=0, closure=0×0) at ../Python/ceval.c:2838
#65 0x00000000004dd709 in function_call (func=0x273e848, arg=0x187b0e0, kw=0x2577ea0) at ../Objects/funcobject.c:517
#66 0x00000000004187b3 in PyObject_Call (func=0×2, arg=0×740060, kw=0x7f1505978010) at ../Objects/abstract.c:1861
#67 0×0000000000490297 in ext_do_call (f=0x25783b0, throwflag=) at ../Python/ceval.c:3855
#68 PyEval_EvalFrameEx (f=0x25783b0, throwflag=) at ../Python/ceval.c:2314
#69 0x0000000000492a52 in fast_function (f=0x24d8df0, throwflag=) at ../Python/ceval.c:3661
#70 call_function (f=0x24d8df0, throwflag=) at ../Python/ceval.c:3596
#71 PyEval_EvalFrameEx (f=0x24d8df0, throwflag=) at ../Python/ceval.c:2274
#72 0x0000000000492a52 in fast_function (f=0×2577160, throwflag=) at ../Python/ceval.c:3661
#73 call_function (f=0×2577160, throwflag=) at ../Python/ceval.c:3596
—Type to continue, or q to quit—
#74 PyEval_EvalFrameEx (f=0×2577160, throwflag=) at ../Python/ceval.c:2274
#75 0x000000000049313d in PyEval_EvalCodeEx (co=0x26c8a80, globals=, locals=, args=0x187af50, argcount=2, kws=0×0,
kwcount=0, defs=0×0, defcount=0, closure=0×0) at ../Python/ceval.c:2838
#76 0x00000000004dd612 in function_call (func=0x273d230, arg=0x187af38, kw=0×0) at ../Objects/funcobject.c:517
#77 0x00000000004187b3 in PyObject_Call (func=0×2, arg=0×740060, kw=0x7f1505978010) at ../Objects/abstract.c:1861
#78 0x000000000041f658 in instancemethod_call (func=0x273d230, arg=0x187af38, kw=0×0) at ../Objects/classobject.c:2519
#79 0x00000000004187b3 in PyObject_Call (func=0×2, arg=0×740060, kw=0x7f1505978010) at ../Objects/abstract.c:1861
#80 0x0000000000465ca6 in slot_tp_init (self=0×2740850, args=0×2740350, kwds=0×0) at ../Objects/typeobject.c:4943
#81 0x000000000046dfdb in type_call (type=0x24d93d0, args=0×2740350, kwds=0×0) at ../Objects/typeobject.c:436
#82 0x00000000004187b3 in PyObject_Call (func=0×2, arg=0×740060, kw=0x7f1505978010) at ../Objects/abstract.c:1861
#83 0x000000000048fef2 in do_call (f=0x2576dc0, throwflag=) at ../Python/ceval.c:3786
#84 call_function (f=0x2576dc0, throwflag=) at ../Python/ceval.c:3598
#85 PyEval_EvalFrameEx (f=0x2576dc0, throwflag=) at ../Python/ceval.c:2274
#86 0x0000000000492a52 in fast_function (f=0x7da510, throwflag=) at ../Python/ceval.c:3661
#87 call_function (f=0x7da510, throwflag=) at ../Python/ceval.c:3596
#88 PyEval_EvalFrameEx (f=0x7da510, throwflag=) at ../Python/ceval.c:2274
#89 0x000000000049313d in PyEval_EvalCodeEx (co=0x7f18d2870648, globals=, locals=, args=0×0, argcount=0, kws=0×0,
kwcount=0, defs=0×0, defcount=0, closure=0×0) at ../Python/ceval.c:2838
#90 0×0000000000493332 in PyEval_EvalCode (co=0×2, globals=0×740060, locals=0x7f1505978010) at ../Python/ceval.c:494
#91 0x00000000004b2cd8 in run_mod (fp=0x75e010, filename=0x7fffffffd522 “transfer.py”, start=, globals=0x7814f0, locals=0x7814f0, closeit=1,
flags=0x7fffffffc5c0) at ../Python/pythonrun.c:1273
#92 PyRun_FileExFlags (fp=0x75e010, filename=0x7fffffffd522 “transfer.py”, start=, globals=0x7814f0, locals=0x7814f0, closeit=1,
flags=0x7fffffffc5c0) at ../Python/pythonrun.c:1259
#93 0x00000000004b2f7b in PyRun_SimpleFileExFlags (fp=0x75e010, filename=0x7fffffffd522 “transfer.py”, closeit=1, flags=0x7fffffffc5c0) at ../Python/pythonrun.c:879
#94 0×0000000000414542 in Py_Main (argc=2, argv=) at ../Modules/main.c:532
#95 0x00007f18d29fa1a6 in __libc_start_main () from /lib/libc.so.6
#96 0×0000000000413989 in _start ()
Ok… this is ugly, there must be a better way?
libpython – Easier Python Debugging
And there is – David Malcolm from RedHat has proposed and implemented a plug-in for gdb that enables you to easier debugging of python with gdb.
Now this plug-in needs gdb 7.2 and Lenny has 6.8 which means a new gdb had to be recompiled. Latest FSF gdb 7.2 will do the job. Just be careful to use the —prefix=/usr when running ./configure as the gdb somehow loses the ability to load symbols from debug libraries separately if you use your own prefix (the prefix hint and many others were curtesy of very very kind people on #gdb on Freenode, especially Jan Kratochvil).
Googling for debugging python under gdb is made much harder since gdb has lately chosen python to be its scripting language of choice. Everything gets rather

- Image by .Bala via Flickr
confusing. The next question is where to get the famous libpython.py … you can choose the one coming with Python 2.7, with Python trunk or with Cython.
Then you finally type in the magic words inside gdb shell:
(gdb) python import libpython
(gdb) py-bt
#4 (unable to read python frame information)
#8 (unable to read python frame information)
#13 (unable to read python frame information)
#16 (unable to read python frame information)
#19 (unable to read python frame information)
Great, then you try just “bt”:
for addr_incr, line_incr in zip(co_lnotab[::2], co_lnotab[1::2]):
TypeError: ‘FakeRepr’ object is unsubscriptable
Ok, no luck here. With lots of testing I found out that basically you need Python 2.6 to make it work. I’ve written to David Malcolm to se if there are any easy fixes there, but I am not holding my breath.
gdbheap – python-friendly memory analyzer
Next came gdbheap, another Fedora-affiliated project, it promises to show you where your python memory is going – even from core dumps! The web page looks promising and I’ve gotten the new gdb python extensions to work already, so surely this will work:
(gdb) python import gdbheap
(gdb) heap
Missing debuginfo for glibc
Suggested fix:
debuginfo-install glibc
It seems the instructions are for Fedora, but supposedly Debian’s -dbg packages do the same trick. Digging deeper showed that gdbheap needs a pretty wide open look into glibc and especially into its memory allocation structures. Symbols for top-level structures are present in /usr/lib/debug/lib/libc-2.7.so, one of the important ones are “mp_” and “main_arena”. However structure of mp_ itself isn’t available in that file, so when gdbheap tries to resolve things like mp_.sbrk_base it fails. The symbols needed are present in a separate library /usr/lib/debug/libc-2.7.so, but that library isn’t exactly the same as ordinary libc, so symbols from the former cannot be used for the latter. The bottom line is, I can’t use gdbheap to take a look into my 30Gb core dump, due to libc6-dbg package having just some of the symbols needed.
So there was at least some good news here – if gdbheap could give good answers, I’ll start running our problematic daemon with LD_LIBRARY_PATH=/usr/lib/debug and next time we get a dump, we’ll be able to take a look.
But just in case, I wanted to see how this gdbheap works. I’ve attached the gdb to a process running the same daemon in normal mode. It ate about 100mb of memory. Now I typed in the commands again and the progress bar started showing some kind of analysis being done. After two minutes, indeed it explained which types of python objects are eating the memory. Now there was a small issue though… I checked the memory usage of gdb itself doing this analysis, and for 100mb of resident memory it has eaten… 4Gb of memory. Scaling this to 30GB core dump that I have to analyze, I think might be feasible by the end of decade. ]
Anyway, I am still optimistic. gdbheap is written in python and at the first sight it should be possible to make it just sample the memory space instead of analyzing it in total. But until I get it working over my dump, that’s just a theory.
.gdbinit for gdb making Python bearable
Now there was another tool I’ve seen previously – .gdbinit. An init script for gdb that can print the stack trace in python-friendly way, with locals and globals. Now the problem is that for the variables part it relies into python binary actually being run, so I knew I can only have a stack traces.
The issue is that it is very, very python version dependent and naturally it didn’t work out of the box. For example one of the conditions for deciding if the function is the one from which the python-stack can be traced is:
if $pc > PyEval_EvalFrameEx && $pc < PyEval_EvalCodeEx
This basically compares program counter ($pc) with addresses of two specific functions. What it tries to do is to find out if $pc is inside PyEval_EvalFrameEx in a very clumsy way. This depends on the sequence of functions in python source file and on compiler preserving it when generating code. The same goes for termination condition which gets into infinite loops when using threads, etc.
For my specific case, I’ve found that I can do this:
if $pc > PyEval_EvalCodeEx && $pc < PyEval_EvalCodeEx+40596
and it works. The memory offset is purely hand-selected, I have no idea how large the function really is. Actually the right way to fix the problem would be to check if the “co” variable is available in local context, but nobody on #gdb would knew how to check for that programmatically.
Anyway at the end I’ve managed to piece together the .gdbinit file that gets me the following:
(gdb) pystack
/home/neith/lib/python/feedparser.py (237): __getattr__
/usr/lib/python2.5/copy.py (171): deepcopy
/home/neith/lib/python/feedparser.py (683): pop
/home/neith/lib/python/sgmllib.py (96): feed
/home/neith/lib/python/feedparser.py (2675): parse
/home/neith/zdev/neith/aggregator/finish.py (205): parse_sourcefeed
transfer.py (661): mp_process_feed
/usr/lib/python2.5/site-packages/multiprocessing/forking.py (99): __init__
transfer.py (692):
So now I have a culprit for the memory hunger — it’s sgmllib/feedparser. Unfortunately that’s third party code that is hard to understand and hard to debug. Last time we’ve tried to migrate to newer version of feedparser things fell apart. Maybe we have to do that again. Or try swapping the FeedParser for (hopefully) more mature Jakarta Feedparser or Rome, probably as a RPC service written in Java. Any suggestions?
Debugging python with gdb, final verdict
People are working on the projects that might lead us somewhere. But right now, for a developer needing to figure where the memory went inside a core dump, it doesn’t look very good. And integrated experience of debugging found in Microsoft’s tools is still far far away. But people are trying and that’s something.
Oh, I must also mention a great piece of software I’ve used once to trace a memory hunger of slightly higher reproducability – heappy. However it seems the project is dead.
Your experience / advice / hints
I’d be happy to get more information on methods to debug python code. If you know about something I have missed, please leave a comment, it would be appreciated!