Discussion:
tips-and-tricks figuring out a crashing application?
Jens Staal
2014-12-17 17:00:19 UTC
Background: I have ported busybox-w32 (Win32 port of busybox, normally built
with MinGW) so that it can be (cross-) compiled with Open Watcom (v2). The
resulting binary is about 20-30 % smaller (in the upper range of 400 KiB
whereas the MinGW built one is in the upper range of 600 KiB).

Now I have hit something that I cannot simply figure out from compiler errors, so
I am out in deeper water again ;) (I am a hobbyist with no real expertise)

All busybox applets seem to work fine (as far as I have tested them) except
the shell. When the shell is executed I always get a stack dump. Below is an
example:

Z:\home\jens\Devel\github\bbuild>busybox sh
The instruction at 0x00402f31 referenced memory at 0x00000044.
The memory could not be written.
Exception fielded by 0x00456130
EAX=0x00000044 EBX=0x00000004 ECX=0x00476178 EDX=0x00461bfd
ESI=0x00461bfd EDI=0x00000004 EBP=0x00000001 ESP=0x0033fa4c
EIP=0x00402f31 EFL=0x00010206 CS =0x00000023 SS =0x0000002b
DS =0x0000002b ES =0x0000002b FS =0x00000063 GS =0x0000006b
Stack dump (SS:ESP)
0x00000000 0x00342a40 0x0033fd08 0x00000001 0x003412d4 0x00000009
0x00402f74 0x00000000 0x0040c093 0x00000001 0x0033fb4c 0x0033faf8
0x7b84df35 0x0033fc7c 0xf75df8a6 0x0033fab0 0x7b84df35 0xf777c660
0x00000000 0x0033fb4c 0x00000001 0x0033fccc 0x00000001 0x00000000
0x00000000 0xf777c660 0x00000000 0x00000000 0x0033fccc 0x00000000
0x00000000 0x00000001 0x00000000 0x0033fccc 0x0033fb4c 0x00000000
0x2ee3d500 0x00000002 0x7b84dc26 0x0033fb10 0x7ec87000 0x0033fccc
0x00000001 0x0033fba8 0x7ebf8fda 0xf742fca9 0x7bc8e688 0x7ebbd629
0x7ebf8fda 0x00000000 0x00000000 0x0033fb4c 0x00000001 0x0033fccc
0x00000001 0x00000000 0x00000000 0x7ffdf000 0x003400bc 0x0033fb4c
0x00000001 0xfff01000 0x00000400 0x0033fb48 0x00430042 0x02020202
0x536e6957 0x206b636f 0x00302e32 0xf742fca9 0x7ec87000 0x003412d0


I have tried a lot of different things:

* A debug build, adding -g1+ to the owcc CFLAGS (-g2 or higher failed to
compile, and so did -DDEBUG, which activates some debug code) and "debug
watcom all" to the wlink script. This resulted in a huge binary that I could
also run in wdw under Wine. It highlighted an area in the code, but disabling
that part did not make a difference, which makes me believe that the error
might occur "upstream".
- What are your general tricks here? Adding printf statements around
suspected areas?

* Since it seemed to be some sort of memory allocation problem, I tested a
few things around this too:
** increasing the stack size to some ridiculous number by adding "option
stack=512k" to the wlink script

** passing -Wc,-sg to owcc (also tested -Wc,-s in case it was just some
sort of spurious error)

** switching from the "register" to the "stack" based calling convention

** based on what I read, -Wc,-zu seemed interesting. Interestingly, this
one seems to be incompatible with the stack-based calling convention
(perhaps obvious to everybody else). I also noticed that variables of type
"char **" change type to "char * __far*" (which confuses functions when
they are passed as arguments).

So ... probably total noob questions for most of you, but if you have
some pointers or ideas on where to begin, that would help :)
Uwe Schmelich
2014-12-17 19:49:59 UTC
Jens Staal wrote:

[-snip-]
Post by Jens Staal
* Since it seemed to be some sort of memory allocation problem, I tested a
** increasing the stack size to some ridiculous number by adding "option
stack=512k" to the wlink script
I don't know if your problem has anything to do with the stack; just a
short comment here.
Is 512k really ridiculous? On Linux the default stack size seems to be 8 MB.

In Watcom under Windows you may also add a COMMIT wlink directive. Otherwise
you may get only the default stack size of 64k (I think) committed.
Depending on your stack access behaviour and your compiler switches, this
could lead to an access violation.
Something like:
OP st=512k COM st=512k
Jens Staal
2014-12-18 09:35:22 UTC
Post by Uwe Schmelich
I don't know if your problem has anything to do with the stack; just a
short comment here.
Is 512k really ridiculous? On Linux the default stack size seems to be 8 MB.
In Watcom under Windows you may also add a COMMIT wlink directive.
Otherwise you may get only the default stack size of 64k (I think)
committed. Depending on your stack access behaviour and your compiler
switches, this could lead to an access violation.
OP st=512k COM st=512k
Thanks for the suggestion! I was mostly thinking that the stack is
much bigger than the actual application. Perhaps that was too simplistic of me.

I tried this but it did not help (but the binary size changed, so something
must have happened with the "commit" stuff... so I am keeping it).
Johann Klammer
2014-12-18 06:25:30 UTC
Post by Jens Staal
Background: I have ported busybox-w32 (Win32 port of busybox, normally built
with MinGW) so that it can be (cross-) compiled with Open Watcom (v2). The
resulting binary is about 20-30 % smaller (in the upper range of 400 KiB
whereas the MinGW built one is in the upper range of 600 KiB).
Now I have hit something that I cannot simply figure out from compiler errors, so
I am out in deeper water again ;) (I am a hobbyist with no real expertise)
All busybox applets seem to work fine (as far as I have tested them) except
the shell. When the shell is executed I always get a stack dump. Below is an
Z:\home\jens\Devel\github\bbuild>busybox sh
The instruction at 0x00402f31 referenced memory at 0x00000044.
This might be a NULL pointer access through a pointer to some struct (a member at offset 0x44), as 0x44 seems pretty low...
If I remember correctly, the stack grows downwards on those Intel things,
so if it were a stack overflow you would likely see a much larger address... Also, AFAIK Watcom sprinkles
calls to a stack-checking routine into all function prologues, so on a stack overflow you might not
get an access violation at all but some other error message...

I believe you can get at the faulting assembler instruction with either wdis or objdump
or something similar. I've forgotten the exact procedure, but I remember that an offset was involved...
Post by Jens Staal
[-snip-]
I have tried a lot of different things:
* A debug build, adding -g1+ to the owcc CFLAGS (-g2 or higher failed to
compile, and so did -DDEBUG, which activates some debug code) and "debug
watcom all" to the wlink script. This resulted in a huge binary that I could
also run in wdw under Wine. It highlighted an area in the code, but disabling
that part did not make a difference, which makes me believe that the error
might occur "upstream".
- What are your general tricks here? Adding printf statements around
suspected areas?
Can be done, yes.
But what you'll want to do is look at the memory operands (pointers, really)
in the line that wdw shows you, then inspect those pointers for NULL and
find out where they got set... always assuming wdw actually works...
...I've had some problems with the DOS version of that thing before...
Post by Jens Staal
* Since it seemed to be some sort of memory allocation problem, I tested a
** increasing the stack size to some ridiculous number by adding "option
stack=512k" to the wlink script
** passing -Wc,-sg to owcc (also tested -Wc,-s in case it was just some
sort of spurious error)
** switching from the "register" to the "stack" based calling convention
** based on what I read, -Wc,-zu seemed interesting. Interestingly, this
one seems to be incompatible with the stack-based calling convention
(perhaps obvious to everybody else). I also noticed that variables of type
"char **" change type to "char * __far*" (which confuses functions when
they are passed as arguments).
So ... probably total noob questions for most of you, but if you have
some pointers or ideas on where to begin, that would help :)
Good luck...
Jens Staal
2014-12-18 11:39:27 UTC
Post by Johann Klammer
But what you'll want to do is look at the memory operands (pointers,
really) in the line that wdw shows you, then inspect those pointers for
NULL and find out where they got set... always assuming wdw actually
works... ...I've had some problems with the DOS version of that thing
before...
This was very useful advice. After some clicking to get out of "main" and
into the relevant module, I ended up at this piece of code in ash.c (which
is the shell in busybox):

if (val) {
*p++ = '=';
p = (char) memcpy(p, val, vallen) + vallen;
}

so I should probably look closer at that...
Johann Klammer
2014-12-18 11:54:11 UTC
Post by Jens Staal
Post by Johann Klammer
But what you'll want to do is look at the memory operands (pointers,
really) in the line that wdw shows you, then inspect those pointers for
NULL and find out where they got set... always assuming wdw actually
works... ...I've had some problems with the DOS version of that thing
before...
This was very useful advice. After some clicking to get out of "main" and
into the relevant module, I ended up at this piece of code in ash.c (which
is the shell in busybox):
if (val) {
*p++ = '=';
p = (char) memcpy(p, val, vallen) + vallen;
^^^^^Should that not be a pointer type?
Post by Jens Staal
}
so I should probably look closer at that...
Jens Staal
2014-12-18 13:06:52 UTC
Post by Johann Klammer
Post by Jens Staal
p = (char) memcpy(p, val, vallen) + vallen;
^^^^^Should that not be a pointer type?
Post by Jens Staal
}
I realized that I had introduced this (char) here early in the porting
process, before I knew about the "allow arithmetic operations on void *"
option (-Wc,-zev), simply to get the compile to continue.

... so the crash might have been my fault all along ...

I am currently compiling with -Wc,-zev in CFLAGS, and I have removed the
(char) cast before memcpy in various places...

If this works I will feel very stupid and then I will start simplifying /
cleaning up various compile flags and wlink script tricks that have
accumulated while trying to solve this.
It is some comfort that I did learn stuff along the way.

....
While I was writing this, the compile finished and the shell does work!
Thanks a lot for your help!

... and now I feel stupid for not looking at that earlier...
Wilton Helm
2014-12-23 00:58:25 UTC
It probably would have worked for you IF you had typecast it to (char *)
instead of (char).
Typecasting it to (char) means that only the least significant byte of the
return value would be used, throwing away the upper bits of the pointer.
That byte plus vallen then gets stored back into p, which, of course, would
now point to who knows where.

Wilton
Jens Staal
2015-01-06 15:46:39 UTC
Post by Wilton Helm
It probably would have worked for you IF you had typecast it to (char *)
instead of (char).
yeah that was stupid...

Now I have another weird little thing that I have identified with the "wd"
debugger

This piece of code causes a SIGSEGV in an intermediate binary ("conf") that is
built to generate various configuration files for busybox. I am currently
trying to enable Watcom to act as HOSTCC and not only as CC for busybox
itself. In the future I will probably also have to track down and add the
'\' case to enable builds on a Windows host...


FILE *f;
const char *name;
char *env;


if (!f && name[0] != '/') {
env = getenv(SRCTREE);
if (env) {
char *fullname = alloca(strlen(env) + strlen(name) +
2);
sprintf(fullname, "%s/%s", env, name);
f = fopen(fullname, "r");
}
}


Anyone got an idea why this one SIGSEGVs with Watcom and not with GCC? In wd (on
Linux), "f" and "name" were NULL and env had a value. I have tried
replacing alloca with malloc, and I have tried adding a (char *) cast in front
of it, without any change.

Commenting out this piece of code makes a binary that does not crash, but it
also cannot find the relevant configuration file.
Jens Staal
2015-01-06 17:06:50 UTC
Post by Jens Staal
Commenting out this piece of code makes a binary that does not crash, but
it also cannot find the relevant configuration file.
OK, I noticed that using this binary and explicitly pointing it to a
configuration file also caused a core dump, so I probably just masked the
real problem...

More digging to do...
Paul S. Person
2015-01-06 17:28:50 UTC
Post by Jens Staal
Post by Wilton Helm
It probably would have worked for you IF you had typecast it to (char *)
instead of (char).
yeah that was stupid...
Now I have another weird little thing that I have identified with the "wd"
debugger
This piece of code causes a SIGSEGV in an intermediate binary ("conf") that is
built to generate various configuration files for busybox. I am currently
trying to enable Watcom to act as HOSTCC and not only as CC for busybox
itself. In the future I will probably also have to track down and add the
'\' case to enable builds on a Windows host...
FILE *f;
const char *name;
char *env;
if (!f && name[0] != '/') {
env = getenv(SRCTREE);
if (env) {
char *fullname = alloca(strlen(env) + strlen(name) +
2);
sprintf(fullname, "%s/%s", env, name);
f = fopen(fullname, "r");
}
}
Anyone got an idea why this one SIGSEGVs with Watcom and not with GCC? In wd (on
Linux), "f" and "name" were NULL and env had a value. I have tried
replacing alloca with malloc, and I have tried adding a (char *) cast in front
of it, without any change.
If "name" is NULL, how can name[0] not cause a GP-fault?
--
"Nature must be explained in
her own terms through
the experience of our senses."
d3x0r
2015-01-07 02:13:14 UTC
Post by Jens Staal
Post by Wilton Helm
It probably would have worked for you IF you had typecast it to (char *)
instead of (char).
yeah that was stupid...
Now I have another weird little thing that I have identified with the "wd"
debugger
This piece of code causes a SIGSEGV in an intermediate binary ("conf") that is
built to generate various configuration files for busybox. I am currently
trying to enable Watcom to act as HOSTCC and not only as CC for busybox
itself. In the future I will probably also have to track down and add the
'\' case to enable builds on a Windows host...
FILE *f;
const char *name;
char *env;
if (!f && name[0] != '/') {
env = getenv(SRCTREE);
if (env) {
char *fullname = alloca(strlen(env) + strlen(name) +
2);
this is initialized before any code runs... If you have not assigned env
or name this will use uninitialized data and fail. It's not done
in-line... but rather at routine init time.
Post by Jens Staal
sprintf(fullname, "%s/%s", env, name);
f = fopen(fullname, "r");
}
}
Anyone got an idea why this one SIGSEGVs with Watcom and not with GCC? In wd (on
Linux), "f" and "name" were NULL and env had a value. I have tried
replacing alloca with malloc, and I have tried adding a (char *) cast in front
of it, without any change.
Commenting out this piece of code makes a binary that does not crash, but it
also cannot find the relevant configuration file.
--
Using Opera's mail client: http://www.opera.com/mail/
d3x0r
2015-01-07 02:16:37 UTC
Post by d3x0r
Post by Jens Staal
FILE *f;
const char *name;
char *env;
if (!f && name[0] != '/') {
env = getenv(SRCTREE);
if (env) {
char *fullname = alloca(strlen(env) + strlen(name) +
2);
this is initialized before any code runs... If you have not assigned env
or name this will use uninitialized data and fail. It's not done
in-line... but rather at routine init time.
eh; it really has to be done inline, doesn't it...
Post by d3x0r
Post by Jens Staal
sprintf(fullname, "%s/%s", env, name);
f = fopen(fullname, "r");
}
}
Lynn McGuire
2015-01-06 18:23:45 UTC
Post by Jens Staal
Post by Johann Klammer
But what you'll want to do is look at the memory operands (pointers,
really) in the line that wdw shows you, then inspect those pointers for
NULL and find out where they got set... always assuming wdw actually
works... ...I've had some problems with the DOS version of that thing
before...
This was very useful advice. After some clicking to get out of "main" and
into the relevant module, I ended up at this piece of code in ash.c (which
is the shell in busybox):
if (val) {
*p++ = '=';
p = (char) memcpy(p, val, vallen) + vallen;
}
so I should probably look closer at that...
This is why I hate typecasting. Typecasting is basically "out of the frying pan and into the fire".

Lynn