Discussion:
C++ large data code generation question
Michael Brutman
2014-12-28 19:38:02 UTC
I was looking at the code that OW 1.9 generates for some code and
noticed something odd. Here is a simpler program that illustrates the
problem:

#include <malloc.h>

class Request {
  public:
    char *buffer;
    int init( void );
};

int Request::init( void ) {
    buffer = (char *)malloc( 200 );
    if ( buffer == NULL ) return -1;
    return 0;
}

int main( int argc, char *argv[] ) {
    Request *tmp = new Request( );
    int rc = tmp->init( );
    return 0;
}

Compile using wcl -0 -ml -s malloc_ptr.cpp . You can use pretty much
any optimization flags that you want, the results won't change. (I
tried a few variants of -o options, including -ox.)

Here is the generated assembler:

0000                    int far Request::init():
0000  53                push    bx
0001  51                push    cx
0002  89 C3             mov     bx,ax
0004  89 D1             mov     cx,dx
0006  B8 C8 00          mov     ax,0x00c8
0009  9A 00 00 00 00    call    malloc_
000E  8E D9             mov     ds,cx
0010  89 07             mov     word ptr [bx],ax
0012  89 57 02          mov     word ptr 0x2[bx],dx
0015  8B 17             mov     dx,word ptr [bx]
0017  8B 47 02          mov     ax,word ptr 0x2[bx]
001A  85 C0             test    ax,ax
001C  75 0A             jne     L$1
001E  85 D2             test    dx,dx
0020  75 06             jne     L$1
0022  B8 FF FF          mov     ax,0xffff
0025  59                pop     cx
0026  5B                pop     bx
0027  CB                retf
0028                    L$1:
0028  31 C0             xor     ax,ax
002A  59                pop     cx
002B  5B                pop     bx
002C  CB                retf

Routine Size: 45 bytes, Routine Base: malloc_ptr_TEXT + 0000

002D                    main_:
002D  52                push    dx
002E  B8 04 00          mov     ax,0x0004
0031  9A 00 00 00 00    call    void far * far operator new( int unsigned )
0036  0E                push    cs
0037  E8 00 00          call    int far Request::init()
003A  31 C0             xor     ax,ax
003C  5A                pop     dx
003D  CB                retf

Routine Size: 17 bytes, Routine Base: malloc_ptr_TEXT + 002D

This is an optimization question:

Here is what I think is going on. The call to malloc at offset 0x9
(_fmalloc in this case, because we are using the large memory model)
returns a pointer in the DX:AX register pair. DX has the segment and AX
has the offset. This is a legal register pair as per the C++ User's
Guide, around page 154.

CX:BX looks to be the "this" pointer for the object (copied from DX:AX
on entry), with CX being the segment that gets loaded into DS. After the
call to malloc the newly allocated pointer is stored in memory by the
two mov instructions at 0x10 and 0x12.

Here is the fun part. The next two mov instructions read the offset
and segment back in from memory into the opposite registers: the offset
lands in DX and the segment in AX. Even worse, the only reason they are
read back in is to prepare for the NULL pointer check. The actual
contents of the registers or the order in which they are examined does
not matter; all that matters is whether either of them is non-zero.

So the two extra mov instructions at 0x15 and 0x17 are not needed at all.
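
To make the redundancy concrete, here is a minimal sketch in portable C++ that models the two 16-bit halves of the pointer as plain integers. The FarPtr struct and both function names are hypothetical, made up for illustration; this is not Open Watcom code, and it only assumes the layout visible in the listing (offset at [bx], segment at 0x2[bx]).

```cpp
#include <cstdint>

// Hypothetical model of a 16-bit far pointer as it sits in memory.
struct FarPtr {
    std::uint16_t offset;   // word stored at [bx] in the listing
    std::uint16_t segment;  // word stored at 0x2[bx] in the listing
};

// NULL check done directly on what malloc returned (DX = segment, AX = offset).
bool is_null_from_registers(std::uint16_t dx, std::uint16_t ax) {
    return dx == 0 && ax == 0;
}

// NULL check the way the listing does it: store, reload swapped, then test.
bool is_null_after_reload(std::uint16_t dx, std::uint16_t ax) {
    FarPtr mem = { ax, dx };              // mov [bx],ax  / mov 0x2[bx],dx
    std::uint16_t new_dx = mem.offset;    // mov dx,[bx]
    std::uint16_t new_ax = mem.segment;   // mov ax,0x2[bx]
    if (new_ax != 0) return false;        // test ax,ax / jne L$1
    if (new_dx != 0) return false;        // test dx,dx / jne L$1
    return true;                          // both halves zero: NULL
}
```

Both functions return the same answer for every input, because the reload only swaps which register holds which half and the check treats the halves symmetrically.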

I see this happen all of the time in my code; a basic NULL pointer check
generates this. I suspect that there is something in large model code
generation that says pointer comparisons must be done in a particular
order and it's forcing the register assignment to be in that order. We
know that AX:DX is not a valid register pair for a far pointer, while
DX:AX is. So why would the check be coded in the wrong order?
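
One experiment that might help narrow this down is to run the NULL check on a local copy of the malloc result instead of on the member. The sketch below is only that, an experiment: the local variable p is an addition, <stdlib.h> is used instead of <malloc.h> for portability, and whether OW 1.9 then skips the reload has not been tested here.

```cpp
#include <stdlib.h>

class Request {
  public:
    char *buffer;
    int init( void );
};

// Variant of Request::init() that tests a local copy of the pointer,
// so the comparison does not have to go through the member in memory.
int Request::init( void ) {
    char *p = (char *)malloc( 200 );
    buffer = p;                      // store the result in the object
    if ( p == NULL ) return -1;      // test the local, not the member
    return 0;
}
```

If the listing for this variant still reloads both words from memory, that would suggest the reload comes from how far pointer comparisons are generated rather than from the member access.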

I'm guessing here. I haven't dug into the compiler source code yet, and
I'm kind of loath to do so - that learning curve is not trivial.
Before I do, can anybody shed some light on what is going on? Am I
missing something here? A pointer to the code that generates this code
would be appreciated too.


Thanks,
Mike
d3x0r
2015-01-01 08:04:15 UTC
Post by Michael Brutman
So the two extra mov instructions at 0x15 and 0x17 are not needed at all.
It's effectively an xchg dx,ax plus the store to memory...

Maybe a reversal happens and the register coloring gets confused? Is
there register coloring for optimization in Watcom? No idea what they
call it...
Wilton Helm
2015-01-15 23:27:46 UTC
I haven't dived into the compiler either, but I clearly see what you mean.
I would expect the code at 0x15 to say
or ax, dx
jne L$1
I've seen the optimizer do things that cleanly before. If I were hand
optimizing, I would then have moved L$1 to where 0x25 is in the listing.
(I've seen the optimizer do that sort of thing: it knows the register has
to be 0, so it doesn't have to load it.)
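
As a rough sketch of why a single OR is enough, the following portable C++ models the two register halves as 16-bit integers and compares the two-test sequence from the listing with the one-OR form. The function names are hypothetical and this is a model of the flag logic only, not compiler output.

```cpp
#include <cstdint>

// Two tests and two branches, as in the listing at 0x1A through 0x20.
bool non_null_two_tests(std::uint16_t ax, std::uint16_t dx) {
    if (ax != 0) return true;   // test ax,ax / jne L$1
    if (dx != 0) return true;   // test dx,dx / jne L$1
    return false;
}

// One OR and one branch: or ax,dx / jne L$1
bool non_null_single_or(std::uint16_t ax, std::uint16_t dx) {
    std::uint16_t combined = static_cast<std::uint16_t>(ax | dx);
    return combined != 0;       // zero only when both halves are zero
}
```

One thing to keep in mind is that or ax,dx overwrites AX with the combined value. Both return paths in the listing load AX explicitly afterwards, so that is harmless there.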

Looking at the code in general I don't see a lot of optimization going on at
all--not sure why.

Wilton