[Nasm-bugs] [Bug 3392783] New: obj output format may discard offsets in segment-relative constants

noreply-nasm at dev.nasm.us
Fri Sep 17 08:01:51 PDT 2021


https://bugzilla.nasm.us/show_bug.cgi?id=3392783

            Bug ID: 3392783
           Summary: obj output format may discard offsets in
                    segment-relative constants
           Product: NASM
           Version: 2.16 (development)
          Hardware: All
                OS: All
            Status: OPEN
          Severity: normal
          Priority: Medium
         Component: Assembler
          Assignee: nobody at nasm.us
          Reporter: david at bamsoftware.com
                CC: chang.seok.bae at intel.com, gorcunov at gmail.com,
                    hpa at zytor.com, nasm-bugs at nasm.us
     Obtained from: Built from git using configure, From OS distribution

Created attachment 411829
  --> https://bugzilla.nasm.us/attachment.cgi?id=411829&action=edit
Input file that shows constant offsets being discarded in the obj output format

In the obj output format, in certain contexts, a segment-relative constant
(`code+0xaaaa` in the example below) is emitted as a relocation entry whose
target word holds 0x0000 instead of 0xaaaa, so the offset is lost. In other
contexts, such as the target of a jmp instruction, the relocation target word
holds 0xaaaa, which is what I expect.

The attached file t.asm shows what I mean. The comments show what each
instruction assembles to, with the ones that are unexpected to me marked with
"??".

bits    16
section code
        jmp     code+0xaaaa     ; -> e9aaaa with reloc
        mov     ax, 0xaaaa      ; -> b8aaaa
        mov     ax, code+0xaaaa ; -> b80000 with reloc ??
        dw      0xaaaa          ; -> aaaa 
        dw      code+0xaaaa     ; -> 0000 with reloc ??

I produced an executable with:

$ nasm -f obj -o t.obj t.asm
$ djlink -o t.exe t.obj
(djlink: http://www.delorie.com/djgpp/16bit/djlink/)

The rabin2 program from radare2 (https://book.rada.re/tools/rabin2/intro.html)
shows that the relocation table contains 3 entries, one for each `code+...`,
which is what I expect.

$ rabin2 -R t.exe
[Relocations]
vaddr      paddr      type   name
―――――――――――――――――――――――――――――――――
0x00000001 0x00000201 SET_16
0x00000007 0x00000207 SET_16
0x0000000b 0x0000020b SET_16
3 relocations
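
The same table can be cross-checked without rabin2 by reading the MZ header
directly. The Python sketch below assumes the standard DOS MZ layout: the
relocation count is the word at offset 0x06, the file offset of the table is
the word at 0x18, and each entry is an offset word followed by a segment word.
The image it parses is built by hand to mirror the rabin2 listing (a 0x200-byte
header, with the table placed at 0x1c by assumption). It is not the actual
t.exe, and `mz_relocations` and `build_sample_image` are hypothetical helper
names.

```python
import struct

def mz_relocations(image: bytes):
    """Return (segment, offset) pairs from the relocation table of an MZ image."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (count,) = struct.unpack_from("<H", image, 0x06)  # e_crlc: number of entries
    (table,) = struct.unpack_from("<H", image, 0x18)  # e_lfarlc: table file offset
    relocs = []
    for i in range(count):
        # Each entry is stored as the offset word first, then the segment word.
        offset, segment = struct.unpack_from("<HH", image, table + 4 * i)
        relocs.append((segment, offset))
    return relocs

def build_sample_image() -> bytes:
    """Hand-built stand-in for t.exe (assumed layout, not real linker output)."""
    header = bytearray(0x200)
    header[0:2] = b"MZ"
    struct.pack_into("<H", header, 0x06, 3)     # three relocation entries
    struct.pack_into("<H", header, 0x08, 0x20)  # header size in 16-byte paragraphs
    struct.pack_into("<H", header, 0x18, 0x1C)  # relocation table offset (assumed)
    for i, off in enumerate((0x0001, 0x0007, 0x000B)):
        struct.pack_into("<HH", header, 0x1C + 4 * i, off, 0x0000)
    # Code bytes as listed in the comments of t.asm.
    code = bytes.fromhex("e9aaaa" "b8aaaa" "b80000" "aaaa" "0000")
    return bytes(header) + code

if __name__ == "__main__":
    image = build_sample_image()
    for segment, offset in mz_relocations(image):
        print(f"{segment:04x}:{offset:04x}")
```

Run against a real file, `mz_relocations(open("t.exe", "rb").read())` should
list the same three offsets if the linker wrote a conventional MZ header.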

If I use `$$+0xaaaa` instead of `code+0xaaaa`, the constant values in the
output file are 0xaaaa as I expect, but no relocations are emitted.

I have tested version 2.14-1 from Debian buster, and commit e2ed7b7e from the
Git repository, with the same behavior.

The following patch results in the output I expect with t.asm, though I imagine
this change is not generally correct, and that a better place to make a change
would be higher in the call stack.

diff --git a/output/legacy.c b/output/legacy.c
index d2785387..9d28faf7 100644
--- a/output/legacy.c
+++ b/output/legacy.c
@@ -90,5 +90,5 @@ void nasm_do_legacy_output(const struct out_data *data)
     case OUT_SEGMENT:
         type = OUT_ADDRESS;
-        dptr = zero_buffer;
+        dptr = &data->toffset;
         size = (data->flags & OUT_SIGNED) ? -data->size : data->size;
         tsegment |= 1;

For comparison, if I write a similar program and use the elf output format, I
get 0xaaaaaaaa in the program text and relocations where needed.

bits    32
section .text
global  _start
_start:
        jmp     0xaaaaaaaa
        jmp     _start+0xaaaaaaaa
        mov     eax, 0xaaaaaaaa
        mov     eax, _start+0xaaaaaaaa
        dd      0xaaaaaaaa
        dd      _start+0xaaaaaaaa

Made into an object file like this:

$ nasm -f elf -o u.o u.asm

Disassembly with relocations marked:

$ objdump -M intel -dr u.o
...
00000000 <_start>:
   0:   e9 a6 aa aa aa          jmp    aaaaaaab <_start+0xaaaaaaab>
                        1: R_386_PC32   *ABS*
   5:   e9 a0 aa aa aa          jmp    aaaaaaaa <_start+0xaaaaaaaa>
   a:   b8 aa aa aa aa          mov    eax,0xaaaaaaaa
   f:   b8 aa aa aa aa          mov    eax,0xaaaaaaaa
                        10: R_386_32    .text
  14:   aa                      stos   BYTE PTR es:[edi],al
  15:   aa                      stos   BYTE PTR es:[edi],al
  16:   aa                      stos   BYTE PTR es:[edi],al
  17:   aa                      stos   BYTE PTR es:[edi],al
  18:   aa                      stos   BYTE PTR es:[edi],al
                        18: R_386_32    .text
  19:   aa                      stos   BYTE PTR es:[edi],al
  1a:   aa                      stos   BYTE PTR es:[edi],al
  1b:   aa                      stos   BYTE PTR es:[edi],al

I came across this issue while writing an EXE-producing program and checking
its support for writing relocations in the header. I have a sample program with
a dw array of constant values, which are also relocation targets. The program
prints the contents of the array, as a quick visual check that the relocations
have been effected. I have been doing this with a custom EXE header writer that
includes a constant relocation table, but having the assembler output the
relocation offsets would be less fragile. I am not sure what I'm doing is the
best or the correct way to do what I want, but in any case the output with obj
is surprising and apparently inconsistent with other output formats.
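
To make the consequence concrete: when DOS loads an EXE, it adds the start
segment to the 16-bit word at each relocation entry. The Python sketch below
applies that rule to the bytes from t.asm, once as nasm currently emits them
and once as expected. The load segment 0x1000 is an arbitrary assumption, and
`apply_fixups` and `word_at` are hypothetical helpers, not part of any tool
mentioned here.

```python
import struct

def apply_fixups(code: bytes, reloc_offsets, load_segment: int) -> bytes:
    """Add load_segment (mod 2**16) to the little-endian word at each offset."""
    out = bytearray(code)
    for off in reloc_offsets:
        (word,) = struct.unpack_from("<H", out, off)
        struct.pack_into("<H", out, off, (word + load_segment) & 0xFFFF)
    return bytes(out)

def word_at(buf: bytes, off: int) -> int:
    """Read the little-endian 16-bit word at the given offset."""
    return struct.unpack_from("<H", buf, off)[0]

RELOCS = (0x01, 0x07, 0x0B)  # offsets reported by rabin2 -R
LOAD_SEGMENT = 0x1000        # arbitrary value chosen for illustration

# Bytes as emitted by -f obj, and as expected, per the comments in t.asm.
emitted = bytes.fromhex("e9aaaa" "b8aaaa" "b80000" "aaaa" "0000")
expected = bytes.fromhex("e9aaaa" "b8aaaa" "b8aaaa" "aaaa" "aaaa")

if __name__ == "__main__":
    for label, code in (("emitted", emitted), ("expected", expected)):
        fixed = apply_fixups(code, RELOCS, LOAD_SEGMENT)
        print(label, [hex(word_at(fixed, off)) for off in RELOCS])
```

With the emitted bytes, the relocated words at 0x07 and 0x0b equal the load
segment alone; with the expected bytes they equal the load segment plus 0xaaaa,
matching the word at 0x01.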
