[Nasm-bugs] [Bug 3392533] Considerations for segment support in ELF

Fri Nov 30 15:21:58 PST 2018

https://bugzilla.nasm.us/show_bug.cgi?id=3392533

--- Comment #7 from H. Peter Anvin <hpa at zytor.com> ---
Trying again... writeup for segmented code support in ELF.

This is intended to address the general case. It is possible to
support subcases (Syslinux, wraplinux, and the Linux kernel all do
exactly this); in particular the tiny memory model is trivial to
support and the small model just requires a minimal linker script.
Doing mixed 16- and 32-bit code with a nonzero base (as required by
the Linux kernel) requires a fair bit of external machinery, though,
and it is very much not pretty.

The general case has to consider:

- Global symbols across segments;
- The SEG and WRT operators;
- Groups (a set of sections sharing a common segment base, which are
  expected to be co-located by the linker);
- HUGE segments (larger than 64K), which cannot have a fixed segment
  base.

This requires at a minimum:

a) The notion of the base of a symbol, including global symbols;
b) The notion of the base of a relocation, which may or may not be the
   same as the base of a symbol;
c) A segment relocation.

The most common case is that the segment base will be the base of an
*output* section. This can also apply to groups simply by defining the
group as an empty section at the beginning of the group. However, the
segment base is not *required* to be so, and can also be an arbitrary
address.

I believe the sanest way to deal with this is to associate another
symbol (which may be the symbol itself) and an offset with each symbol,
using the equivalent of an SHT_SYMTAB_SHNDX section.  This section
could then contain something like:

   32 bit - extended section index
   32 bit - associated symbol index
32/64 bit - offset from associated symbol
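
As a rough illustration of the entry described above, here is a minimal
Python sketch that packs and unpacks one such record. The field order
follows the list above; everything else (little-endian encoding, a
signed offset, no padding, and the names ENTRY32, ENTRY64, pack_entry
and unpack_entry) is an assumption made for the example, not part of
any existing ELF specification.

```python
import struct

# Hypothetical layouts: extended section index, associated symbol index,
# offset from the associated symbol. Endianness, signedness and packing
# are assumptions for illustration only.
ENTRY32 = struct.Struct("<IIi")   # 32-bit offset variant
ENTRY64 = struct.Struct("<IIq")   # 64-bit offset variant


def pack_entry(shndx, assoc_sym, offset, wide=False):
    """Serialize one entry; wide=True selects the 64-bit offset form."""
    fmt = ENTRY64 if wide else ENTRY32
    return fmt.pack(shndx, assoc_sym, offset)


def unpack_entry(data, wide=False):
    """Return (shndx, assoc_sym, offset) from a serialized entry."""
    fmt = ENTRY64 if wide else ENTRY32
    return fmt.unpack(data)


raw = pack_entry(shndx=5, assoc_sym=12, offset=0x40)
print(len(raw), unpack_entry(raw))
```

A symbol that serves as its own base would simply carry its own symbol
index in the second field.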

The linker would have to produce a standardized symbol at the start of
each section. I understand it currently does so only under very limited
circumstances, which may be too restrictive; I am not actually sure.

The linker script syntax may need to be extended to support a wildcard
match of input sections that should be put in *separate* output
sections, with the associated base symbols, but at a particular point
in the output image. This is because the normal ordering in a
segmented program is:

        _TEXT_*
        _RODATA_* _DATA_*       /* sorted by *, so grouped in pairs */
        DGROUP {
                _TEXT
                _RODATA
                _DATA
                _BSS
                _STACK          /* may or may not be part of DGROUP */
        }
        _BSS_*

That being said, this is frequently simplified by putting read-only
data into the corresponding _DATA section, as there is no memory
protection in classical segmented mode anyway.
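
The ordering shown above can be sketched in Python as follows. This is
only an illustration of the intended order, not a proposal for linker
script syntax: the function name layout_order is hypothetical, and the
sketch assumes _STACK is placed inside DGROUP and that _RODATA_x
precedes _DATA_x within each pair.

```python
# DGROUP members in the order listed above (assumes _STACK is included).
DGROUP_ORDER = ["_TEXT", "_RODATA", "_DATA", "_BSS", "_STACK"]


def layout_order(sections):
    """Return the section names in the segmented-program order."""
    names = set(sections)
    text = sorted(s for s in names if s.startswith("_TEXT_"))
    # Sort by the part after the prefix so _RODATA_x and _DATA_x pair up;
    # the boolean puts _RODATA_x ahead of _DATA_x within a pair.
    data = sorted(
        (s for s in names if s.startswith(("_RODATA_", "_DATA_"))),
        key=lambda s: (s.split("_", 2)[2], not s.startswith("_RODATA_")),
    )
    dgroup = [s for s in DGROUP_ORDER if s in names]
    bss = sorted(s for s in names if s.startswith("_BSS_"))
    return text + data + dgroup + bss


print(layout_order(["_BSS_a", "_DATA_a", "_TEXT", "_RODATA_a",
                    "_TEXT_a", "_DATA", "_BSS"]))
```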

I'm not sure if SHT_GROUP can be used for the purpose of the grouping;
it seems to have a completely different meaning in ELF.

I am completely disregarding 16-bit protected mode in this analysis,
and focusing on real mode (and V86 mode, as it has real-mode-like
addressing.)

Let Q(S) symbolize the value of the symbol associated with S.

Now we need, at a minimum:

R_*_SEG         word16          (Q(S) + A) >> 4

R_*_SEGOFF16    word16          S - Q(S) + A
R_*_SEGOFF32    word32          S - Q(S) + A

Optional (I *think* these can be handled in the assembler):

R_*_SEGHUGE     word16          (S - Q(S) + A) >> 4
R_*_SEGOFFHUGE  word8           (S - Q(S) + A) & 0xf

The SEGOFFHUGE only needs to be 8 bits as the rest is a
zero-extension.
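
To make the arithmetic concrete, here is a small Python sketch of the
relocation formulas above. The function names are made up for the
example, and masking the result to the field width is an assumption (a
real linker might diagnose overflow instead). S, Q(S) and A are passed
as plain integers s, q and a.

```python
def r_seg(q, a=0):
    """R_*_SEG: real-mode segment (paragraph) of the base, 16 bits."""
    return ((q + a) >> 4) & 0xFFFF


def r_segoff16(s, q, a=0):
    """R_*_SEGOFF16: offset of S from its base, 16 bits."""
    return (s - q + a) & 0xFFFF


def r_segoff32(s, q, a=0):
    """R_*_SEGOFF32: offset of S from its base, 32 bits."""
    return (s - q + a) & 0xFFFFFFFF


def r_seghuge(s, q, a=0):
    """R_*_SEGHUGE: offset from the base in 16-byte paragraphs."""
    return ((s - q + a) >> 4) & 0xFFFF


def r_segoffhuge(s, q, a=0):
    """R_*_SEGOFFHUGE: the remaining low four bits of the offset."""
    return (s - q + a) & 0xF


# Base at linear 0x12340, symbol at linear 0x12ABC.
q, s = 0x12340, 0x12ABC
seg, off = r_seg(q), r_segoff16(s, q)
print(hex(seg), hex(off), hex((seg << 4) + off))

# A symbol more than 64K past its base: the two huge pieces recombine.
s_far = 0x32345
print(hex(r_seghuge(s_far, q)), hex(r_segoffhuge(s_far, q)))
```

With a base aligned to 16 bytes, (seg << 4) + off gives back the linear
address of the symbol, and (SEGHUGE << 4) | SEGOFFHUGE gives back the
full offset S - Q(S) + A.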

For the elf-i386 format, R_386_SEGHUGE presents a problem, since the
expression S - Q(S) + A may not be aligned to 16 bytes. It thus
requires RELA relocations rather than REL relocations, but elf-i386
normally uses REL relocations.
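
One way to see the alignment problem is the following Python sketch. It
assumes, purely for illustration, that a REL-style in-place field could
only hold the addend already shifted right by 4; the function names are
hypothetical. Under that assumption the low four bits of A are lost,
and with them any carry into the paragraph count.

```python
def seghuge_full(s, q, a):
    """Shift applied to the full sum, as in the R_*_SEGHUGE definition."""
    return (s - q + a) >> 4


def seghuge_split(s, q, a):
    """What results if only (a >> 4) survives in the relocated field."""
    return ((s - q) >> 4) + (a >> 4)


# The symbol sits 0x1F bytes past its base and the addend is 1, so the
# full sum 0x20 crosses a paragraph boundary.
s, q, a = 0x1001F, 0x10000, 0x1
print(seghuge_full(s, q, a), seghuge_split(s, q, a))
```

The two results differ here, which is why the full addend has to be
kept alongside the relocation rather than folded into the shifted
field.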
