[nasm:master] Document CPU LATEVEX, add CPU EVEX and CPU VEX flags

nasm-bot for H. Peter Anvin hpa at zytor.com
Wed Dec 7 10:54:06 PST 2022
Previous message (by thread): [nasm:master] travis: fix the masmdisp travis test
Next message (by thread): [nasm:master] outieee: fix segfault on empty input
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Commit-ID:  55dc058356abdbf3606c6c0bcd4d52d76cb2e90b
Gitweb:     http://repo.or.cz/w/nasm.git?a=commitdiff;h=55dc058356abdbf3606c6c0bcd4d52d76cb2e90b
Author:     H. Peter Anvin <hpa at zytor.com>
AuthorDate: Wed, 7 Dec 2022 10:04:40 -0800
Committer:  H. Peter Anvin <hpa at zytor.com>
CommitDate: Wed, 7 Dec 2022 10:11:21 -0800

Document CPU LATEVEX, add CPU EVEX and CPU VEX flags

Document CPU LATEVEX and the associated prefixes; add CPU EVEX and CPU
VEX flags to further control encodings.

Fix the error message for invalid encodings due to flags.

Signed-off-by: H. Peter Anvin <hpa at zytor.com>


---
 asm/assemble.c   | 19 ++++++++---
 asm/directiv.c   |  4 ++-
 doc/changes.src  | 21 ++++++++++--
 doc/nasmdoc.src  | 98 +++++++++++++++++++++++++++++++++++++++-----------------
 test/latevex.asm | 42 ++++++++++++++++++++++--
 5 files changed, 145 insertions(+), 39 deletions(-)

diff --git a/asm/assemble.c b/asm/assemble.c
index 1880a282..7eab5ce1 100644
--- a/asm/assemble.c
+++ b/asm/assemble.c
@@ -934,8 +934,12 @@ int64_t assemble(int32_t segment, int64_t start, int bits, insn *instruction)
                 nasm_nonfatal("instruction not supported in %d-bit mode", bits);
                 break;
             case MERR_ENCMISMATCH:
-                nasm_nonfatal("instruction not encodable with %s prefix",
-                              prefix_name(instruction->prefixes[PPS_REX]));
+                if (!instruction->prefixes[PPS_REX]) {
+                    nasm_nonfatal("instruction not encodable without explicit prefix");
+                } else {
+                    nasm_nonfatal("instruction not encodable with %s prefix",
+                                  prefix_name(instruction->prefixes[PPS_REX]));
+                }
                 break;
             case MERR_BADBND:
             case MERR_BADREPNE:
@@ -2552,9 +2556,16 @@ static enum match_result matches(const struct itemplate *itemp,
             return MERR_ENCMISMATCH;
         break;
     default:
-        if (itemp_has(itemp, IF_LATEVEX)) {
-            if (!iflag_test(&cpu, IF_LATEVEX))
+        if (itemp_has(itemp, IF_EVEX)) {
+            if (!iflag_test(&cpu, IF_EVEX))
+                return MERR_ENCMISMATCH;
+        } else if (itemp_has(itemp, IF_VEX)) {
+            if (!iflag_test(&cpu, IF_VEX)) {
                 return MERR_ENCMISMATCH;
+            } else if (itemp_has(itemp, IF_LATEVEX)) {
+                if (!iflag_test(&cpu, IF_LATEVEX) && iflag_test(&cpu, IF_EVEX))
+                    return MERR_ENCMISMATCH;
+            }
         }
         break;
     }
diff --git a/asm/directiv.c b/asm/directiv.c
index 901d35c1..a4f54b4b 100644
--- a/asm/directiv.c
+++ b/asm/directiv.c
@@ -111,7 +111,9 @@ void set_cpu(const char *value)
         { "any", IF_ANY },
         { "all", IF_ANY },
         { "latevex", IF_LATEVEX },
-        { NULL, IF_DEFAULT }    /* End of list */
+        { "evex", IF_EVEX },
+        { "vex", IF_VEX },
+        { NULL, 0 }
     };
 
     if (!value) {
diff --git a/doc/changes.src b/doc/changes.src
index 93d5d3e2..cf610b67 100644
--- a/doc/changes.src
+++ b/doc/changes.src
@@ -68,6 +68,20 @@ reservations (e.g. \c{dw ?}.)
 \b Allow forcing an instruction in 64-bit mode to have a (possibly
 redundant) REX prefix, using the syntax \i\c{\{rex\}} as a prefix.
 
+\b Add a \c{\{vex\}} prefix to enforce VEX (AVX) encoding of an
+instruction, either using the 2- or 3-byte VEX prefixes.
+
+\b The \c{CPU} directive has been augmented to allow control of
+generation of VEX (AVX) versus EVEX (AVX-512) instruction formats, see
+\k{CPU}.
+
+\b Some recent instructions that previously have been only available
+using EVEX encodings are now also encodable using VEX (AVX)
+encodings. For backwards compatibility these encodings are not enabled
+by default, but can be generated either via an explicit \c{\{vex\}}
+prefix or by specifying either \c{CPU LATEVEX} or \c{CPU NOEVEX}; see
+\k{CPU}.
+
 \b Document the already existing \c{%unimacro} directive. See \k{unmacro}.
 
 \b Fix a code range generation bug in the DWARF debug format
@@ -767,9 +781,10 @@ options to indicate whether all relevant branches should be getting
 \c{BND} prefixes.  This is expected to be the normal for use in MPX
 code.
 
-\b Add \c{{evex}}, \c{{vex3}} and \c{{vex2}} instruction prefixes to
-have NASM encode the corresponding instruction, if possible, with an EVEX,
-3-byte VEX, or 2-byte VEX prefix, respectively.
+\b Add \c{\{evex\}}, \c{\{vex3\}} and \c{\{vex2\}} instruction
+prefixes to have NASM encode the corresponding instruction, if
+possible, with an EVEX, 3-byte VEX, or 2-byte VEX prefix,
+respectively.
 
 \b Support for section names longer than 8 bytes in Win32/Win64 COFF.
 
diff --git a/doc/nasmdoc.src b/doc/nasmdoc.src
index bd933db5..12efa926 100644
--- a/doc/nasmdoc.src
+++ b/doc/nasmdoc.src
@@ -5594,47 +5594,87 @@ are excluded from the symbol mangling and also not marked as global.
 \H{CPU} \i\c{CPU}: Defining CPU Dependencies
 
 The \i\c{CPU} directive restricts assembly to those instructions which
-are available on the specified CPU.
+are available on the specified CPU. At the moment, it is primarily
+used to enforce unavailable \e{encodings} of instructions, such as
+5-byte jumps on the 8080.
 
-Options are:
+(If someone would volunteer to work through the database and add
+proper annotations to each instruction, this could be greatly
+improved. Please contact the developers to volunteer, see \{contact}.)
 
-\b\c{CPU 8086}          Assemble only 8086 instruction set
+Current CPU keywords are:
 
-\b\c{CPU 186}           Assemble instructions up to the 80186 instruction set
+\b\c{CPU 8086}        - Assemble only 8086 instruction set
 
-\b\c{CPU 286}           Assemble instructions up to the 286 instruction set
+\b\c{CPU 186}         - Assemble instructions up to the 80186 instruction set
 
-\b\c{CPU 386}           Assemble instructions up to the 386 instruction set
+\b\c{CPU 286}         - Assemble instructions up to the 286 instruction set
 
-\b\c{CPU 486}           486 instruction set
+\b\c{CPU 386}         - Assemble instructions up to the 386 instruction set
 
-\b\c{CPU 586}           Pentium instruction set
+\b\c{CPU 486}         - 486 instruction set
 
-\b\c{CPU PENTIUM}       Same as 586
+\b\c{CPU 586}         - Pentium instruction set
 
-\b\c{CPU 686}           P6 instruction set
+\b\c{CPU PENTIUM}     - Same as 586
 
-\b\c{CPU PPRO}          Same as 686
+\b\c{CPU 686}         - P6 instruction set
 
-\b\c{CPU P2}            Same as 686
+\b\c{CPU PPRO}        - Same as 686
 
-\b\c{CPU P3}            Pentium III (Katmai) instruction sets
+\b\c{CPU P2}          - Same as 686
 
-\b\c{CPU KATMAI}        Same as P3
+\b\c{CPU P3}          - Pentium III (Katmai) instruction sets
 
-\b\c{CPU P4}            Pentium 4 (Willamette) instruction set
+\b\c{CPU KATMAI}      - Same as P3
 
-\b\c{CPU WILLAMETTE}    Same as P4
+\b\c{CPU P4}          - Pentium 4 (Willamette) instruction set
 
-\b\c{CPU PRESCOTT}      Prescott instruction set
+\b\c{CPU WILLAMETTE}  - Same as P4
 
-\b\c{CPU X64}           x86-64 (x64/AMD64/Intel 64) instruction set
+\b\c{CPU PRESCOTT}    - Prescott instruction set
 
-\b\c{CPU IA64}          IA64 CPU (in x86 mode) instruction set
+\b\c{CPU X64}         - x86-64 (x64/AMD64/Intel 64) instruction set
 
-All options are case insensitive.  All instructions will be selected
-only if they apply to the selected CPU or lower.  By default, all
-instructions are available.
+\b\c{CPU IA64}        - IA64 CPU (in x86 mode) instruction set
+
+\b\c{CPU DEFAULT}     - All available instructions
+
+\b\c{CPU ALL}	      - All available instructions \e{and flags}
+
+All options are case insensitive.
+
+In addition, optional flags can be specified to modify the instruction
+selections. These can be combined with a CPU declaration or specified
+alone. They can be prefixed by \c{+} (add flag, default), \c{-}
+(remove flag) or \c{*} (set flag to default); these prefixes are
+"sticky", so:
+
+\c      cpu -foo,bar
+
+means remove both the \c{foo} and \c{bar} options.
+
+If prefixed with \c{no}, it inverts the meaning of the flag, but this
+is not sticky, so:
+
+\c      cpu nofoo,bar
+
+means remove the \c{foo} flag but add the \c{bar} flag.
+
+Currently available flags are:
+
+\b\c{EVEX} - Enable generation of EVEX (AVX-512) encoded instructions
+without an explicit \c{\{evex\}} prefix. Default on.
+
+\b\c\{VEX} - Enable generation of VEX (AVX) or XOP encoded
+instructions without an explict \c{\{vex\}} prefix. Default on.
+
+\b\c{LATEVEX} - Enable generation of VEX (AVX) encoding of
+instructions where the VEX instructions forms were introduced
+\e{after} the corresponding EVEX (AVX-512) instruction forms without
+requiring an explicit \c{\{vex\}} prefix. This is implicit if the
+\c{EVEX} flag is disabled and the \c{VEX} flag is enabled. Default
+off.
 
 
 \H{FLOAT} \i\c{FLOAT}: Handling of \I{floating-point, constants}floating-point constants
@@ -5643,19 +5683,19 @@ By default, floating-point constants are rounded to nearest, and IEEE
 denormals are supported.  The following options can be set to alter
 this behaviour:
 
-\b\c{FLOAT DAZ}         Flush denormals to zero
+\b\c{FLOAT DAZ}       - Flush denormals to zero
 
-\b\c{FLOAT NODAZ}       Do not flush denormals to zero (default)
+\b\c{FLOAT NODAZ}     - Do not flush denormals to zero (default)
 
-\b\c{FLOAT NEAR}        Round to nearest (default)
+\b\c{FLOAT NEAR}      - Round to nearest (default)
 
-\b\c{FLOAT UP}          Round up (toward +Infinity)
+\b\c{FLOAT UP}        - Round up (toward +Infinity)
 
-\b\c{FLOAT DOWN}        Round down (toward -Infinity)
+\b\c{FLOAT DOWN}      - Round down (toward -Infinity)
 
-\b\c{FLOAT ZERO}        Round toward zero
+\b\c{FLOAT ZERO}      - Round toward zero
 
-\b\c{FLOAT DEFAULT}     Restore default settings
+\b\c{FLOAT DEFAULT}   - Restore default settings
 
 The standard macros \i\c{__?FLOAT_DAZ?__}, \i\c{__?FLOAT_ROUND?__}, and
 \i\c{__?FLOAT?__} contain the current state, as long as the programmer
diff --git a/test/latevex.asm b/test/latevex.asm
index 0502487c..8d14557d 100644
--- a/test/latevex.asm
+++ b/test/latevex.asm
@@ -1,7 +1,7 @@
 	bits 64
 
 %define YMMWORD yword
-	
+
 	vpmadd52luq	ymm3,ymm1,YMMWORD[rsi]
 	vpmadd52luq	ymm16,ymm1,YMMWORD[32+rsi]
 	vpmadd52luq	ymm17,ymm1,YMMWORD[64+rsi]
@@ -30,4 +30,42 @@
 	vpmadd52luq	ymm17,ymm2,YMMWORD[64+rcx]
 	vpmadd52luq	ymm18,ymm2,YMMWORD[96+rcx]
 	vpmadd52luq	ymm19,ymm2,YMMWORD[128+rcx]
-	
+
+	cpu default
+
+	vpmadd52luq	ymm3,ymm1,YMMWORD[rsi]
+	vpmadd52luq	ymm3,ymm2,YMMWORD[rcx]
+
+	cpu noevex
+
+	vpmadd52luq	ymm3,ymm1,YMMWORD[rsi]
+	vpmadd52luq	ymm3,ymm2,YMMWORD[rcx]
+
+%ifdef ERROR
+	vpmadd52luq	ymm19,ymm2,YMMWORD[128+rcx]
+%endif
+
+	cpu evex,novex,latevex
+
+	vpmadd52luq	ymm3,ymm1,YMMWORD[rsi]
+	vpmadd52luq	ymm3,ymm2,YMMWORD[rcx]
+
+	cpu default
+
+	vaddps		ymm3,ymm1,YMMWORD[rsi]
+	vaddps		ymm3,ymm2,YMMWORD[rcx]
+
+	cpu novex
+
+	vaddps		ymm3,ymm1,YMMWORD[rsi]
+	vaddps		ymm3,ymm2,YMMWORD[rcx]
+
+%ifdef ERROR
+	cpu noevex
+
+	vaddps		ymm3,ymm1,YMMWORD[rsi]
+	vaddps		ymm3,ymm2,YMMWORD[rcx]
+%endif
+
+ {vex}	vaddps		ymm3,ymm1,YMMWORD[rsi]
+ {vex}	vaddps		ymm3,ymm2,YMMWORD[rcx]
Previous message (by thread): [nasm:master] travis: fix the masmdisp travis test
Next message (by thread): [nasm:master] outieee: fix segfault on empty input
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Nasm-commits mailing list