[nasm:master] doc: document preprocessor functions
nasm-bot for H. Peter Anvin
hpa at zytor.com
Fri Nov 11 20:27:04 PST 2022
Commit-ID: 392b2b18a0196810fec33cf02c7e527044cf211c
Gitweb: http://repo.or.cz/w/nasm.git?a=commitdiff;h=392b2b18a0196810fec33cf02c7e527044cf211c
Author: H. Peter Anvin <hpa at zytor.com>
AuthorDate: Fri, 11 Nov 2022 20:25:49 -0800
Committer: H. Peter Anvin <hpa at zytor.com>
CommitDate: Fri, 11 Nov 2022 20:25:49 -0800
doc: document preprocessor functions
Add documentation for preprocessor functions, as well as the flow of
preprocessor expansion.
Signed-off-by: H. Peter Anvin <hpa at zytor.com>
---
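Note (not part of the commit): a minimal sketch of how the preprocessor
functions documented in the diff below might be combined. It assumes
NASM 2.16 or later, the version in which the new text says these
functions were introduced. The macro names a, b, what and charcnt are
illustrative and follow the examples in the diff; the expected values
in the comments are the ones the documentation itself states.

```nasm
%assign a 2
%assign b 3

; %eval() accepts several expressions and expands to a comma-separated list
%defstr what %eval(a+b,a*b)             ; per the docs, like %define what "5,6"

; Function form of the %strlen directive
%define charcnt %strlen('my string')    ; per the docs, like %define charcnt 9

; The %is...() functions mirror the %if... directives and expand to 1 or 0
%if %isdef(a) && !%isidn(a,b)
        db what, 0, charcnt
%endif
```

Running the file through nasm -E (preprocess-only mode) is one way to inspect what each function expands to.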
doc/changes.src | 4 +
doc/nasmdoc.src | 302 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 289 insertions(+), 17 deletions(-)
diff --git a/doc/changes.src b/doc/changes.src
index 4c45f18b..accc5058 100644
--- a/doc/changes.src
+++ b/doc/changes.src
@@ -20,6 +20,10 @@ filename information anyway.
\b Fix handling of MASM-syntax reserved memory (e.g. \c{dw ?}) when
used in structure definitions.
+\b The preprocessor now supports functions, which can be less verbose
+and more convenient than the equivalent code implemented using
+directives. See \k{ppfunc}.
+
\S{cl-2.15.06} Version 2.15.06
diff --git a/doc/nasmdoc.src b/doc/nasmdoc.src
index f2756072..d850df9e 100644
--- a/doc/nasmdoc.src
+++ b/doc/nasmdoc.src
@@ -1,7 +1,7 @@
\# --------------------------------------------------------------------------
\#
\# Copyright 1996-2022 The NASM Authors - All Rights Reserved
-\M{year}{1996-2020}
+\M{year}{1996-2022}
\# See the file AUTHORS included with the NASM distribution for
\# the specific copyright holders.
\#
@@ -84,7 +84,8 @@
\IR{-w} \c{-w} option
\IR{-Z} \c{-Z} option
\IR{!=} \c{!=} operator
-\IR{$, here} \c{$}, Here token
+\IR{$, here} \c{$}, current address
+\IR{$, here} here token
\IR{$, prefix} \c{$}, prefix
\IR{$$} \c{$$} token
\IR{%} \c{%} operator
@@ -118,7 +119,6 @@
\IR{^^} \c{^^} operator
\IR{|} \c{|} operator
\IR{||} \c{||} operator
-\IR{~} \c{~} operator
\IR{%$} \c{%$} and \c{%$$} prefixes
\IA{%$$}{%$}
\IR{+ opaddition} \c{+} operator, binary
@@ -127,6 +127,8 @@
\IR{- opsubtraction} \c{-} operator, binary
\IR{- opunary} \c{-} operator, unary
\IR{! opunary} \c{!} operator
+\IA{~}{~ opunary}
+\IR{~ opunary} \c{~} operator
\IA{A16}{a16}
\IA{A32}{a32}
\IA{A64}{a64}
@@ -153,12 +155,16 @@ variables
\IR{c calling convention} C calling convention
\IR{c symbol names} C symbol names
\IA{critical expressions}{critical expression}
-\IA{command line}{command-line}
+\IA{command-line}{command line}
+\IA{comments}{comment}
+\IR{ccomment} comment, ending in \c{\\}
\IA{case sensitivity}{case sensitive}
\IA{case-sensitive}{case sensitive}
\IA{case-insensitive}{case sensitive}
\IA{character constants}{character constant}
\IR{codeview debugging format} CodeView debugging format
+\IR{continuation line} continuation line
+\IR{continuation line} preprocessor, continuation line
\IR{common object file format} Common Object File Format
\IR{common variables, alignment in elf} common variables, alignment in ELF
\IR{common, elf extensions to} \c{COMMON}, ELF extensions to
@@ -170,8 +176,8 @@ variables
\IR{dll symbols, exporting} DLL symbols, exporting
\IR{dll symbols, importing} DLL symbols, importing
\IR{dos} DOS
-\IA{effective address}{effective addresses}
-\IA{effective-address}{effective addresses}
+\IA{effective addresses}{effective address}
+\IA{effective-address}{effective address}
\IR{elf} ELF
\IR{elf, 16-bit code} ELF, 16-bit code
\IR{elf, debug formats} ELF, debug formats
@@ -241,9 +247,13 @@ variables
\IR{plt} PLT
\IR{plt} \c{PLT} relocations
\IA{pre-defining macros}{pre-define}
-\IA{preprocessor expressions}{preprocessor, expressions}
-\IA{preprocessor loops}{preprocessor, loops}
-\IA{preprocessor variables}{preprocessor, variables}
+\IR{preprocessor conditionals} preprocessor, conditionals
+\IR{preprocessor expansions} preprocessor, expansions
+\IR{preprocessor expressions} preprocessor, expressions
+\IR{preprocessor loops} preprocessor, loops
+\IR{preprocessor variables} preprocessor, variables
+\IR{preprocessor variables} variables, preprocessor
+\IA{comments}{comment}
\IR{relocations, pic-specific} relocations, PIC-specific
\IA{repeating}{repeating code}
\IR{section alignment, in elf} section alignment, in ELF
@@ -1164,9 +1174,9 @@ is a macro, a preprocessor directive or an assembler directive: see
\c label: instruction operands ; comment
As usual, most of these fields are optional; the presence or absence
-of any combination of a label, an instruction and a comment is allowed.
-Of course, the operand field is either required or forbidden by the
-presence and nature of the instruction field.
+of any combination of a label, an instruction and a \i{comment} is
+allowed. Of course, the operand field is either required or forbidden
+by the presence and nature of the instruction field.
NASM uses backslash (\\) as the line continuation character; if a line
ends with backslash, the next line is considered to be a part of the
@@ -2166,10 +2176,23 @@ NASM contains a powerful \i{macro processor}, which supports
conditional assembly, multi-level file inclusion, two forms of macro
(single-line and multi-line), and a `context stack' mechanism for
extra macro power. Preprocessor directives all begin with a \c{%}
-sign.
+sign. As a result, some care needs to be taken when using the \c{%}
+arithmetic operator to avoid it being confused with a preprocessor
+directive; it is recommended that it always be surrounded by
+whitespace.
-The preprocessor collapses all lines which end with a backslash (\\)
-character into a single line. Thus:
+The NASM preprocessor borrows concepts from both the C preprocessor
+and the macro facilities of many other assemblers.
+
+\H{pcsteps} \i{Preprocessor Expansions}
+
+The input to the preprocessor is expanded in the following ways in the
+order specified here.
+
+\S{pcbackslash} \i{Continuation Line} Collapsing
+
+The preprocessor first collapses all lines which end with a backslash
+(\c{\\}) character into a single line. Thus:
\c %define THIS_VERY_LONG_MACRO_NAME_IS_DEFINED_TO \\
\c THIS_VALUE
@@ -2177,8 +2200,122 @@ character into a single line. Thus:
will work like a single-line macro without the backslash-newline
sequence.
+\IR{comment removal} comment, removal
+\IR{comment removal} preprocessor, comment removal
+
+\S{pccomment} \i{Comment Removal}
+
+After concatenation, comments are removed.
+\I{comment, syntax}\i{Comments}
+begin with the character \c{;} unless contained
+inside a quoted string or a handful of other special contexts.
+
+\I{ccomment}Note that this is applied \e{after} \i{continuation lines}
+are collapsed. This means that
+
+\c add al,'\\' ; Add the ASCII code for \\
+\c mov [ecx],al ; Save the character
+
+will probably not do what you expect, as the second line will be
+considered part of the preceding comment. Although this behavior is
+sometimes confusing, it has been the behavior of NASM since the very
+first version, and it is also the behavior of the C preprocessor.
+
+
+\S{pcline}\i\c{%line} directives
+
+In this step, \i\c{%line} directives are processed. See \k{line}.
+
+
+\S{pccond}\I{preprocessor conditionals}\I{preprocessor loops}
+Conditionals, Loops and \i{Multi-Line Macro} Definitions
+
+In this step, the following \i{preprocessor directives} are processed:
+
+\b \i{Multi-line macro} definitions, specified by the \i\c{%macro} and
+\i\c{%imacro} directives. The body of a multi-line macro is stored and
+is not further expanded at this time. See \k{mlmacro}.
+
+\b \i{Conditional assembly}, specified by the \i\c{%if} family of preprocessor
+directives. Disabled parts of the source code are discarded and are not
+further expanded. See \k{condasm}.
+
+\b \i{Preprocessor loops}, specified by the \i\c{%rep} preprocessor
+directive. A preprocessor loop is very similar to a multi-line macro
+and as such the body is stored and is not further expanded at this
+time. See \k{rep}.
+
+These constructs are required to be balanced, so that the ending of a
+block can be detected, but no further processing is done at this time;
+stored blocks will be inserted at this step when they are expanded
+(see below.)
+
+It is specific to each directive to what extent \i{inline expansions}
+and \i{detokenization} are performed for the arguments of the
+directives.
+
+
+\S{pcsmacro} \i{Inline expansions} and other \I{preprocessor directives}directives
+
+In this step, the following expansions are performed on each line:
+
+\b \i{Single-line macros} are expanded. See \k{slmacro}.
+
+\b \i{Preprocessor functions} are expanded. See \k{ppfunc}.
+
+\b If this line is the result of \i{multi-line macro} expansions (see
+below), the parameters to that macro are expanded at this time. See
+\k{mlmacro}.
+
+\b \i{Macro indirection}, using the \i\c{%[]} construct, is expanded. See
+\k{indmacro}.
+
+\b Token \i{concatenation} is performed, using either the \i\c{%+} operator
+(see \k{concat%+}) or implicitly (see \k{indmacro} and \k{concat}).
+
+\b \i{Macro-local labels} are converted into unique strings, see
+\k{maclocal}.
+
+\b Remaining preprocessor \i{directives} are processed. It is specific
+to each directive to what extent the above expansions or the ones
+specified in \k{pcfinal} are performed on their arguments.
+
+
+\S{pcmmacro} \i{Multi-Line Macro Expansion}
+
+In this step, \i{multi-line macros} are expanded into new lines of
+source, like the typical macro feature of many other assemblers. See
+\k{mlmacro}.
+
+After expansion, the newly injected lines of source are processed
+starting with the step defined in \k{pccond}.
+
+
+\S{pcfinal} \i{Detokenization}
+
+In this step, the final line of source code is produced. It performs
+the following operations:
+
+\b Environment variables specified using the \i\c{%!} construct are
+expanded. See \k{getenv}.
+
+\b \i{Context-local labels} are expanded into unique strings. See
+\k{ctxlocal}.
+
+\b All tokens are converted to their text representation. Unlike the C
+preprocessor, the NASM preprocessor does not insert whitespace between
+adjacent tokens unless present in the source code. See \k{concat}.
+
+The resulting line of text is sent either to the assembler or, if
+running in preprocessor-only mode, to the output file (see \k{opt-E}),
+prefixed if necessary by a newly inserted \i\c{%line} directive.
+
+
\H{slmacro} \i{Single-Line Macros}
+Single-line macros are expanded inline, much like macros in the C
+preprocessor.
+
\S{define} The Normal Way: \I\c{%idefine}\i\c{%define}
Single-line macros are defined using the \c{%define} preprocessor
@@ -2528,6 +2665,8 @@ The expression passed to \c{%assign} is a \i{critical expression}
a relocatable reference such as a code or data address, or anything
involving a register).
+See also the \i\c{%eval()} preprocessor function, \k{f_eval}.
+
\S{defstr} Defining Strings: \I\c{%idefstr}\i\c{%defstr}
@@ -2549,6 +2688,8 @@ This can be used, for example, with the \c{%!} construct (see
\c %defstr PATH %!PATH ; The operating system PATH variable
+See also the \i\c{%str()} preprocessor function, \k{f_str}.
+
\S{deftok} Defining Tokens: \I\c{%ideftok}\i\c{%deftok}
@@ -2564,6 +2705,8 @@ is equivalent to
\c %define test TEST
+See also the \i\c{%tok()} preprocessor function, \k{f_tok}.
+
\S{defalias} Defining Aliases: \I\c{%idefalias}\i\c{%defalias}
@@ -2628,6 +2771,9 @@ or a numeric value) to a single-line macro. When producing a string
value, it may change the style of quoting of the input string or
strings, and possibly use \c{\\}-escapes inside \c{`}-quoted strings.
+These directives are also available as \i{preprocessor functions}, see
+\k{ppfunc}.
+
\S{strcat} \i{Concatenating Strings}: \i\c{%strcat}
The \c{%strcat} operator concatenates quoted strings and assign them to
@@ -2646,6 +2792,9 @@ Similarly:
The use of commas to separate strings is permitted but optional.
+The corresponding preprocessor function is \c{%strcat()}, see
+\k{f_strcat}.
+
\S{strlen} \i{String Length}: \i\c{%strlen}
@@ -2665,6 +2814,9 @@ macro that expands to a string, as in the following example:
As in the first case, this would result in \c{charcnt} being
assigned the value of 9.
+The corresponding preprocessor function is \c{%strlen()}, see
+\k{f_strlen}.
+
\S{substr} \i{Extracting Substrings}: \i\c{%substr}
@@ -2689,11 +2841,126 @@ values out of range result in an empty string. A negative length
means "until N-1 characters before the end of string", i.e. \c{-1}
means until end of string, \c{-2} until one character before, etc.
+The corresponding preprocessor function is \c{%substr()}, see
+\k{f_substr}.
+
+
+\H{ppfunc} \i{Preprocessor Functions}
+
+Preprocessor functions are, fundamentally, a kind of built-in
+single-line macro. They expand to a string depending on their
+arguments, and can be used in any context where single-line macro
+expansion would be performed. Preprocessor functions were introduced
+in NASM 2.16.
+
+\S{f_eval} \i\c{%eval()} Function
+
+The \c{%eval()} function evaluates its argument as a numeric
+expression in much the same way the \i\c{%assign} directive would, see
+\k{assign}. Unlike \c{%assign}, \c{%eval()} supports more than one
+argument; if more than one argument is specified, it is expanded to a
+comma-separated list of values.
+
+\c %assign a 2
+\c %assign b 3
+\c %defstr what %eval(a+b,a*b) ; equivalent to %define what "5,6"
+
+The expressions passed to \c{%eval()} are \i{critical expressions},
+see \k{crit}.
+
+
+\S{f_is} \i\c{%is()} Family Functions
+
+Each \i\c{%if} family directive (see \k{condasm}) has an equivalent
+\c{%is()} family function that expands to \c{1} if the equivalent
+\c{%if} directive would process as true, and to \c{0} if the equivalent
+\c{%if} directive would process as false.
+
+\c ; Instead of !%isidn() could have used %isnidn()
+\c %if %isdef(foo) && !%isidn(foo,bar)
+\c db "foo is defined, but not as 'bar'"
+\c %endif
+
+Note that, being functions, the arguments (before expansion) will
+always need to have balanced parentheses so that the end of the
+argument list can be determined. This means that the syntax of
+e.g. \c{%istoken()} and \c{%isidn()} is somewhat stricter than that of
+their corresponding \c{%if} directives; it may be necessary to escape
+the argument to the conditional using \c{\{\}}:
+
+\c ; Instead of !%isidn() could have used %isnidn()
+\c %if %isdef(foo) && !%isidn({foo,)})
+\c db "foo is defined, but not as ')'"
+\c %endif
+
+
+\S{f_str} \i\c{%str()} Function
+
+The \c{%str()} function converts its argument, including any commas,
+to a quoted string, similar to the way the \i\c{%defstr} directive
+would, see \k{defstr}.
+
+Being a function, the argument will need to have balanced parentheses
+or be escaped using \c{\{\}}.
+
+\c ; The following lines are all equivalent
+\c %define test 'TEST'
+\c %defstr test TEST
+\c %define test %str(TEST)
+
+
+\S{f_strcat} \i\c{%strcat()} Function
+
+The \c{%strcat()} function concatenates a list of quoted strings, in
+the same way the \i\c{%strcat} directive would, see \k{strcat}.
+
+\c ; The following lines are all equivalent
+\c %define alpha 'Alpha: 12" screen'
+\c %strcat alpha "Alpha: ", '12" screen'
+\c %define alpha %strcat("Alpha: ", '12" screen')
+
+
+\S{f_strlen} \i\c{%strlen()} Function
+
+The \c{%strlen()} function expands to the length of a quoted string,
+in the same way the \i\c{%strlen} directive would, see \k{strlen}.
+
+\c ; The following lines are all equivalent
+\c %define charcnt 9
+\c %strlen charcnt 'my string'
+\c %define charcnt %strlen('my string')
+
+
+\S{f_substr} \i\c{%substr()} Function
+
+The \c{%substr()} function extracts a substring of a quoted string, in
+the same way the \i\c{%substr} directive would, see \k{substr}. Note
+that unlike the \c{%substr} directive, a comma is required after the
+string argument.
+
+\c ; The following lines are all equivalent
+\c %define mychar 'yzw'
+\c %substr mychar 'xyzw' 2,-1
+\c %define mychar %substr('xyzw',2,-1)
+
+
+\S{f_tok} \i\c{%tok()} Function
+
+The \c{%tok()} function converts a quoted string into a sequence of
+tokens, in the same way the \i\c{%deftok} directive would, see
+\k{deftok}.
+
+\c ; The following lines are all equivalent
+\c %define test TEST
+\c %deftok test 'TEST'
+\c %define test %tok('TEST')
+
\H{mlmacro} \i{Multi-Line Macros}: \I\c{%imacro}\i\c{%macro}
-Multi-line macros are much more like the type of macro seen in MASM
-and TASM: a multi-line macro definition in NASM looks something like
+Multi-line macros are much like the type of macro seen in MASM
+and TASM, and expand to a new set of lines of source code.
+A multi-line macro definition in NASM looks something like
this.
\c %macro prologue 1
@@ -4614,6 +4881,7 @@ It is still possible to turn in on again by
Note that \c{SECTALIGN <ON|OFF>} affects only the \c{ALIGN}/\c{ALIGNB} directives,
not an explicit \c{SECTALIGN} directive.
+
\C{macropkg} \i{Standard Macro Packages}
The \i\c{%use} directive (see \k{use}) includes one of the standard