changeset 556:d02f43598ba3

finish String documentation
author Franklin Schmidt <fschmidt@gmail.com>
date Fri, 19 Jun 2015 19:39:41 -0600
parents e25ba7a2e816
children 6268c1ce6ea8
files website/src/manual.html.luan
diffstat 1 files changed, 109 insertions(+), 533 deletions(-)
--- a/website/src/manual.html.luan	Fri Jun 19 04:29:06 2015 -0600
+++ b/website/src/manual.html.luan	Fri Jun 19 19:39:41 2015 -0600
@@ -2159,40 +2159,6 @@
 
 
 
-
-<p>
-<hr><h3><a name="pdf-tonumber"><code>tonumber (e [, base])</code></a></h3>
-
-
-<p>
-When called with no <code>base</code>,
-<code>tonumber</code> tries to convert its argument to a number.
-If the argument is already a number or
-a string convertible to a number,
-then <code>tonumber</code> returns this number;
-otherwise, it returns <b>nil</b>.
-
-
-<p>
-The conversion of strings can result in integers or floats,
-according to the lexical conventions of Lua (see <a href="#3.1">&sect;3.1</a>).
-(The string may have leading and trailing spaces and a sign.)
-
-
-<p>
-When called with <code>base</code>,
-then <code>e</code> must be a string to be interpreted as
-an integer numeral in that base.
-The base may be any integer between 2 and 36, inclusive.
-In bases above&nbsp;10, the letter '<code>A</code>' (in either upper or lower case)
-represents&nbsp;10, '<code>B</code>' represents&nbsp;11, and so forth,
-with '<code>Z</code>' representing 35.
-If the string <code>e</code> is not a valid numeral in the given base,
-the function returns <b>nil</b>.
-
-
-
-
 <h4 <%=heading_options%> ><a name="Luan.to_string"><tt>Luan.to_string (v)</tt></a></h4>
 
 <p>
@@ -2367,22 +2333,6 @@
 
 
 
-
-<p>
-<hr><h3><a name="pdf-string.byte"><code>string.byte (s [, i [, j]])</code></a></h3>
-Returns the internal numerical codes of the characters <code>s[i]</code>,
-<code>s[i+1]</code>, ..., <code>s[j]</code>.
-The default value for <code>i</code> is&nbsp;1;
-the default value for <code>j</code> is&nbsp;<code>i</code>.
-These indices are corrected
-following the same rules of function <a href="#pdf-string.sub"><code>string.sub</code></a>.
-
-
-<p>
-Numerical codes are not necessarily portable across platforms.
-
-
-
 <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (&middot;&middot;&middot;)</tt></a></h4>
 
 <p>
@@ -2411,7 +2361,7 @@
 
 <p>
 Looks for the first match of
-<tt>pattern</tt> (see <a href="#6.4.1">&sect;6.4.1</a>) in the string <tt>s</tt>.
+<tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>.
 If it finds a match, then <tt>find</tt> returns the indices of&nbsp;<tt>s</tt>
 where this occurrence starts and ends;
 otherwise, it returns <b>nil</b>.
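A minimal Java sketch of how the start and end indices described above can be obtained from `java.util.regex`. It assumes Luan reports Lua-style 1-based, inclusive positions. The class `FindSketch` and its `find` helper are hypothetical, and any optional arguments of the real function are not modeled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Luan's actual implementation.
class FindSketch {
    // Returns {start, end} as 1-based inclusive indices, or null if there is no match.
    static int[] find(String s, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(s);
        if (!m.find()) {
            return null;
        }
        // start() is 0-based, so add 1; end() is 0-based exclusive,
        // which is the same number as the 1-based inclusive end.
        return new int[] { m.start() + 1, m.end() };
    }

    public static void main(String[] args) {
        int[] r = find("hello world", "o w");
        System.out.println(r[0] + " " + r[1]);          // 5 7
        System.out.println(find("hello", "z") == null); // true
    }
}
```

Only the first match is reported; later matches would need further calls to `Matcher.find()`.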
@@ -2451,7 +2401,7 @@
 <p>
 Returns an iterator function that,
 each time it is called,
-returns the next captures from <tt>pattern</tt> (see <a href="#6.4.1">&sect;6.4.1</a>)
+returns the next captures from <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>)
 over the string <tt>s</tt>.
 If <tt>pattern</tt> specifies no captures,
 then the whole match is produced in each call.
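The capture rule above maps onto `Matcher.groupCount()`. In this hypothetical sketch, `gmatch` collects every result into a list for brevity, whereas the documented function returns an iterator that produces the results one call at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: eagerly collects what gmatch's iterator would yield lazily.
class GmatchSketch {
    static List<List<String>> gmatch(String s, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(s);
        List<List<String>> results = new ArrayList<>();
        while (m.find()) {
            List<String> values = new ArrayList<>();
            if (m.groupCount() == 0) {
                // No capturing groups: the whole match is produced.
                values.add(m.group());
            } else {
                for (int g = 1; g <= m.groupCount(); g++) {
                    values.add(m.group(g));
                }
            }
            results.add(values);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(gmatch("a=1, b=2", "(\\w+)=(\\w+)")); // [[a, 1], [b, 2]]
        System.out.println(gmatch("one two", "\\w+"));           // [[one], [two]]
    }
}
```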
@@ -2492,7 +2442,7 @@
 <p>
 Returns a copy of <tt>s</tt>
 in which all (or the first <tt>n</tt>, if given)
-occurrences of the <tt>pattern</tt> (see <a href="#6.4.1">&sect;6.4.1</a>) have been
+occurrences of the <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) have been
 replaced by a replacement string specified by <tt>repl</tt>,
 which can be a string, a table, or a function.
 <tt>gsub</tt> also returns, as its second value,
@@ -2560,6 +2510,11 @@
 
 
 
+<h4 <%=heading_options%> ><a name="String.literal"><tt>String.literal (s)</tt></a></h4>
+<p>
+Returns a regular expression that matches the literal string <tt>s</tt>, so that characters with a special meaning in regular expressions are matched as themselves. This function is simply the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#quote(java.lang.String)"><tt>Pattern.quote</tt></a>.
+
+
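Since `String.literal` is documented as `Pattern.quote`, its effect can be shown directly in Java. The helper names `containsLiteral` and `containsUnquoted` are hypothetical; the second shows what happens when the same text is used as a pattern without quoting.

```java
import java.util.regex.Pattern;

class LiteralSketch {
    // True if s contains text as a literal substring, searched for through a regex.
    static boolean containsLiteral(String s, String text) {
        return Pattern.compile(Pattern.quote(text)).matcher(s).find();
    }

    // The same search without quoting: '+' and '?' in text act as regex operators.
    static boolean containsUnquoted(String s, String text) {
        return Pattern.compile(text).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("1+1=2?"));                  // \Q1+1=2?\E
        System.out.println(containsLiteral("is 1+1=2?", "1+1=2?"));  // true
        System.out.println(containsUnquoted("is 1+1=2?", "1+1=2?")); // false
    }
}
```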
 <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4>
 <p>
 Receives a string and returns a copy of this string with all
@@ -2569,109 +2524,128 @@
 
 
 
-<p>
-<hr><h3><a name="pdf-string.match"><code>string.match (s, pattern [, init])</code></a></h3>
-Looks for the first <em>match</em> of
-<code>pattern</code> (see <a href="#6.4.1">&sect;6.4.1</a>) in the string <code>s</code>.
-If it finds one, then <code>match</code> returns
+<h4 <%=heading_options%> ><a name="String.match"><tt>String.match (s, pattern [, init])</tt></a></h4>
+
+<p>
+Looks for the first <i>match</i> of
+<tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>.
+If it finds one, then <tt>match</tt> returns
 the captures from the pattern;
 otherwise it returns <b>nil</b>.
-If <code>pattern</code> specifies no captures,
+If <tt>pattern</tt> specifies no captures,
 then the whole match is returned.
-A third, optional numerical argument <code>init</code> specifies
+A third, optional numerical argument <tt>init</tt> specifies
 where to start the search;
 its default value is&nbsp;1 and can be negative.
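A hypothetical Java sketch of `String.match` with the optional `init` argument. It assumes Lua's convention that a negative `init` counts back from the end of `s` and that values below 1 are treated as 1. `Matcher.find(int)` takes a 0-based start index, hence the `init - 1`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the documented behavior, not Luan's implementation.
class MatchSketch {
    static List<String> match(String s, String pattern, int init) {
        int len = s.length();
        if (init < 0) {
            init = len + init + 1; // negative init counts back from the end
        }
        if (init < 1) {
            init = 1;
        }
        if (init > len + 1) {
            return null; // the start position lies past the end of s
        }
        Matcher m = Pattern.compile(pattern).matcher(s);
        if (!m.find(init - 1)) { // find(int) takes a 0-based start index
            return null;
        }
        List<String> result = new ArrayList<>();
        if (m.groupCount() == 0) {
            result.add(m.group()); // no captures: return the whole match
        } else {
            for (int g = 1; g <= m.groupCount(); g++) {
                result.add(m.group(g));
            }
        }
        return result;
    }

    // init defaults to 1.
    static List<String> match(String s, String pattern) {
        return match(s, pattern, 1);
    }

    public static void main(String[] args) {
        System.out.println(match("key = value", "(\\w+) = (\\w+)")); // [key, value]
        System.out.println(match("abc123def456", "\\d+", -3));       // [456]
        System.out.println(match("abc", "\\d"));                     // null
    }
}
```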
 
 
-
-
-<p>
-<hr><h3><a name="pdf-string.pack"><code>string.pack (fmt, v1, v2, &middot;&middot;&middot;)</code></a></h3>
-
-
-<p>
-Returns a binary string containing the values <code>v1</code>, <code>v2</code>, etc.
-packed (that is, serialized in binary form)
-according to the format string <code>fmt</code> (see <a href="#6.4.2">&sect;6.4.2</a>). 
-
-
-
-
-<p>
-<hr><h3><a name="pdf-string.packsize"><code>string.packsize (fmt)</code></a></h3>
-
-
-<p>
-Returns the size of a string resulting from <a href="#pdf-string.pack"><code>string.pack</code></a>
-with the given format.
-The format string cannot have the variable-length options
-'<code>s</code>' or '<code>z</code>' (see <a href="#6.4.2">&sect;6.4.2</a>).
-
-
-
-
-<p>
-<hr><h3><a name="pdf-string.rep"><code>string.rep (s, n [, sep])</code></a></h3>
-Returns a string that is the concatenation of <code>n</code> copies of
-the string <code>s</code> separated by the string <code>sep</code>.
-The default value for <code>sep</code> is the empty string
+<h4 <%=heading_options%> ><a name="String.matches"><tt>String.matches (s, pattern)</tt></a></h4>
+<p>
+Returns a boolean indicating whether the entire string <tt>s</tt> matches <tt>pattern</tt>.
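The difference between a whole-string test and a search can be sketched with `Matcher.matches()` and `Matcher.find()`. That `String.matches` behaves like `Matcher.matches()` is an assumption based on the wording above, and the helper names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesSketch {
    // True only if the whole of s is matched by pattern.
    static boolean matches(String s, String pattern) {
        return Pattern.compile(pattern).matcher(s).matches();
    }

    // True if any substring of s is matched by pattern (a search, for contrast).
    static boolean containsMatch(String s, String pattern) {
        return Pattern.compile(pattern).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("abc123", "\\d+"));       // false
        System.out.println(containsMatch("abc123", "\\d+")); // true
        System.out.println(matches("123", "\\d+"));          // true
    }
}
```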
+
+
+
+<h4 <%=heading_options%> ><a name="String.rep"><tt>String.rep (s, n [, sep])</tt></a></h4>
+<p>
+Returns a string that is the concatenation of <tt>n</tt> copies of
+the string <tt>s</tt> separated by the string <tt>sep</tt>.
+The default value for <tt>sep</tt> is the empty string
 (that is, no separator).
-Returns the empty string if <code>n</code> is not positive.
-
-
-
-
-<p>
-<hr><h3><a name="pdf-string.reverse"><code>string.reverse (s)</code></a></h3>
-Returns a string that is the string <code>s</code> reversed.
-
-
-
-
-<p>
-<hr><h3><a name="pdf-string.sub"><code>string.sub (s, i [, j])</code></a></h3>
-Returns the substring of <code>s</code> that
-starts at <code>i</code>  and continues until <code>j</code>;
-<code>i</code> and <code>j</code> can be negative.
-If <code>j</code> is absent, then it is assumed to be equal to -1
+Returns the empty string if <tt>n</tt> is not positive.
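A short Java sketch of the documented behavior using `String.join` over `Collections.nCopies`. The class and method names are hypothetical and say nothing about how Luan actually implements `String.rep`.

```java
import java.util.Collections;

class RepSketch {
    // n copies of s joined by sep; the empty string when n is not positive.
    static String rep(String s, int n, String sep) {
        if (n <= 0) {
            return ""; // also avoids nCopies throwing on a negative count
        }
        return String.join(sep, Collections.nCopies(n, s));
    }

    // sep defaults to the empty string.
    static String rep(String s, int n) {
        return rep(s, n, "");
    }

    public static void main(String[] args) {
        System.out.println(rep("ab", 3, "-"));          // ab-ab-ab
        System.out.println(rep("x", 4));                // xxxx
        System.out.println(rep("x", 0, ",").isEmpty()); // true
    }
}
```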
+
+
+
+
+<h4 <%=heading_options%> ><a name="String.reverse"><tt>String.reverse (s)</tt></a></h4>
+<p>
+Returns a string that is the string <tt>s</tt> reversed.
+
+
+
+
+<h4 <%=heading_options%> ><a name="String.sub"><tt>String.sub (s, i [, j])</tt></a></h4>
+
+<p>
+Returns the substring of <tt>s</tt> that
+starts at <tt>i</tt> and continues until <tt>j</tt>;
+<tt>i</tt> and <tt>j</tt> can be negative.
+If <tt>j</tt> is absent, then it is assumed to be equal to -1
 (which is the same as the string length).
 In particular,
-the call <code>string.sub(s,1,j)</code> returns a prefix of <code>s</code>
-with length <code>j</code>,
-and <code>string.sub(s, -i)</code> returns a suffix of <code>s</code>
-with length <code>i</code>.
+the call <tt>String.sub(s, 1, j)</tt> returns a prefix of <tt>s</tt>
+with length <tt>j</tt>,
+and <tt>String.sub(s, -i)</tt> returns a suffix of <tt>s</tt>
+with length <tt>i</tt>.
 
 
 <p>
 If, after the translation of negative indices,
-<code>i</code> is less than 1,
+<tt>i</tt> is less than 1,
 it is corrected to 1.
-If <code>j</code> is greater than the string length,
+If <tt>j</tt> is greater than the string length,
 it is corrected to that length.
 If, after these corrections,
-<code>i</code> is greater than <code>j</code>,
+<tt>i</tt> is greater than <tt>j</tt>,
 the function returns the empty string.
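The index rules above can be traced step by step in a hypothetical Java helper that converts Luan's 1-based, inclusive range into Java's 0-based, half-open `substring` call.

```java
// Hypothetical sketch of the documented index rules.
class SubSketch {
    static String sub(String s, int i, int j) {
        int len = s.length();
        // Translate negative indices: -1 is the last character.
        if (i < 0) i = len + i + 1;
        if (j < 0) j = len + j + 1;
        // Clamp to the string.
        if (i < 1) i = 1;
        if (j > len) j = len;
        if (i > j) return "";
        // 1-based inclusive [i, j] becomes 0-based half-open [i - 1, j).
        return s.substring(i - 1, j);
    }

    // j defaults to -1, the end of the string.
    static String sub(String s, int i) {
        return sub(s, i, -1);
    }

    public static void main(String[] args) {
        System.out.println(sub("hello", 2, 4)); // ell
        System.out.println(sub("hello", -3));   // llo
    }
}
```

For example, `sub("hello", 1, 3)` is the prefix `"hel"`, and `sub("hello", 4, 2)` is the empty string because `i` ends up greater than `j`.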
 
 
 
-
-<p>
-<hr><h3><a name="pdf-string.unpack"><code>string.unpack (fmt, s [, pos])</code></a></h3>
-
-
-<p>
-Returns the values packed in string <code>s</code> (see <a href="#pdf-string.pack"><code>string.pack</code></a>)
-according to the format string <code>fmt</code> (see <a href="#6.4.2">&sect;6.4.2</a>).
-An optional <code>pos</code> marks where
-to start reading in <code>s</code> (default is 1).
-After the read values,
-this function also returns the index of the first unread byte in <code>s</code>.
-
-
-
-
-<p>
-<hr><h3><a name="pdf-string.upper"><code>string.upper (s)</code></a></h3>
+<h4 <%=heading_options%> ><a name="String.to_binary"><tt>String.to_binary (s)</tt></a></h4>
+
+<p>
+Converts a string to a binary by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()"><tt>String.getBytes</tt></a>.
+
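`String.getBytes()` with no argument encodes with the JVM's default charset, so the number of bytes need not equal the number of characters and can vary between platforms. This sketch contrasts it with an explicit UTF-8 encoding; `utf8Length` is a hypothetical helper.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ToBinarySketch {
    // Number of bytes produced when s is encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo"; // 5 characters; the second is e with an acute accent
        byte[] platformBytes = s.getBytes(); // encoded with the default charset
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(s.length() + " chars, " + platformBytes.length + " bytes (default), "
                + utf8Length(s) + " bytes (UTF-8)");
    }
}
```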
+
+
+<h4 <%=heading_options%> ><a name="String.to_number"><tt>String.to_number (s [, base])</tt></a></h4>
+
+<p>
+When called with no <tt>base</tt>,
+<tt>to_number</tt> tries to convert its argument to a number.
+If the argument is
+a string convertible to a number,
+then <tt>to_number</tt> returns this number;
+otherwise, it returns <b>nil</b>.
+<p>
+The conversion of strings can result in integers or floats.
+
+
+<p>
+When called with <tt>base</tt>,
+then <tt>s</tt> must be a string to be interpreted as
+an integer numeral in that base.
+In bases above&nbsp;10, the letter '<tt>A</tt>' (in either upper or lower case)
+represents&nbsp;10, '<tt>B</tt>' represents&nbsp;11, and so forth,
+with '<tt>Z</tt>' representing 35.
+If the string <tt>s</tt> is not a valid numeral in the given base,
+the function returns <b>nil</b>.
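A hypothetical Java sketch of both forms. With a base it uses `Long.parseLong(String, int)`, which already reads letters as digits 10 to 35 in either case and rejects bases outside 2 to 36; without a base it tries an integer first and falls back to `Double.parseDouble` for floats. `Double.parseDouble` also accepts strings such as `"NaN"` and `"Infinity"`, which Luan may treat differently, so this is not a faithful model of `String.to_number`.

```java
// Hypothetical sketch: returns null where the documented function returns nil.
class ToNumberSketch {
    // With a base: an integer numeral in that base, or null.
    static Long toNumber(String s, int base) {
        try {
            return Long.parseLong(s, base);
        } catch (NumberFormatException e) {
            return null; // invalid numeral or unsupported base
        }
    }

    // Without a base: an integer if possible, otherwise a float, otherwise null.
    static Number toNumber(String s) {
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            // not an integer; try a float next
        }
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toNumber("ff", 16)); // 255
        System.out.println(toNumber("Z", 36));  // 35
        System.out.println(toNumber("12", 2));  // null
        System.out.println(toNumber("42"));     // 42
        System.out.println(toNumber("4.5"));    // 4.5
    }
}
```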
+
+
+
+<h4 <%=heading_options%> ><a name="String.trim"><tt>String.trim (s)</tt></a></h4>
+
+<p>
+Removes the leading and trailing whitespace by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()"><tt>String.trim</tt></a>.
+
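Java's `String.trim` treats every character with a code up to and including U+0020 as whitespace, so other Unicode spaces such as U+2003 (EM SPACE) are left in place. A small Java check; the `TrimSketch.trim` wrapper is hypothetical.

```java
class TrimSketch {
    // Thin wrapper over String.trim(), which strips leading and trailing
    // characters whose code is at most U+0020.
    static String trim(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + trim("\t  x \n") + "]");   // [x]
        // EM SPACE (U+2003) is above U+0020, so it is not removed.
        System.out.println(trim("\u2003x\u2003").length()); // 3
    }
}
```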
+
+
+
+<h4 <%=heading_options%> ><a name="String.unicode"><tt>String.unicode (s [, i [, j]])</tt></a></h4>
+
+<p>
+Returns the internal numerical codes of the characters <tt>s[i]</tt>,
+<tt>s[i+1]</tt>, ..., <tt>s[j]</tt>.
+The default value for <tt>i</tt> is&nbsp;1;
+the default value for <tt>j</tt> is&nbsp;<tt>i</tt>.
+These indices are corrected
+following the same rules as function <a href="#String.sub"><tt>String.sub</tt></a>.
+
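A hypothetical Java sketch, assuming the "internal numerical codes" are Java `char` values (UTF-16 code units). Under that assumption, a character outside the Basic Multilingual Plane, such as an emoji, yields two codes. The index handling follows the `String.sub` rules referenced above.

```java
import java.util.Arrays;

// Hypothetical sketch; the real String.unicode may differ.
class UnicodeSketch {
    // Codes of s[i..j] (1-based, inclusive) after String.sub-style corrections.
    static int[] unicode(String s, int i, int j) {
        int len = s.length();
        if (i < 0) i = len + i + 1;
        if (j < 0) j = len + j + 1;
        if (i < 1) i = 1;
        if (j > len) j = len;
        if (i > j) return new int[0];
        int[] codes = new int[j - i + 1];
        for (int k = i; k <= j; k++) {
            codes[k - i] = s.charAt(k - 1); // a UTF-16 code unit, widened to int
        }
        return codes;
    }

    // j defaults to i.
    static int[] unicode(String s, int i) {
        return unicode(s, i, i);
    }

    // i defaults to 1.
    static int[] unicode(String s) {
        return unicode(s, 1, 1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(unicode("ABC", 1, 3))); // [65, 66, 67]
        System.out.println(Arrays.toString(unicode("ABC")));       // [65]
        // The emoji U+1F600 is stored as two UTF-16 code units.
        System.out.println(Arrays.toString(unicode("\uD83D\uDE00", 1, 2))); // [55357, 56832]
    }
}
```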
+
+
+
+
+<h4 <%=heading_options%> ><a name="String.upper"><tt>String.upper (s)</tt></a></h4>
+<p>
 Receives a string and returns a copy of this string with all
 lowercase letters changed to uppercase.
 All other characters are left unchanged.
@@ -2681,404 +2655,6 @@
 
 
 
-<h3>6.4.1 &ndash; <a name="6.4.1">Patterns</a></h3>
-
-<p>
-Patterns in Lua are described by regular strings,
-which are interpreted as patterns by the pattern-matching functions
-<a href="#pdf-string.find"><code>string.find</code></a>,
-<a href="#pdf-string.gmatch"><code>string.gmatch</code></a>,
-<a href="#pdf-string.gsub"><code>string.gsub</code></a>,
-and <a href="#pdf-string.match"><code>string.match</code></a>.
-This section describes the syntax and the meaning
-(that is, what they match) of these strings.
-
-
-
-<h4>Character Class:</h4><p>
-A <em>character class</em> is used to represent a set of characters.
-The following combinations are allowed in describing a character class:
-
-<ul>
-
-<li><b><em>x</em>: </b>
-(where <em>x</em> is not one of the <em>magic characters</em>
-<code>^$()%.[]*+-?</code>)
-represents the character <em>x</em> itself.
-</li>
-
-<li><b><code>.</code>: </b> (a dot) represents all characters.</li>
-
-<li><b><code>%a</code>: </b> represents all letters.</li>
-
-<li><b><code>%c</code>: </b> represents all control characters.</li>
-
-<li><b><code>%d</code>: </b> represents all digits.</li>
-
-<li><b><code>%g</code>: </b> represents all printable characters except space.</li>
-
-<li><b><code>%l</code>: </b> represents all lowercase letters.</li>
-
-<li><b><code>%p</code>: </b> represents all punctuation characters.</li>
-
-<li><b><code>%s</code>: </b> represents all space characters.</li>
-
-<li><b><code>%u</code>: </b> represents all uppercase letters.</li>
-
-<li><b><code>%w</code>: </b> represents all alphanumeric characters.</li>
-
-<li><b><code>%x</code>: </b> represents all hexadecimal digits.</li>
-
-<li><b><code>%<em>x</em></code>: </b> (where <em>x</em> is any non-alphanumeric character)
-represents the character <em>x</em>.
-This is the standard way to escape the magic characters.
-Any non-alphanumeric character
-(including all punctuations, even the non-magical)
-can be preceded by a '<code>%</code>'
-when used to represent itself in a pattern.
-</li>
-
-<li><b><code>[<em>set</em>]</code>: </b>
-represents the class which is the union of all
-characters in <em>set</em>.
-A range of characters can be specified by
-separating the end characters of the range,
-in ascending order, with a '<code>-</code>'.
-All classes <code>%</code><em>x</em> described above can also be used as
-components in <em>set</em>.
-All other characters in <em>set</em> represent themselves.
-For example, <code>[%w_]</code> (or <code>[_%w]</code>)
-represents all alphanumeric characters plus the underscore,
-<code>[0-7]</code> represents the octal digits,
-and <code>[0-7%l%-]</code> represents the octal digits plus
-the lowercase letters plus the '<code>-</code>' character.
-
-
-<p>
-The interaction between ranges and classes is not defined.
-Therefore, patterns like <code>[%a-z]</code> or <code>[a-%%]</code>
-have no meaning.
-</li>
-
-<li><b><code>[^<em>set</em>]</code>: </b>
-represents the complement of <em>set</em>,
-where <em>set</em> is interpreted as above.
-</li>
-
-</ul><p>
-For all classes represented by single letters (<code>%a</code>, <code>%c</code>, etc.),
-the corresponding uppercase letter represents the complement of the class.
-For instance, <code>%S</code> represents all non-space characters.
-
-
-<p>
-The definitions of letter, space, and other character groups
-depend on the current locale.
-In particular, the class <code>[a-z]</code> may not be equivalent to <code>%l</code>.
-
-
-
-
-
-<h4>Pattern Item:</h4><p>
-A <em>pattern item</em> can be
-
-<ul>
-
-<li>
-a single character class,
-which matches any single character in the class;
-</li>
-
-<li>
-a single character class followed by '<code>*</code>',
-which matches zero or more repetitions of characters in the class.
-These repetition items will always match the longest possible sequence;
-</li>
-
-<li>
-a single character class followed by '<code>+</code>',
-which matches one or more repetitions of characters in the class.
-These repetition items will always match the longest possible sequence;
-</li>
-
-<li>
-a single character class followed by '<code>-</code>',
-which also matches zero or more repetitions of characters in the class.
-Unlike '<code>*</code>',
-these repetition items will always match the shortest possible sequence;
-</li>
-
-<li>
-a single character class followed by '<code>?</code>',
-which matches zero or one occurrence of a character in the class.
-It always matches one occurrence if possible;
-</li>
-
-<li>
-<code>%<em>n</em></code>, for <em>n</em> between 1 and 9;
-such item matches a substring equal to the <em>n</em>-th captured string
-(see below);
-</li>
-
-<li>
-<code>%b<em>xy</em></code>, where <em>x</em> and <em>y</em> are two distinct characters;
-such item matches strings that start with&nbsp;<em>x</em>, end with&nbsp;<em>y</em>,
-and where the <em>x</em> and <em>y</em> are <em>balanced</em>.
-This means that, if one reads the string from left to right,
-counting <em>+1</em> for an <em>x</em> and <em>-1</em> for a <em>y</em>,
-the ending <em>y</em> is the first <em>y</em> where the count reaches 0.
-For instance, the item <code>%b()</code> matches expressions with
-balanced parentheses.
-</li>
-
-<li>
-<code>%f[<em>set</em>]</code>, a <em>frontier pattern</em>;
-such item matches an empty string at any position such that
-the next character belongs to <em>set</em>
-and the previous character does not belong to <em>set</em>.
-The set <em>set</em> is interpreted as previously described.
-The beginning and the end of the subject are handled as if
-they were the character '<code>\0</code>'.
-</li>
-
-</ul>
-
-
-
-
-<h4>Pattern:</h4><p>
-A <em>pattern</em> is a sequence of pattern items.
-A caret '<code>^</code>' at the beginning of a pattern anchors the match at the
-beginning of the subject string.
-A '<code>$</code>' at the end of a pattern anchors the match at the
-end of the subject string.
-At other positions,
-'<code>^</code>' and '<code>$</code>' have no special meaning and represent themselves.
-
-
-
-
-
-<h4>Captures:</h4><p>
-A pattern can contain sub-patterns enclosed in parentheses;
-they describe <em>captures</em>.
-When a match succeeds, the substrings of the subject string
-that match captures are stored (<em>captured</em>) for future use.
-Captures are numbered according to their left parentheses.
-For instance, in the pattern <code>"(a*(.)%w(%s*))"</code>,
-the part of the string matching <code>"a*(.)%w(%s*)"</code> is
-stored as the first capture (and therefore has number&nbsp;1);
-the character matching "<code>.</code>" is captured with number&nbsp;2,
-and the part matching "<code>%s*</code>" has number&nbsp;3.
-
-
-<p>
-As a special case, the empty capture <code>()</code> captures
-the current string position (a number).
-For instance, if we apply the pattern <code>"()aa()"</code> on the
-string <code>"flaaap"</code>, there will be two captures: 3&nbsp;and&nbsp;5.
-
-
-
-
-
-
-
-<h3>6.4.2 &ndash; <a name="6.4.2">Format Strings for Pack and Unpack</a></h3>
-
-<p>
-The first argument to <a href="#pdf-string.pack"><code>string.pack</code></a>,
-<a href="#pdf-string.packsize"><code>string.packsize</code></a>, and <a href="#pdf-string.unpack"><code>string.unpack</code></a>
-is a format string,
-which describes the layout of the structure being created or read.
-
-
-<p>
-A format string is a sequence of conversion options.
-The conversion options are as follows:
-
-<ul>
-<li><b><code>&lt;</code>: </b>sets little endian</li>
-<li><b><code>&gt;</code>: </b>sets big endian</li>
-<li><b><code>=</code>: </b>sets native endian</li>
-<li><b><code>![<em>n</em>]</code>: </b>sets maximum alignment to <code>n</code>
-(default is native alignment)</li>
-<li><b><code>b</code>: </b>a signed byte (<code>char</code>)</li>
-<li><b><code>B</code>: </b>an unsigned byte (<code>char</code>)</li>
-<li><b><code>h</code>: </b>a signed <code>short</code> (native size)</li>
-<li><b><code>H</code>: </b>an unsigned <code>short</code> (native size)</li>
-<li><b><code>l</code>: </b>a signed <code>long</code> (native size)</li>
-<li><b><code>L</code>: </b>an unsigned <code>long</code> (native size)</li>
-<li><b><code>j</code>: </b>a <code>lua_Integer</code></li>
-<li><b><code>J</code>: </b>a <code>lua_Unsigned</code></li>
-<li><b><code>T</code>: </b>a <code>size_t</code> (native size)</li>
-<li><b><code>i[<em>n</em>]</code>: </b>a signed <code>int</code> with <code>n</code> bytes
-(default is native size)</li>
-<li><b><code>I[<em>n</em>]</code>: </b>an unsigned <code>int</code> with <code>n</code> bytes
-(default is native size)</li>
-<li><b><code>f</code>: </b>a <code>float</code> (native size)</li>
-<li><b><code>d</code>: </b>a <code>double</code> (native size)</li>
-<li><b><code>n</code>: </b>a <code>lua_Number</code></li>
-<li><b><code>c<em>n</em></code>: </b>a fixed-sized string with <code>n</code> bytes</li>
-<li><b><code>z</code>: </b>a zero-terminated string</li>
-<li><b><code>s[<em>n</em>]</code>: </b>a string preceded by its length
-coded as an unsigned integer with <code>n</code> bytes
-(default is a <code>size_t</code>)</li>
-<li><b><code>x</code>: </b>one byte of padding</li>
-<li><b><code>X<em>op</em></code>: </b>an empty item that aligns
-according to option <code>op</code>
-(which is otherwise ignored)</li>
-<li><b>'<code> </code>': </b>(empty space) ignored</li>
-</ul><p>
-(A "<code>[<em>n</em>]</code>" means an optional integral numeral.)
-Except for padding, spaces, and configurations
-(options "<code>xX &lt;=&gt;!</code>"),
-each option corresponds to an argument (in <a href="#pdf-string.pack"><code>string.pack</code></a>)
-or a result (in <a href="#pdf-string.unpack"><code>string.unpack</code></a>).
-
-
-<p>
-For options "<code>!<em>n</em></code>", "<code>s<em>n</em></code>", "<code>i<em>n</em></code>", and "<code>I<em>n</em></code>",
-<code>n</code> can be any integer between 1 and 16.
-All integral options check overflows;
-<a href="#pdf-string.pack"><code>string.pack</code></a> checks whether the given value fits in the given size;
-<a href="#pdf-string.unpack"><code>string.unpack</code></a> checks whether the read value fits in a Lua integer.
-
-
-<p>
-Any format string starts as if prefixed by "<code>!1=</code>",
-that is,
-with maximum alignment of 1 (no alignment)
-and native endianness.
-
-
-<p>
-Alignment works as follows:
-For each option,
-the format gets extra padding until the data starts
-at an offset that is a multiple of the minimum between the
-option size and the maximum alignment;
-this minimum must be a power of 2.
-Options "<code>c</code>" and "<code>z</code>" are not aligned;
-option "<code>s</code>" follows the alignment of its starting integer.
-
-
-<p>
-All padding is filled with zeros by <a href="#pdf-string.pack"><code>string.pack</code></a>
-(and ignored by <a href="#pdf-string.unpack"><code>string.unpack</code></a>).
-
-
-
-
-
-
-
-<h2>6.5 &ndash; <a name="6.5">UTF-8 Support</a></h2>
-
-<p>
-This library provides basic support for UTF-8 encoding.
-It provides all its functions inside the table <a name="pdf-utf8"><code>utf8</code></a>.
-This library does not provide any support for Unicode other
-than the handling of the encoding.
-Any operation that needs the meaning of a character,
-such as character classification, is outside its scope.
-
-
-<p>
-Unless stated otherwise,
-all functions that expect a byte position as a parameter
-assume that the given position is either the start of a byte sequence
-or one plus the length of the subject string.
-As in the string library,
-negative indices count from the end of the string.
-
-
-<p>
-<hr><h3><a name="pdf-utf8.char"><code>utf8.char (&middot;&middot;&middot;)</code></a></h3>
-Receives zero or more integers,
-converts each one to its corresponding UTF-8 byte sequence
-and returns a string with the concatenation of all these sequences.
-
-
-
-
-<p>
-<hr><h3><a name="pdf-utf8.charpattern"><code>utf8.charpattern</code></a></h3>
-The pattern (a string, not a function) "<code>[\0-\x7F\xC2-\xF4][\x80-\xBF]*</code>"
-(see <a href="#6.4.1">&sect;6.4.1</a>),
-which matches exactly one UTF-8 byte sequence,
-assuming that the subject is a valid UTF-8 string.
-
-
-
-
-<p>
-<hr><h3><a name="pdf-utf8.codes"><code>utf8.codes (s)</code></a></h3>
-
-
-<p>
-Returns values so that the construction
-
-<pre>
-     for p, c in utf8.codes(s) do <em>body</em> end
-</pre><p>
-will iterate over all characters in string <code>s</code>,
-with <code>p</code> being the position (in bytes) and <code>c</code> the code point
-of each character.
-It raises an error if it meets any invalid byte sequence.
-
-
-
-
-<p>
-<hr><h3><a name="pdf-utf8.codepoint"><code>utf8.codepoint (s [, i [, j]])</code></a></h3>
-Returns the codepoints (as integers) from all characters in <code>s</code>
-that start between byte position <code>i</code> and <code>j</code> (both included).
-The default for <code>i</code> is 1 and for <code>j</code> is <code>i</code>.
-It raises an error if it meets any invalid byte sequence.
-
-
-
-
-<p>
-<hr><h3><a name="pdf-utf8.len"><code>utf8.len (s [, i [, j]])</code></a></h3>
-Returns the number of UTF-8 characters in string <code>s</code>
-that start between positions <code>i</code> and <code>j</code> (both inclusive).
-The default for <code>i</code> is 1 and for <code>j</code> is -1.
-If it finds any invalid byte sequence,
-returns a false value plus the position of the first invalid byte. 
-
-
-
-
-<p>
-<hr><h3><a name="pdf-utf8.offset"><code>utf8.offset (s, n [, i])</code></a></h3>
-Returns the position (in bytes) where the encoding of the
-<code>n</code>-th character of <code>s</code>
-(counting from position <code>i</code>) starts.
-A negative <code>n</code> gets characters before position <code>i</code>.
-The default for <code>i</code> is 1 when <code>n</code> is non-negative
-and <code>#s + 1</code> otherwise,
-so that <code>utf8.offset(s, -n)</code> gets the offset of the
-<code>n</code>-th character from the end of the string.
-If the specified character is neither in the subject
-nor right after its end,
-the function returns <b>nil</b>.
-
-
-<p>
-As a special case,
-when <code>n</code> is 0 the function returns the start of the encoding
-of the character that contains the <code>i</code>-th byte of <code>s</code>.
-
-
-<p>
-This function assumes that <code>s</code> is a valid UTF-8 string.
-
-
-
-