comparison website/src/manual.html.luan @ 556:d02f43598ba3

finish String documentation
author Franklin Schmidt <fschmidt@gmail.com>
date Fri, 19 Jun 2015 19:39:41 -0600
parents e25ba7a2e816
children 7cc9d4a53d3b
555:e25ba7a2e816 556:d02f43598ba3
2157 If the original metatable has a <tt>"__metatable"</tt> field, 2157 If the original metatable has a <tt>"__metatable"</tt> field,
2158 raises an error. 2158 raises an error.
2159 2159
2160 2160
2161 2161
2162
2163 <p>
2164 <hr><h3><a name="pdf-tonumber"><code>tonumber (e [, base])</code></a></h3>
2165
2166
2167 <p>
2168 When called with no <code>base</code>,
2169 <code>tonumber</code> tries to convert its argument to a number.
2170 If the argument is already a number or
2171 a string convertible to a number,
2172 then <code>tonumber</code> returns this number;
2173 otherwise, it returns <b>nil</b>.
2174
2175
2176 <p>
2177 The conversion of strings can result in integers or floats,
2178 according to the lexical conventions of Lua (see <a href="#3.1">&sect;3.1</a>).
2179 (The string may have leading and trailing spaces and a sign.)
2180
2181
2182 <p>
2183 When called with <code>base</code>,
2184 then <code>e</code> must be a string to be interpreted as
2185 an integer numeral in that base.
2186 The base may be any integer between 2 and 36, inclusive.
2187 In bases above&nbsp;10, the letter '<code>A</code>' (in either upper or lower case)
2188 represents&nbsp;10, '<code>B</code>' represents&nbsp;11, and so forth,
2189 with '<code>Z</code>' representing 35.
2190 If the string <code>e</code> is not a valid numeral in the given base,
2191 the function returns <b>nil</b>.
2192
2193
2194
2195
2196 <h4 <%=heading_options%> ><a name="Luan.to_string"><tt>Luan.to_string (v)</tt></a></h4> 2162 <h4 <%=heading_options%> ><a name="Luan.to_string"><tt>Luan.to_string (v)</tt></a></h4>
2197 2163
2198 <p> 2164 <p>
2199 Receives a value of any type and 2165 Receives a value of any type and
2200 converts it to a string in a human-readable format. 2166 converts it to a string in a human-readable format.
2365 from the end of the string. 2331 from the end of the string.
2366 Thus, the last character is at position -1, and so on. 2332 Thus, the last character is at position -1, and so on.
2367 2333
2368 2334
2369 2335
2370
2371 <p>
2372 <hr><h3><a name="pdf-string.byte"><code>string.byte (s [, i [, j]])</code></a></h3>
2373 Returns the internal numerical codes of the characters <code>s[i]</code>,
2374 <code>s[i+1]</code>, ..., <code>s[j]</code>.
2375 The default value for <code>i</code> is&nbsp;1;
2376 the default value for <code>j</code> is&nbsp;<code>i</code>.
2377 These indices are corrected
2378 following the same rules of function <a href="#pdf-string.sub"><code>string.sub</code></a>.
2379
2380
2381 <p>
2382 Numerical codes are not necessarily portable across platforms.
2383
2384
2385
2386 <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (&middot;&middot;&middot;)</tt></a></h4> 2336 <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (&middot;&middot;&middot;)</tt></a></h4>
2387 2337
2388 <p> 2338 <p>
2389 Receives zero or more integers. 2339 Receives zero or more integers.
2390 Returns a string with length equal to the number of arguments, 2340 Returns a string with length equal to the number of arguments,
2409 2359
2410 <h4 <%=heading_options%> ><a name="String.find"><tt>String.find (s, pattern [, init [, plain]])</tt></a></h4> 2360 <h4 <%=heading_options%> ><a name="String.find"><tt>String.find (s, pattern [, init [, plain]])</tt></a></h4>
2411 2361
2412 <p> 2362 <p>
2413 Looks for the first match of 2363 Looks for the first match of
2414 <tt>pattern</tt> (see <a href="#6.4.1">&sect;6.4.1</a>) in the string <tt>s</tt>. 2364 <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>.
2415 If it finds a match, then <tt>find</tt> returns the indices of&nbsp;<tt>s</tt> 2365 If it finds a match, then <tt>find</tt> returns the indices of&nbsp;<tt>s</tt>
2416 where this occurrence starts and ends; 2366 where this occurrence starts and ends;
2417 otherwise, it returns <b>nil</b>. 2367 otherwise, it returns <b>nil</b>.
2418 A third, optional numerical argument <tt>init</tt> specifies 2368 A third, optional numerical argument <tt>init</tt> specifies
2419 where to start the search; 2369 where to start the search;
2449 <h4 <%=heading_options%> ><a name="String.gmatch"><tt>String.gmatch (s, pattern)</tt></a></h4> 2399 <h4 <%=heading_options%> ><a name="String.gmatch"><tt>String.gmatch (s, pattern)</tt></a></h4>
2450 2400
2451 <p> 2401 <p>
2452 Returns an iterator function that, 2402 Returns an iterator function that,
2453 each time it is called, 2403 each time it is called,
2454 returns the next captures from <tt>pattern</tt> (see <a href="#6.4.1">&sect;6.4.1</a>) 2404 returns the next captures from <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>)
2455 over the string <tt>s</tt>. 2405 over the string <tt>s</tt>.
2456 If <tt>pattern</tt> specifies no captures, 2406 If <tt>pattern</tt> specifies no captures,
2457 then the whole match is produced in each call. 2407 then the whole match is produced in each call.
2458 2408
2459 2409
2490 <h4 <%=heading_options%> ><a name="String.gsub"><tt>String.gsub (s, pattern, repl [, n])</tt></a></h4> 2440 <h4 <%=heading_options%> ><a name="String.gsub"><tt>String.gsub (s, pattern, repl [, n])</tt></a></h4>
2491 2441
2492 <p> 2442 <p>
2493 Returns a copy of <tt>s</tt> 2443 Returns a copy of <tt>s</tt>
2494 in which all (or the first <tt>n</tt>, if given) 2444 in which all (or the first <tt>n</tt>, if given)
2495 occurrences of the <tt>pattern</tt> (see <a href="#6.4.1">&sect;6.4.1</a>) have been 2445 occurrences of the <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) have been
2496 replaced by a replacement string specified by <tt>repl</tt>, 2446 replaced by a replacement string specified by <tt>repl</tt>,
2497 which can be a string, a table, or a function. 2447 which can be a string, a table, or a function.
2498 <tt>gsub</tt> also returns, as its second value, 2448 <tt>gsub</tt> also returns, as its second value,
2499 the total number of matches that occurred. 2449 the total number of matches that occurred.
2500 The name <tt>gsub</tt> comes from <i>Global SUBstitution</i>. 2450 The name <tt>gsub</tt> comes from <i>Global SUBstitution</i>.
2558 --&gt; x="lua-5.3.tar.gz" 2508 --&gt; x="lua-5.3.tar.gz"
2559 </pre></tt></p> 2509 </pre></tt></p>
2560 2510
2561 2511
2562 2512
2513 <h4 <%=heading_options%> ><a name="String.literal"><tt>String.literal (s)</tt></a></h4>
2514 <p>
2515 Returns a pattern string that matches <tt>s</tt> literally when used in a regular expression. This function simply calls the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#quote(java.lang.String)"><tt>Pattern.quote</tt></a>.
2516
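Since the inserted <tt>String.literal</tt> entry delegates to <tt>Pattern.quote</tt>, a short Java sketch may help show what quoting buys you. The class name <tt>LiteralDemo</tt> and its helper are hypothetical and exist only for illustration; only <tt>Pattern.quote</tt> and <tt>Pattern.matches</tt> are real Java API.

```java
import java.util.regex.Pattern;

class LiteralDemo {
    // Hypothetical helper: quote the text so that regex metacharacters
    // inside it lose their special meaning.
    static String literal(String s) {
        return Pattern.quote(s);
    }

    public static void main(String[] args) {
        String version = "5.3";
        // Unquoted, the dot matches any character, so "5x3" is accepted too.
        System.out.println(Pattern.matches(version, "5x3"));
        // Quoted, only the exact text "5.3" is accepted.
        System.out.println(Pattern.matches(literal(version), "5x3"));
        System.out.println(Pattern.matches(literal(version), "5.3"));
    }
}
```

Quoting is mostly useful when the text to search for comes from user input and may contain characters such as <tt>.</tt>, <tt>*</tt> or <tt>(</tt>.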
2517
2563 <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4> 2518 <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4>
2564 <p> 2519 <p>
2565 Receives a string and returns a copy of this string with all 2520 Receives a string and returns a copy of this string with all
2566 uppercase letters changed to lowercase. 2521 uppercase letters changed to lowercase.
2567 All other characters are left unchanged. 2522 All other characters are left unchanged.
2568 2523
2569 2524
2570 2525
2571 2526
2572 <p> 2527 <h4 <%=heading_options%> ><a name="String.match"><tt>String.match (s, pattern [, init])</tt></a></h4>
2573 <hr><h3><a name="pdf-string.match"><code>string.match (s, pattern [, init])</code></a></h3> 2528
2574 Looks for the first <em>match</em> of 2529 <p>
2575 <code>pattern</code> (see <a href="#6.4.1">&sect;6.4.1</a>) in the string <code>s</code>. 2530 Looks for the first <i>match</i> of
2576 If it finds one, then <code>match</code> returns 2531 <tt>pattern</tt> (see <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Pattern</a>) in the string <tt>s</tt>.
2532 If it finds one, then <tt>match</tt> returns
2577 the captures from the pattern; 2533 the captures from the pattern;
2578 otherwise it returns <b>nil</b>. 2534 otherwise it returns <b>nil</b>.
2579 If <code>pattern</code> specifies no captures, 2535 If <tt>pattern</tt> specifies no captures,
2580 then the whole match is returned. 2536 then the whole match is returned.
2581 A third, optional numerical argument <code>init</code> specifies 2537 A third, optional numerical argument <tt>init</tt> specifies
2582 where to start the search; 2538 where to start the search;
2583 its default value is&nbsp;1 and can be negative. 2539 its default value is&nbsp;1 and can be negative.
2584 2540
2585 2541
2586 2542 <h4 <%=heading_options%> ><a name="String.matches"><tt>String.matches (s, pattern)</tt></a></h4>
2587 2543 <p>
2588 <p> 2544 Returns a boolean indicating whether the entire string <tt>s</tt> matches <tt>pattern</tt>.
2589 <hr><h3><a name="pdf-string.pack"><code>string.pack (fmt, v1, v2, &middot;&middot;&middot;)</code></a></h3> 2545
2590 2546
2591 2547
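The inserted <tt>String.matches</tt> entry says the entire string must match. The Java sketch below contrasts a whole-string match with a search for a match anywhere in the string. It assumes Luan behaves like Java's <tt>Matcher.matches</tt> here, which the text does not state explicitly; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesDemo {
    // True only when the pattern matches the whole input.
    static boolean matchesEntire(String s, String pattern) {
        return Pattern.compile(pattern).matcher(s).matches();
    }

    // True when the pattern matches some part of the input.
    static boolean matchesSomewhere(String s, String pattern) {
        return Pattern.compile(pattern).matcher(s).find();
    }

    public static void main(String[] args) {
        // Digits appear inside the date, but the date is not all digits.
        System.out.println(matchesEntire("2015-06-19", "\\d+"));
        System.out.println(matchesSomewhere("2015-06-19", "\\d+"));
        System.out.println(matchesEntire("2015", "\\d+"));
    }
}
```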
2592 <p> 2548 <h4 <%=heading_options%> ><a name="String.rep"><tt>String.rep (s, n [, sep])</tt></a></h4>
2593 Returns a binary string containing the values <code>v1</code>, <code>v2</code>, etc. 2549 <p>
2594 packed (that is, serialized in binary form) 2550 Returns a string that is the concatenation of <tt>n</tt> copies of
2595 according to the format string <code>fmt</code> (see <a href="#6.4.2">&sect;6.4.2</a>). 2551 the string <tt>s</tt> separated by the string <tt>sep</tt>.
2596 2552 The default value for <tt>sep</tt> is the empty string
2597
2598
2599
2600 <p>
2601 <hr><h3><a name="pdf-string.packsize"><code>string.packsize (fmt)</code></a></h3>
2602
2603
2604 <p>
2605 Returns the size of a string resulting from <a href="#pdf-string.pack"><code>string.pack</code></a>
2606 with the given format.
2607 The format string cannot have the variable-length options
2608 '<code>s</code>' or '<code>z</code>' (see <a href="#6.4.2">&sect;6.4.2</a>).
2609
2610
2611
2612
2613 <p>
2614 <hr><h3><a name="pdf-string.rep"><code>string.rep (s, n [, sep])</code></a></h3>
2615 Returns a string that is the concatenation of <code>n</code> copies of
2616 the string <code>s</code> separated by the string <code>sep</code>.
2617 The default value for <code>sep</code> is the empty string
2618 (that is, no separator). 2553 (that is, no separator).
2619 Returns the empty string if <code>n</code> is not positive. 2554 Returns the empty string if <tt>n</tt> is not positive.
2620 2555
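The <tt>rep</tt> rules above (n copies, optional separator, empty result for non-positive n) can be sketched in Java as follows. This is an illustration of the documented behavior, not Luan's actual implementation; the class <tt>RepDemo</tt> and its methods are hypothetical.

```java
import java.util.Collections;

class RepDemo {
    // n copies of s joined by sep; empty string when n is not positive.
    static String rep(String s, int n, String sep) {
        if (n <= 0) {
            return "";
        }
        return String.join(sep, Collections.nCopies(n, s));
    }

    // Without a separator, the default is the empty string.
    static String rep(String s, int n) {
        return rep(s, n, "");
    }

    public static void main(String[] args) {
        System.out.println(rep("ab", 3));
        System.out.println(rep("ab", 3, ","));
        System.out.println(rep("ab", 0).isEmpty());
    }
}
```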
2621 2556
2622 2557
2623 2558
2624 <p> 2559 <h4 <%=heading_options%> ><a name="String.reverse"><tt>String.reverse (s)</tt></a></h4>
2625 <hr><h3><a name="pdf-string.reverse"><code>string.reverse (s)</code></a></h3> 2560 <p>
2626 Returns a string that is the string <code>s</code> reversed. 2561 Returns a string that is the string <tt>s</tt> reversed.
2627 2562
2628 2563
2629 2564
2630 2565
2631 <p> 2566 <h4 <%=heading_options%> ><a name="String.sub"><tt>String.sub (s, i [, j])</tt></a></h4>
2632 <hr><h3><a name="pdf-string.sub"><code>string.sub (s, i [, j])</code></a></h3> 2567
2633 Returns the substring of <code>s</code> that 2568 <p>
2634 starts at <code>i</code> and continues until <code>j</code>; 2569 Returns the substring of <tt>s</tt> that
2635 <code>i</code> and <code>j</code> can be negative. 2570 starts at <tt>i</tt> and continues until <tt>j</tt>;
2636 If <code>j</code> is absent, then it is assumed to be equal to -1 2571 <tt>i</tt> and <tt>j</tt> can be negative.
2572 If <tt>j</tt> is absent, then it is assumed to be equal to -1
2637 (which is the same as the string length). 2573 (which is the same as the string length).
2638 In particular, 2574 In particular,
2639 the call <code>string.sub(s,1,j)</code> returns a prefix of <code>s</code> 2575 the call <tt>String.sub(s,1,j)</tt> returns a prefix of <tt>s</tt>
2640 with length <code>j</code>, 2576 with length <tt>j</tt>,
2641 and <code>string.sub(s, -i)</code> returns a suffix of <code>s</code> 2577 and <tt>String.sub(s, -i)</tt> returns a suffix of <tt>s</tt>
2642 with length <code>i</code>. 2578 with length <tt>i</tt>.
2643 2579
2644 2580
2645 <p> 2581 <p>
2646 If, after the translation of negative indices, 2582 If, after the translation of negative indices,
2647 <code>i</code> is less than 1, 2583 <tt>i</tt> is less than 1,
2648 it is corrected to 1. 2584 it is corrected to 1.
2649 If <code>j</code> is greater than the string length, 2585 If <tt>j</tt> is greater than the string length,
2650 it is corrected to that length. 2586 it is corrected to that length.
2651 If, after these corrections, 2587 If, after these corrections,
2652 <code>i</code> is greater than <code>j</code>, 2588 <tt>i</tt> is greater than <tt>j</tt>,
2653 the function returns the empty string. 2589 the function returns the empty string.
2654 2590
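The index rules for <tt>sub</tt> (1-based positions, negative indices counted from the end, then clamping) can be written out step by step. The Java sketch below follows the description above literally; it is not taken from the Luan sources, and <tt>SubDemo</tt> is a hypothetical name.

```java
class SubDemo {
    // Substring from position i to position j, both 1-based and inclusive.
    static String sub(String s, int i, int j) {
        int len = s.length();
        // Translate negative indices: -1 is the last character.
        if (i < 0) i = len + i + 1;
        if (j < 0) j = len + j + 1;
        // Correct out-of-range values as the text describes.
        if (i < 1) i = 1;
        if (j > len) j = len;
        if (i > j) return "";
        // Java's substring is 0-based with an exclusive end index.
        return s.substring(i - 1, j);
    }

    // When j is absent it is taken to be -1.
    static String sub(String s, int i) {
        return sub(s, i, -1);
    }

    public static void main(String[] args) {
        System.out.println(sub("hello", 2, 4));
        System.out.println(sub("hello", 1, 2));   // prefix of length 2
        System.out.println(sub("hello", -3));     // suffix of length 3
        System.out.println(sub("hello", 4, 2).isEmpty());
    }
}
```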
2655 2591
2656 2592
2657 2593 <h4 <%=heading_options%> ><a name="String.to_binary"><tt>String.to_binary (s)</tt></a></h4>
2658 <p> 2594
2659 <hr><h3><a name="pdf-string.unpack"><code>string.unpack (fmt, s [, pos])</code></a></h3> 2595 <p>
2660 2596 Converts a string to a binary by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#getBytes()"><tt>String.getBytes</tt></a>.
2661 2597
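The inserted <tt>String.to_binary</tt> entry links to Java's no-argument <tt>String.getBytes</tt>, which encodes with the platform's default charset. The sketch below shows that call next to an explicit UTF-8 variant for comparison. The class <tt>ToBinaryDemo</tt> and both helpers are hypothetical, and whether Luan offers a way to pick the charset is not stated in the text.

```java
import java.nio.charset.StandardCharsets;

class ToBinaryDemo {
    // The call the text links to: uses the platform's default charset,
    // so the resulting bytes can differ between systems.
    static byte[] toBinary(String s) {
        return s.getBytes();
    }

    // Explicit charset, shown only to make the result reproducible.
    static byte[] toBinaryUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // five characters, one of them non-ASCII
        System.out.println(toBinary(text).length);
        System.out.println(toBinaryUtf8(text).length);
    }
}
```

In UTF-8 the accented letter takes two bytes, so the byte count exceeds the character count.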
2662 <p> 2598
2663 Returns the values packed in string <code>s</code> (see <a href="#pdf-string.pack"><code>string.pack</code></a>) 2599
2664 according to the format string <code>fmt</code> (see <a href="#6.4.2">&sect;6.4.2</a>). 2600 <h4 <%=heading_options%> ><a name="String.to_number"><tt>String.to_number (s [, base])</tt></a></h4>
2665 An optional <code>pos</code> marks where 2601
2666 to start reading in <code>s</code> (default is 1). 2602 <p>
2667 After the read values, 2603 When called with no <tt>base</tt>,
2668 this function also returns the index of the first unread byte in <code>s</code>. 2604 <tt>to_number</tt> tries to convert its argument to a number.
2669 2605 If the argument is
2670 2606 a string convertible to a number,
2671 2607 then <tt>to_number</tt> returns this number;
2672 2608 otherwise, it returns <b>nil</b>.
2673 <p> 2609
2674 <hr><h3><a name="pdf-string.upper"><code>string.upper (s)</code></a></h3> 2610 The conversion of strings can result in integers or floats.
2611
2612
2613 <p>
2614 When called with <tt>base</tt>,
2615 then <tt>s</tt> must be a string to be interpreted as
2616 an integer numeral in that base.
2617 In bases above&nbsp;10, the letter '<tt>A</tt>' (in either upper or lower case)
2618 represents&nbsp;10, '<tt>B</tt>' represents&nbsp;11, and so forth,
2619 with '<tt>Z</tt>' representing 35.
2620 If the string <tt>s</tt> is not a valid numeral in the given base,
2621 the function returns <b>nil</b>.
2622
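The base rules for <tt>to_number</tt> (letters stand for digits above 9, nil on an invalid numeral) resemble Java's radix parsing. The sketch below uses <tt>Long.parseLong</tt> and returns <tt>null</tt> where the manual says <b>nil</b>. It is an assumption that Luan works this way internally; <tt>ToNumberDemo</tt> is a hypothetical name, and details such as sign or whitespace handling may differ.

```java
class ToNumberDemo {
    // Parse s as an integer numeral in the given base; null if invalid.
    static Long toNumber(String s, int base) {
        try {
            return Long.parseLong(s, base);
        } catch (NumberFormatException e) {
            // Also reached when base is outside 2..36.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toNumber("ff", 16));
        System.out.println(toNumber("Z", 36));   // upper case
        System.out.println(toNumber("z", 36));   // lower case, same value
        System.out.println(toNumber("12", 2));   // '2' is not a binary digit
    }
}
```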
2623
2624
2625 <h4 <%=heading_options%> ><a name="String.trim"><tt>String.trim (s)</tt></a></h4>
2626
2627 <p>
2628 Removes the leading and trailing whitespace by calling the Java method <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#trim()"><tt>String.trim</tt></a>.
2629
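Because the inserted <tt>String.trim</tt> entry calls Java's <tt>String.trim</tt>, its notion of whitespace is Java's: characters with code U+0020 or lower. The small sketch below illustrates this, including a case that is not trimmed; <tt>TrimDemo</tt> is a hypothetical name.

```java
class TrimDemo {
    // Thin wrapper around the Java method named in the text.
    static String trim(String s) {
        return s.trim();
    }

    public static void main(String[] args) {
        // Spaces, tabs and newlines at both ends are removed.
        System.out.println("[" + trim("  \t hello world \n") + "]");
        // A non-breaking space (U+00A0) is above U+0020 and is kept.
        System.out.println(trim("\u00A0x").length());
    }
}
```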
2630
2631
2632
2633 <h4 <%=heading_options%> ><a name="String.unicode"><tt>String.unicode (s [, i [, j]])</tt></a></h4>
2634
2635 <p>
2636 Returns the internal numerical codes of the characters <tt>s[i]</tt>,
2637 <tt>s[i+1]</tt>, ..., <tt>s[j]</tt>.
2638 The default value for <tt>i</tt> is&nbsp;1;
2639 the default value for <tt>j</tt> is&nbsp;<tt>i</tt>.
2640 These indices are corrected
2641 following the same rules of function <a href="#String.sub"><tt>String.sub</tt></a>.
2642
2643
2644
2645
2646
2647 <h4 <%=heading_options%> ><a name="String.upper"><tt>String.upper (s)</tt></a></h4>
2648 <p>
2675 Receives a string and returns a copy of this string with all 2649 Receives a string and returns a copy of this string with all
2676 lowercase letters changed to uppercase. 2650 lowercase letters changed to uppercase.
2677 All other characters are left unchanged. 2651 All other characters are left unchanged.
2678 The definition of what a lowercase letter is depends on the current locale. 2652 The definition of what a lowercase letter is depends on the current locale.
2679 2653
2680
2681
2682
2683
2684 <h3>6.4.1 &ndash; <a name="6.4.1">Patterns</a></h3>
2685
2686 <p>
2687 Patterns in Lua are described by regular strings,
2688 which are interpreted as patterns by the pattern-matching functions
2689 <a href="#pdf-string.find"><code>string.find</code></a>,
2690 <a href="#pdf-string.gmatch"><code>string.gmatch</code></a>,
2691 <a href="#pdf-string.gsub"><code>string.gsub</code></a>,
2692 and <a href="#pdf-string.match"><code>string.match</code></a>.
2693 This section describes the syntax and the meaning
2694 (that is, what they match) of these strings.
2695
2696
2697
2698 <h4>Character Class:</h4><p>
2699 A <em>character class</em> is used to represent a set of characters.
2700 The following combinations are allowed in describing a character class:
2701
2702 <ul>
2703
2704 <li><b><em>x</em>: </b>
2705 (where <em>x</em> is not one of the <em>magic characters</em>
2706 <code>^$()%.[]*+-?</code>)
2707 represents the character <em>x</em> itself.
2708 </li>
2709
2710 <li><b><code>.</code>: </b> (a dot) represents all characters.</li>
2711
2712 <li><b><code>%a</code>: </b> represents all letters.</li>
2713
2714 <li><b><code>%c</code>: </b> represents all control characters.</li>
2715
2716 <li><b><code>%d</code>: </b> represents all digits.</li>
2717
2718 <li><b><code>%g</code>: </b> represents all printable characters except space.</li>
2719
2720 <li><b><code>%l</code>: </b> represents all lowercase letters.</li>
2721
2722 <li><b><code>%p</code>: </b> represents all punctuation characters.</li>
2723
2724 <li><b><code>%s</code>: </b> represents all space characters.</li>
2725
2726 <li><b><code>%u</code>: </b> represents all uppercase letters.</li>
2727
2728 <li><b><code>%w</code>: </b> represents all alphanumeric characters.</li>
2729
2730 <li><b><code>%x</code>: </b> represents all hexadecimal digits.</li>
2731
2732 <li><b><code>%<em>x</em></code>: </b> (where <em>x</em> is any non-alphanumeric character)
2733 represents the character <em>x</em>.
2734 This is the standard way to escape the magic characters.
2735 Any non-alphanumeric character
2736 (including all punctuations, even the non-magical)
2737 can be preceded by a '<code>%</code>'
2738 when used to represent itself in a pattern.
2739 </li>
2740
2741 <li><b><code>[<em>set</em>]</code>: </b>
2742 represents the class which is the union of all
2743 characters in <em>set</em>.
2744 A range of characters can be specified by
2745 separating the end characters of the range,
2746 in ascending order, with a '<code>-</code>'.
2747 All classes <code>%</code><em>x</em> described above can also be used as
2748 components in <em>set</em>.
2749 All other characters in <em>set</em> represent themselves.
2750 For example, <code>[%w_]</code> (or <code>[_%w]</code>)
2751 represents all alphanumeric characters plus the underscore,
2752 <code>[0-7]</code> represents the octal digits,
2753 and <code>[0-7%l%-]</code> represents the octal digits plus
2754 the lowercase letters plus the '<code>-</code>' character.
2755
2756
2757 <p>
2758 The interaction between ranges and classes is not defined.
2759 Therefore, patterns like <code>[%a-z]</code> or <code>[a-%%]</code>
2760 have no meaning.
2761 </li>
2762
2763 <li><b><code>[^<em>set</em>]</code>: </b>
2764 represents the complement of <em>set</em>,
2765 where <em>set</em> is interpreted as above.
2766 </li>
2767
2768 </ul><p>
2769 For all classes represented by single letters (<code>%a</code>, <code>%c</code>, etc.),
2770 the corresponding uppercase letter represents the complement of the class.
2771 For instance, <code>%S</code> represents all non-space characters.
2772
2773
2774 <p>
2775 The definitions of letter, space, and other character groups
2776 depend on the current locale.
2777 In particular, the class <code>[a-z]</code> may not be equivalent to <code>%l</code>.
2778
2779
2780
2781
2782
2783 <h4>Pattern Item:</h4><p>
2784 A <em>pattern item</em> can be
2785
2786 <ul>
2787
2788 <li>
2789 a single character class,
2790 which matches any single character in the class;
2791 </li>
2792
2793 <li>
2794 a single character class followed by '<code>*</code>',
2795 which matches zero or more repetitions of characters in the class.
2796 These repetition items will always match the longest possible sequence;
2797 </li>
2798
2799 <li>
2800 a single character class followed by '<code>+</code>',
2801 which matches one or more repetitions of characters in the class.
2802 These repetition items will always match the longest possible sequence;
2803 </li>
2804
2805 <li>
2806 a single character class followed by '<code>-</code>',
2807 which also matches zero or more repetitions of characters in the class.
2808 Unlike '<code>*</code>',
2809 these repetition items will always match the shortest possible sequence;
2810 </li>
2811
2812 <li>
2813 a single character class followed by '<code>?</code>',
2814 which matches zero or one occurrence of a character in the class.
2815 It always matches one occurrence if possible;
2816 </li>
2817
2818 <li>
2819 <code>%<em>n</em></code>, for <em>n</em> between 1 and 9;
2820 such item matches a substring equal to the <em>n</em>-th captured string
2821 (see below);
2822 </li>
2823
2824 <li>
2825 <code>%b<em>xy</em></code>, where <em>x</em> and <em>y</em> are two distinct characters;
2826 such item matches strings that start with&nbsp;<em>x</em>, end with&nbsp;<em>y</em>,
2827 and where the <em>x</em> and <em>y</em> are <em>balanced</em>.
2828 This means that, if one reads the string from left to right,
2829 counting <em>+1</em> for an <em>x</em> and <em>-1</em> for a <em>y</em>,
2830 the ending <em>y</em> is the first <em>y</em> where the count reaches 0.
2831 For instance, the item <code>%b()</code> matches expressions with
2832 balanced parentheses.
2833 </li>
2834
2835 <li>
2836 <code>%f[<em>set</em>]</code>, a <em>frontier pattern</em>;
2837 such item matches an empty string at any position such that
2838 the next character belongs to <em>set</em>
2839 and the previous character does not belong to <em>set</em>.
2840 The set <em>set</em> is interpreted as previously described.
2841 The beginning and the end of the subject are handled as if
2842 they were the character '<code>\0</code>'.
2843 </li>
2844
2845 </ul>
2846
2847
2848
2849
2850 <h4>Pattern:</h4><p>
2851 A <em>pattern</em> is a sequence of pattern items.
2852 A caret '<code>^</code>' at the beginning of a pattern anchors the match at the
2853 beginning of the subject string.
2854 A '<code>$</code>' at the end of a pattern anchors the match at the
2855 end of the subject string.
2856 At other positions,
2857 '<code>^</code>' and '<code>$</code>' have no special meaning and represent themselves.
2858
2859
2860
2861
2862
2863 <h4>Captures:</h4><p>
2864 A pattern can contain sub-patterns enclosed in parentheses;
2865 they describe <em>captures</em>.
2866 When a match succeeds, the substrings of the subject string
2867 that match captures are stored (<em>captured</em>) for future use.
2868 Captures are numbered according to their left parentheses.
2869 For instance, in the pattern <code>"(a*(.)%w(%s*))"</code>,
2870 the part of the string matching <code>"a*(.)%w(%s*)"</code> is
2871 stored as the first capture (and therefore has number&nbsp;1);
2872 the character matching "<code>.</code>" is captured with number&nbsp;2,
2873 and the part matching "<code>%s*</code>" has number&nbsp;3.
2874
2875
2876 <p>
2877 As a special case, the empty capture <code>()</code> captures
2878 the current string position (a number).
2879 For instance, if we apply the pattern <code>"()aa()"</code> on the
2880 string <code>"flaaap"</code>, there will be two captures: 3&nbsp;and&nbsp;5.
2881
2882
2883
2884
2885
2886
2887
2888 <h3>6.4.2 &ndash; <a name="6.4.2">Format Strings for Pack and Unpack</a></h3>
2889
2890 <p>
2891 The first argument to <a href="#pdf-string.pack"><code>string.pack</code></a>,
2892 <a href="#pdf-string.packsize"><code>string.packsize</code></a>, and <a href="#pdf-string.unpack"><code>string.unpack</code></a>
2893 is a format string,
2894 which describes the layout of the structure being created or read.
2895
2896
2897 <p>
2898 A format string is a sequence of conversion options.
2899 The conversion options are as follows:
2900
2901 <ul>
2902 <li><b><code>&lt;</code>: </b>sets little endian</li>
2903 <li><b><code>&gt;</code>: </b>sets big endian</li>
2904 <li><b><code>=</code>: </b>sets native endian</li>
2905 <li><b><code>![<em>n</em>]</code>: </b>sets maximum alignment to <code>n</code>
2906 (default is native alignment)</li>
2907 <li><b><code>b</code>: </b>a signed byte (<code>char</code>)</li>
2908 <li><b><code>B</code>: </b>an unsigned byte (<code>char</code>)</li>
2909 <li><b><code>h</code>: </b>a signed <code>short</code> (native size)</li>
2910 <li><b><code>H</code>: </b>an unsigned <code>short</code> (native size)</li>
2911 <li><b><code>l</code>: </b>a signed <code>long</code> (native size)</li>
2912 <li><b><code>L</code>: </b>an unsigned <code>long</code> (native size)</li>
2913 <li><b><code>j</code>: </b>a <code>lua_Integer</code></li>
2914 <li><b><code>J</code>: </b>a <code>lua_Unsigned</code></li>
2915 <li><b><code>T</code>: </b>a <code>size_t</code> (native size)</li>
2916 <li><b><code>i[<em>n</em>]</code>: </b>a signed <code>int</code> with <code>n</code> bytes
2917 (default is native size)</li>
2918 <li><b><code>I[<em>n</em>]</code>: </b>an unsigned <code>int</code> with <code>n</code> bytes
2919 (default is native size)</li>
2920 <li><b><code>f</code>: </b>a <code>float</code> (native size)</li>
2921 <li><b><code>d</code>: </b>a <code>double</code> (native size)</li>
2922 <li><b><code>n</code>: </b>a <code>lua_Number</code></li>
2923 <li><b><code>c<em>n</em></code>: </b>a fixed-sized string with <code>n</code> bytes</li>
2924 <li><b><code>z</code>: </b>a zero-terminated string</li>
2925 <li><b><code>s[<em>n</em>]</code>: </b>a string preceded by its length
2926 coded as an unsigned integer with <code>n</code> bytes
2927 (default is a <code>size_t</code>)</li>
2928 <li><b><code>x</code>: </b>one byte of padding</li>
2929 <li><b><code>X<em>op</em></code>: </b>an empty item that aligns
2930 according to option <code>op</code>
2931 (which is otherwise ignored)</li>
2932 <li><b>'<code> </code>': </b>(empty space) ignored</li>
2933 </ul><p>
2934 (A "<code>[<em>n</em>]</code>" means an optional integral numeral.)
2935 Except for padding, spaces, and configurations
2936 (options "<code>xX &lt;=&gt;!</code>"),
2937 each option corresponds to an argument (in <a href="#pdf-string.pack"><code>string.pack</code></a>)
2938 or a result (in <a href="#pdf-string.unpack"><code>string.unpack</code></a>).
2939
2940
2941 <p>
2942 For options "<code>!<em>n</em></code>", "<code>s<em>n</em></code>", "<code>i<em>n</em></code>", and "<code>I<em>n</em></code>",
2943 <code>n</code> can be any integer between 1 and 16.
2944 All integral options check overflows;
2945 <a href="#pdf-string.pack"><code>string.pack</code></a> checks whether the given value fits in the given size;
2946 <a href="#pdf-string.unpack"><code>string.unpack</code></a> checks whether the read value fits in a Lua integer.
2947
2948
2949 <p>
2950 Any format string starts as if prefixed by "<code>!1=</code>",
2951 that is,
2952 with maximum alignment of 1 (no alignment)
2953 and native endianness.
2954
2955
2956 <p>
2957 Alignment works as follows:
2958 For each option,
2959 the format gets extra padding until the data starts
2960 at an offset that is a multiple of the minimum between the
2961 option size and the maximum alignment;
2962 this minimum must be a power of 2.
2963 Options "<code>c</code>" and "<code>z</code>" are not aligned;
2964 option "<code>s</code>" follows the alignment of its starting integer.
2965
2966
2967 <p>
2968 All padding is filled with zeros by <a href="#pdf-string.pack"><code>string.pack</code></a>
2969 (and ignored by <a href="#pdf-string.unpack"><code>string.unpack</code></a>).
2970
2971
2972
2973
2974
2975
2976
2977 <h2>6.5 &ndash; <a name="6.5">UTF-8 Support</a></h2>
2978
2979 <p>
2980 This library provides basic support for UTF-8 encoding.
2981 It provides all its functions inside the table <a name="pdf-utf8"><code>utf8</code></a>.
2982 This library does not provide any support for Unicode other
2983 than the handling of the encoding.
2984 Any operation that needs the meaning of a character,
2985 such as character classification, is outside its scope.
2986
2987
2988 <p>
2989 Unless stated otherwise,
2990 all functions that expect a byte position as a parameter
2991 assume that the given position is either the start of a byte sequence
2992 or one plus the length of the subject string.
2993 As in the string library,
2994 negative indices count from the end of the string.
2995
2996
2997 <p>
2998 <hr><h3><a name="pdf-utf8.char"><code>utf8.char (&middot;&middot;&middot;)</code></a></h3>
2999 Receives zero or more integers,
3000 converts each one to its corresponding UTF-8 byte sequence
3001 and returns a string with the concatenation of all these sequences.
3002
3003
3004
3005
3006 <p>
3007 <hr><h3><a name="pdf-utf8.charpattern"><code>utf8.charpattern</code></a></h3>
3008 The pattern (a string, not a function) "<code>[\0-\x7F\xC2-\xF4][\x80-\xBF]*</code>"
3009 (see <a href="#6.4.1">&sect;6.4.1</a>),
3010 which matches exactly one UTF-8 byte sequence,
3011 assuming that the subject is a valid UTF-8 string.
3012
3013
3014
3015
3016 <p>
3017 <hr><h3><a name="pdf-utf8.codes"><code>utf8.codes (s)</code></a></h3>
3018
3019
3020 <p>
3021 Returns values so that the construction
3022
3023 <pre>
3024 for p, c in utf8.codes(s) do <em>body</em> end
3025 </pre><p>
3026 will iterate over all characters in string <code>s</code>,
3027 with <code>p</code> being the position (in bytes) and <code>c</code> the code point
3028 of each character.
3029 It raises an error if it meets any invalid byte sequence.
3030
3031
3032
3033
3034 <p>
3035 <hr><h3><a name="pdf-utf8.codepoint"><code>utf8.codepoint (s [, i [, j]])</code></a></h3>
3036 Returns the codepoints (as integers) from all characters in <code>s</code>
3037 that start between byte position <code>i</code> and <code>j</code> (both included).
3038 The default for <code>i</code> is 1 and for <code>j</code> is <code>i</code>.
3039 It raises an error if it meets any invalid byte sequence.
3040
3041
3042
3043
3044 <p>
3045 <hr><h3><a name="pdf-utf8.len"><code>utf8.len (s [, i [, j]])</code></a></h3>
3046 Returns the number of UTF-8 characters in string <code>s</code>
3047 that start between positions <code>i</code> and <code>j</code> (both inclusive).
3048 The default for <code>i</code> is 1 and for <code>j</code> is -1.
3049 If it finds any invalid byte sequence,
3050 returns a false value plus the position of the first invalid byte.
3051
3052
3053
3054
3055 <p>
3056 <hr><h3><a name="pdf-utf8.offset"><code>utf8.offset (s, n [, i])</code></a></h3>
3057 Returns the position (in bytes) where the encoding of the
3058 <code>n</code>-th character of <code>s</code>
3059 (counting from position <code>i</code>) starts.
3060 A negative <code>n</code> gets characters before position <code>i</code>.
3061 The default for <code>i</code> is 1 when <code>n</code> is non-negative
3062 and <code>#s + 1</code> otherwise,
3063 so that <code>utf8.offset(s, -n)</code> gets the offset of the
3064 <code>n</code>-th character from the end of the string.
3065 If the specified character is neither in the subject
3066 nor right after its end,
3067 the function returns <b>nil</b>.
3068
3069
3070 <p>
3071 As a special case,
3072 when <code>n</code> is 0 the function returns the start of the encoding
3073 of the character that contains the <code>i</code>-th byte of <code>s</code>.
3074
3075
3076 <p>
3077 This function assumes that <code>s</code> is a valid UTF-8 string.
3078 2654
3079 2655
3080 2656
3081 2657
3082 2658