comparison website/src/manual.html.luan @ 555:e25ba7a2e816

some String documentation and fixes
author Franklin Schmidt <fschmidt@gmail.com>
date Fri, 19 Jun 2015 04:29:06 -0600
parents b1256e2d19a3
children d02f43598ba3
554:18504c41b0be 555:e25ba7a2e816
89 <a href="#libs">Standard Libraries</a> 89 <a href="#libs">Standard Libraries</a>
90 <ul> 90 <ul>
91 <li><a href="#default_lib">Default Environment</a></li> 91 <li><a href="#default_lib">Default Environment</a></li>
92 <li><a href="#luan_lib">Basic Functions</a></li> 92 <li><a href="#luan_lib">Basic Functions</a></li>
93 <li><a href="#package_lib">Modules</a></li> 93 <li><a href="#package_lib">Modules</a></li>
94 <li><a href="#string_lib">String Manipulation</a></li>
94 </ul> 95 </ul>
95 </div> 96 </div>
96 97
97 <hr/> 98 <hr/>
98 99
2344 2345
2345 2346
2346 2347
2347 2348
2348 2349
2349 <h2>6.4 &ndash; <a name="6.4">String Manipulation</a></h2> 2350 <h3 <%=heading_options%> ><a name="string_lib">String Manipulation</a></h3>
2351
2352 <p>
2353 Include this library by:
2354
2355 <p><tt><pre>
2356 local String = require "luan:String"
2357 </pre></tt></p>
2350 2358
2351 <p> 2359 <p>
2352 This library provides generic functions for string manipulation, 2360 This library provides generic functions for string manipulation,
2353 such as finding and extracting substrings, and pattern matching. 2361 such as finding and extracting substrings, and pattern matching.
2354 When indexing a string in Lua, the first character is at position&nbsp;1 2362 When indexing a string in Luan, the first character is at position&nbsp;1
2355 (not at&nbsp;0, as in C). 2363 (not at&nbsp;0, as in Java).
2356 Indices are allowed to be negative and are interpreted as indexing backwards, 2364 Indices are allowed to be negative and are interpreted as indexing backwards,
2357 from the end of the string. 2365 from the end of the string.
2358 Thus, the last character is at position -1, and so on. 2366 Thus, the last character is at position -1, and so on.
2359 2367
2360 2368
2361 <p>
2362 The string library provides all its functions inside the table
2363 <a name="pdf-string"><code>string</code></a>.
2364 It also sets a metatable for strings
2365 where the <code>__index</code> field points to the <code>string</code> table.
2366 Therefore, you can use the string functions in object-oriented style.
2367 For instance, <code>string.byte(s,i)</code>
2368 can be written as <code>s:byte(i)</code>.
2369
2370
2371 <p>
2372 The string library assumes one-byte character encodings.
2373 2369
2374 2370
2375 <p> 2371 <p>
2376 <hr><h3><a name="pdf-string.byte"><code>string.byte (s [, i [, j]])</code></a></h3> 2372 <hr><h3><a name="pdf-string.byte"><code>string.byte (s [, i [, j]])</code></a></h3>
2377 Returns the internal numerical codes of the characters <code>s[i]</code>, 2373 Returns the internal numerical codes of the characters <code>s[i]</code>,
2385 <p> 2381 <p>
2386 Numerical codes are not necessarily portable across platforms. 2382 Numerical codes are not necessarily portable across platforms.
2387 2383
2388 2384
2389 2385
2390 2386 <h4 <%=heading_options%> ><a name="String.char"><tt>String.char (&middot;&middot;&middot;)</tt></a></h4>
2391 <p> 2387
2392 <hr><h3><a name="pdf-string.char"><code>string.char (&middot;&middot;&middot;)</code></a></h3> 2388 <p>
2393 Receives zero or more integers. 2389 Receives zero or more integers.
2394 Returns a string with length equal to the number of arguments, 2390 Returns a string with length equal to the number of arguments,
2395 in which each character has the internal numerical code equal 2391 in which each character has the internal numerical code equal
2396 to its corresponding argument. 2392 to its corresponding argument.
2397 2393
2398 2394
2399 <p> 2395 <h4 <%=heading_options%> ><a name="String.concat"><tt>String.concat (&middot;&middot;&middot;)</tt></a></h4>
2400 Numerical codes are not necessarily portable across platforms. 2396
2401 2397 <p>
2402 2398 Concatenates the <a href="#Luan.to_string"><tt>to_string</tt></a> value of all arguments.
2403 2399
2404 2400
2405 <p> 2401
2406 <hr><h3><a name="pdf-string.dump"><code>string.dump (function [, strip])</code></a></h3> 2402 <h4 <%=heading_options%> ><a name="String.encode"><tt>String.encode (s)</tt></a></h4>
2407 2403
2408 2404 <p>
2409 <p> 2405 Encodes argument <tt>s</tt> into a string that can be placed in quotes so as to return the original value of the string.
2410 Returns a string containing a binary representation 2406
2411 (a <em>binary chunk</em>) 2407
2412 of the given function, 2408
2413 so that a later <a href="#pdf-load"><code>load</code></a> on this string returns 2409
2414 a copy of the function (but with new upvalues). 2410 <h4 <%=heading_options%> ><a name="String.find"><tt>String.find (s, pattern [, init [, plain]])</tt></a></h4>
2415 If <code>strip</code> is a true value,
2416 the binary representation is created without debug information
2417 about the function
2418 (local variable names, lines, etc.).
2419
2420
2421 <p>
2422 Functions with upvalues have only their number of upvalues saved.
2423 When (re)loaded,
2424 those upvalues receive fresh instances containing <b>nil</b>.
2425 (You can use the debug library to serialize
2426 and reload the upvalues of a function
2427 in a way adequate to your needs.)
2428
2429
2430
2431
2432 <p>
2433 <hr><h3><a name="pdf-string.find"><code>string.find (s, pattern [, init [, plain]])</code></a></h3>
2434
2435 2411
2436 <p> 2412 <p>
2437 Looks for the first match of 2413 Looks for the first match of
2438 <code>pattern</code> (see <a href="#6.4.1">&sect;6.4.1</a>) in the string <code>s</code>. 2414 <tt>pattern</tt> (see <a href="#6.4.1">&sect;6.4.1</a>) in the string <tt>s</tt>.
2439 If it finds a match, then <code>find</code> returns the indices of&nbsp;<code>s</code> 2415 If it finds a match, then <tt>find</tt> returns the indices of&nbsp;<tt>s</tt>
2440 where this occurrence starts and ends; 2416 where this occurrence starts and ends;
2441 otherwise, it returns <b>nil</b>. 2417 otherwise, it returns <b>nil</b>.
2442 A third, optional numerical argument <code>init</code> specifies 2418 A third, optional numerical argument <tt>init</tt> specifies
2443 where to start the search; 2419 where to start the search;
2444 its default value is&nbsp;1 and can be negative. 2420 its default value is&nbsp;1 and can be negative.
2445 A value of <b>true</b> as a fourth, optional argument <code>plain</code> 2421 A value of <b>true</b> as a fourth, optional argument <tt>plain</tt>
2446 turns off the pattern matching facilities, 2422 turns off the pattern matching facilities,
2447 so the function does a plain "find substring" operation, 2423 so the function does a plain "find substring" operation,
2448 with no characters in <code>pattern</code> being considered magic. 2424 with no characters in <tt>pattern</tt> being considered magic.
2449 Note that if <code>plain</code> is given, then <code>init</code> must be given as well. 2425 Note that if <tt>plain</tt> is given, then <tt>init</tt> must be given as well.
2450
2451 2426
2452 <p> 2427 <p>
2453 If the pattern has captures, 2428 If the pattern has captures,
2454 then in a successful match 2429 then in a successful match
2455 the captured values are also returned, 2430 the captured values are also returned,
2456 after the two indices. 2431 after the two indices.
2457 2432
2458 2433
2459 2434
2460 2435
2461 <p> 2436 <h4 <%=heading_options%> ><a name="String.format"><tt>String.format (formatstring, &middot;&middot;&middot;)</tt></a></h4>
2462 <hr><h3><a name="pdf-string.format"><code>string.format (formatstring, &middot;&middot;&middot;)</code></a></h3>
2463 2437
2464 2438
2465 <p> 2439 <p>
2466 Returns a formatted version of its variable number of arguments 2440 Returns a formatted version of its variable number of arguments
2467 following the description given in its first argument (which must be a string). 2441 following the description given in its first argument (which must be a string).
2468 The format string follows the same rules as the ISO&nbsp;C function <code>sprintf</code>. 2442 The format string follows the same rules as the Java function <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#format(java.lang.String,%20java.lang.Object...)"><tt>String.format</tt></a> because Luan calls this internally.
2469 The only differences are that the options/modifiers 2443
2470 <code>*</code>, <code>h</code>, <code>L</code>, <code>l</code>, <code>n</code>, 2444 <p>
2471 and <code>p</code> are not supported 2445 Note that Java's <tt>String.format</tt> is too stupid to convert between ints and floats, so you must provide the right kind of number.
2472 and that there is an extra option, <code>q</code>. 2446
2473 The <code>q</code> option formats a string between double quotes, 2447
2474 using escape sequences when necessary to ensure that 2448
2475 it can safely be read back by the Lua interpreter. 2449 <h4 <%=heading_options%> ><a name="String.gmatch"><tt>String.gmatch (s, pattern)</tt></a></h4>
2476 For instance, the call 2450
2477 2451 <p>
2478 <pre>
2479 string.format('%q', 'a string with "quotes" and \n new line')
2480 </pre><p>
2481 may produce the string:
2482
2483 <pre>
2484 "a string with \"quotes\" and \
2485 new line"
2486 </pre>
2487
2488 <p>
2489 Options
2490 <code>A</code> and <code>a</code> (when available),
2491 <code>E</code>, <code>e</code>, <code>f</code>,
2492 <code>G</code>, and <code>g</code> all expect a number as argument.
2493 Options <code>c</code>, <code>d</code>,
2494 <code>i</code>, <code>o</code>, <code>u</code>, <code>X</code>, and <code>x</code>
2495 expect an integer.
2496 Option <code>q</code> expects a string;
2497 option <code>s</code> expects a string without embedded zeros.
2498 If the argument to option <code>s</code> is not a string,
2499 it is converted to one following the same rules of <a href="#pdf-tostring"><code>tostring</code></a>.
2500
2501
2502
2503
2504 <p>
2505 <hr><h3><a name="pdf-string.gmatch"><code>string.gmatch (s, pattern)</code></a></h3>
2506 Returns an iterator function that, 2452 Returns an iterator function that,
2507 each time it is called, 2453 each time it is called,
2508 returns the next captures from <code>pattern</code> (see <a href="#6.4.1">&sect;6.4.1</a>) 2454 returns the next captures from <tt>pattern</tt> (see <a href="#6.4.1">&sect;6.4.1</a>)
2509 over the string <code>s</code>. 2455 over the string <tt>s</tt>.
2510 If <code>pattern</code> specifies no captures, 2456 If <tt>pattern</tt> specifies no captures,
2511 then the whole match is produced in each call. 2457 then the whole match is produced in each call.
2512 2458
2513 2459
2514 <p> 2460 <p>
2515 As an example, the following loop 2461 As an example, the following loop
2516 will iterate over all the words from string <code>s</code>, 2462 will iterate over all the words from string <tt>s</tt>,
2517 printing one per line: 2463 printing one per line:
2518 2464
2519 <pre> 2465 <p><tt><pre>
2520 s = "hello world from Lua" 2466 local s = "hello world from Lua"
2521 for w in string.gmatch(s, "%a+") do 2467 for w in String.gmatch(s, [[\w+]]) do
2522 print(w) 2468 print(w)
2523 end 2469 end
2524 </pre><p> 2470 </pre></tt></p>
2525 The next example collects all pairs <code>key=value</code> from the 2471
2472 <p>
2473 The next example collects all pairs <tt>key=value</tt> from the
2526 given string into a table: 2474 given string into a table:
2527 2475
2528 <pre> 2476 <p><tt><pre>
2529 t = {} 2477 local t = {}
2530 s = "from=world, to=Lua" 2478 local s = "from=world, to=Lua"
2531 for k, v in string.gmatch(s, "(%w+)=(%w+)") do 2479 for k, v in String.gmatch(s, [[(\w+)=(\w+)]]) do
2532 t[k] = v 2480 t[k] = v
2533 end 2481 end
2534 </pre> 2482 </pre></tt></p>
2535 2483
2536 <p> 2484 <p>
2537 For this function, a caret '<code>^</code>' at the start of a pattern does not 2485 For this function, a caret '<tt>^</tt>' at the start of a pattern does not
2538 work as an anchor, as this would prevent the iteration. 2486 work as an anchor, as this would prevent the iteration.
2539 2487
2540 2488
2541 2489
2542 2490 <h4 <%=heading_options%> ><a name="String.gsub"><tt>String.gsub (s, pattern, repl [, n])</tt></a></h4>
2543 <p> 2491
2544 <hr><h3><a name="pdf-string.gsub"><code>string.gsub (s, pattern, repl [, n])</code></a></h3> 2492 <p>
2545 Returns a copy of <code>s</code> 2493 Returns a copy of <tt>s</tt>
2546 in which all (or the first <code>n</code>, if given) 2494 in which all (or the first <tt>n</tt>, if given)
2547 occurrences of the <code>pattern</code> (see <a href="#6.4.1">&sect;6.4.1</a>) have been 2495 occurrences of the <tt>pattern</tt> (see <a href="#6.4.1">&sect;6.4.1</a>) have been
2548 replaced by a replacement string specified by <code>repl</code>, 2496 replaced by a replacement string specified by <tt>repl</tt>,
2549 which can be a string, a table, or a function. 2497 which can be a string, a table, or a function.
2550 <code>gsub</code> also returns, as its second value, 2498 <tt>gsub</tt> also returns, as its second value,
2551 the total number of matches that occurred. 2499 the total number of matches that occurred.
2552 The name <code>gsub</code> comes from <em>Global SUBstitution</em>. 2500 The name <tt>gsub</tt> comes from <i>Global SUBstitution</i>.
2553 2501
2554 2502
2555 <p> 2503 <p>
2556 If <code>repl</code> is a string, then its value is used for replacement. 2504 If <tt>repl</tt> is a string, then its value is used for replacement.
2557 The character&nbsp;<code>%</code> works as an escape character: 2505 The character&nbsp;<tt>\</tt> works as an escape character.
2558 any sequence in <code>repl</code> of the form <code>%<em>d</em></code>, 2506 Any sequence in <tt>repl</tt> of the form <tt>$<i>d</i></tt>,
2559 with <em>d</em> between 1 and 9, 2507 with <i>d</i> between 1 and 9,
2560 stands for the value of the <em>d</em>-th captured substring. 2508 stands for the value of the <i>d</i>-th captured substring.
2561 The sequence <code>%0</code> stands for the whole match. 2509 The sequence <tt>$0</tt> stands for the whole match.
2562 The sequence <code>%%</code> stands for a single&nbsp;<code>%</code>. 2510
2563 2511
2564 2512 <p>
2565 <p> 2513 If <tt>repl</tt> is a table, then the table is queried for every match,
2566 If <code>repl</code> is a table, then the table is queried for every match,
2567 using the first capture as the key. 2514 using the first capture as the key.
2568 2515
2569 2516
2570 <p> 2517 <p>
2571 If <code>repl</code> is a function, then this function is called every time a 2518 If <tt>repl</tt> is a function, then this function is called every time a
2572 match occurs, with all captured substrings passed as arguments, 2519 match occurs, with all captured substrings passed as arguments,
2573 in order. 2520 in order.
2574 2521
2575 2522
2576 <p> 2523 <p>
2579 then it behaves as if the whole pattern was inside a capture. 2526 then it behaves as if the whole pattern was inside a capture.
2580 2527
2581 2528
2582 <p> 2529 <p>
2583 If the value returned by the table query or by the function call 2530 If the value returned by the table query or by the function call
2584 is a string or a number, 2531 is not <b>nil</b>,
2585 then it is used as the replacement string; 2532 then it is used as the replacement string;
2586 otherwise, if it is <b>false</b> or <b>nil</b>, 2533 otherwise, if it is <b>nil</b>,
2587 then there is no replacement 2534 then there is no replacement
2588 (that is, the original match is kept in the string). 2535 (that is, the original match is kept in the string).
2589 2536
2590 2537
2591 <p> 2538 <p>
2592 Here are some examples: 2539 Here are some examples:
2593 2540
2594 <pre> 2541 <p><tt><pre>
2595 x = string.gsub("hello world", "(%w+)", "%1 %1") 2542 x = String.gsub("hello world", [[(\w+)]], "$1 $1")
2596 --&gt; x="hello hello world world" 2543 --&gt; x="hello hello world world"
2597 2544
2598 x = string.gsub("hello world", "%w+", "%0 %0", 1) 2545 x = String.gsub("hello world", [[\w+]], "$0 $0", 1)
2599 --&gt; x="hello hello world" 2546 --&gt; x="hello hello world"
2600 2547
2601 x = string.gsub("hello world from Lua", "(%w+)%s*(%w+)", "%2 %1") 2548 x = String.gsub("hello world from Luan", [[(\w+)\s*(\w+)]], "$2 $1")
2602 --&gt; x="world hello Lua from" 2549 --&gt; x="world hello Luan from"
2603 2550
2604 x = string.gsub("home = $HOME, user = $USER", "%$(%w+)", os.getenv) 2551 x = String.gsub("4+5 = $return 4+5$", [[\$(.*?)\$]], function (s)
2605 --&gt; x="home = /home/roberto, user = roberto"
2606
2607 x = string.gsub("4+5 = $return 4+5$", "%$(.-)%$", function (s)
2608 return load(s)() 2552 return load(s)()
2609 end) 2553 end)
2610 --&gt; x="4+5 = 9" 2554 --&gt; x="4+5 = 9"
2611 2555
2612 local t = {name="lua", version="5.3"} 2556 local t = {name="lua", version="5.3"}
2613 x = string.gsub("$name-$version.tar.gz", "%$(%w+)", t) 2557 x = String.gsub("$name-$version.tar.gz", [[\$(\w+)]], t)
2614 --&gt; x="lua-5.3.tar.gz" 2558 --&gt; x="lua-5.3.tar.gz"
2615 </pre> 2559 </pre></tt></p>
2616 2560
2617 2561
2618 2562
2619 <p> 2563 <h4 <%=heading_options%> ><a name="String.lower"><tt>String.lower (s)</tt></a></h4>
2620 <hr><h3><a name="pdf-string.len"><code>string.len (s)</code></a></h3> 2564 <p>
2621 Receives a string and returns its length.
2622 The empty string <code>""</code> has length 0.
2623 Embedded zeros are counted,
2624 so <code>"a\000bc\000"</code> has length 5.
2625
2626
2627
2628
2629 <p>
2630 <hr><h3><a name="pdf-string.lower"><code>string.lower (s)</code></a></h3>
2631 Receives a string and returns a copy of this string with all 2565 Receives a string and returns a copy of this string with all
2632 uppercase letters changed to lowercase. 2566 uppercase letters changed to lowercase.
2633 All other characters are left unchanged. 2567 All other characters are left unchanged.
2634 The definition of what an uppercase letter is depends on the current locale.
2635 2568
2636 2569
2637 2570
2638 2571
2639 <p> 2572 <p>