LuaCheia Reference Manual

rex

Synopsis

cheia.load "rex"

start, finish, captures = rex.find("text", "regular expression")
matches = rex.gsub("text", "regular expression",
          function(match, captures) ... end)

regex = rex.comp("regular expression")
start, finish, captures = regex:match("text")
matches = regex:gmatch("text",
          function(match, captures) ... end)

Description

This module provides facilities for pattern matching using POSIX extended regular expressions (‘regexes’). Regular expressions are templates which match certain classes of string. In this respect, they are much like shell ‘globs’ (i.e. pathnames containing the wildcard characters ‘*’ and ‘?’), but much more powerful. I will not attempt to describe them completely here, but the main facilities are summarised below:

  • If no special characters are used, a regular expression is simply a string which may be searched for inside another, e.g. the regex “is” matches twice in the string “this is a test”: first on the last two characters of ‘this’, and then again on the word ‘is’.
  • The ‘.’ character matches any character except a newline, thus “t.” also matches twice: the first two letters of ‘this’, and the first two letters of ‘test’. However, the final ‘t’ is not matched, because there is no following character for the ‘.’ to match.
  • The ‘*’ character causes the preceding atom (for the moment an atom is either a single letter or a dot) to be matched by zero or more repetitions, so “s.*t” matches once: “s is a test”. Notice that there are a number of shorter substrings that this expression could also match. These are not matched because regexes are ‘greedy’: they take the first match they find and try to make it as long as possible.
  • The ‘+’ character is like ‘*’, but causes the preceding atom to match one or more times. The ‘?’ character causes it to match zero or one times.
  • Two expressions can be made alternatives using ‘|’, e.g. “this|test” matches ‘this’ and ‘test’.
  • A set of characters can be specified as alternatives using square brackets. Such a grouping is treated as a single atom, e.g. “[his]+” means ‘one or more occurrences of h, i or s in a row’, and matches 3 times: the last three letters of ‘this’, the word ‘is’, and the ‘s’ from ‘test’. If the first character of a bracketed group is a ‘^’, then any character not listed in the group will match. Character ranges can also be specified, e.g. “[0-9]” matches any digit.
  • Parts of a regex can become ‘captures’ by placing them in parentheses. A parenthesised group is also treated as a single atom. Captured substrings are returned separately by the functions below.
  • Any special character can be escaped by preceding it with a backslash. Be aware that the backslash itself will need escaping within quoted strings. For this reason the [[...]] string notation may be preferable.
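
The matches described in the list above can be checked with a short script. This is a minimal sketch, assuming a LuaCheia interpreter in which cheia.load "rex" succeeds. The name all_matches is a hypothetical helper, not part of the module; it simply collects the substrings that rex.gsub passes to its callback. The trailing comments repeat the results stated in the list above.

```lua
cheia.load "rex"

-- Hypothetical helper (not part of rex): collect every substring that
-- rex.gsub passes to its callback, in the order the matches are found.
local function all_matches(text, pattern)
  local found = {}
  rex.gsub(text, pattern, function(match, captures)
    table.insert(found, match)
  end)
  return found
end

local text = "this is a test"

-- Join each result with ", " so the individual matches are easy to see.
print(table.concat(all_matches(text, "is"), ", "))         -- is, is
print(table.concat(all_matches(text, "t."), ", "))         -- th, te
print(table.concat(all_matches(text, "s.*t"), ", "))       -- s is a test
print(table.concat(all_matches(text, "this|test"), ", "))  -- this, test
print(table.concat(all_matches(text, "[his]+"), ", "))     -- his, is, s
```

The callback deliberately returns nothing; it only records each match.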

Examples

Compile a regex to match identifiers, then print each identifier found in a string:

ids = rex.comp("[A-Za-z_][A-Za-z0-9_]*")
ids:gmatch("_X6 = 5 * sin(theta)",
     function(s,t) print(s) end)
» _X6
» sin
» theta

Reference

rex.comp(regex)

Compiles the supplied regular expression. Once it has been compiled, it may be used repeatedly without incurring the overhead of recompiling.

Parameters:
     regex : string  

The regular expression string.

Returns:
     userdata  

The compiled form.
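
As an illustration of reusing a compiled expression, the sketch below compiles a pattern once and applies it to several strings in a loop. It assumes a LuaCheia interpreter with the rex module available, and it assumes that regex:match returns nil when nothing matches, as rex.find is documented to do. The variable names and sample strings are made up for the example.

```lua
cheia.load "rex"

-- Compile the expression once, before the loop.
local number = rex.comp("[0-9]+")

local lines = { "width 640", "height 480", "no digits here" }
local results = {}

for i, line in ipairs(lines) do
  -- Assumption: like rex.find, regex:match returns nil on no match.
  local start, finish = number:match(line)
  if start then
    -- The returned indices are 1-based, so they suit string.sub directly.
    table.insert(results, string.sub(line, start, finish))
  else
    table.insert(results, "(none)")
  end
end

print(table.concat(results, ", "))
```

Calling rex.find inside the loop would work as well, but would pass the pattern string on every iteration instead of reusing the compiled form.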

rex.find(text, regex)

Searches for regex within text, and returns the start and end of the first matching substring followed by a table of captures. Returns nils if no match is found. Largely equivalent to the string.find function included with Lua, but uses the more standard POSIX notation. If the same regular expression is used repeatedly, compile it in advance with rex.comp and use regex:match instead.

Note: string.find returns captures as additional results. rex.find returns a table of captures.

Parameters:
     text : string  

The text to search.

     regex : string  

The regular expression to find.

Returns:
     number  

The 1-based index of the first character of the first matching substring.

     number  

The 1-based index of the last character of the first matching substring.

     table  

An array of captured strings.
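
The difference in how captures are returned can be seen by running both functions over the same text. This is a sketch only, assuming a LuaCheia interpreter with rex loaded and assuming the capture table is a 1-based array; the sample text and variable names are illustrative. Note that the two calls use different pattern notations: Lua's own for string.find, POSIX for rex.find.

```lua
cheia.load "rex"

local text = "width=640"

-- Lua's built-in string.find returns each capture as an extra result.
local s1, f1, key1, value1 = string.find(text, "(%a+)=(%d+)")

-- rex.find returns all captures together in a single table.
local s2, f2, caps = rex.find(text, "([a-z]+)=([0-9]+)")

print(s1, f1, key1, value1)
if s2 then
  -- Assumption: captures are stored at indices 1, 2, ... of the table.
  print(s2, f2, caps[1], caps[2])
end
```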

rex.gsub(text, regex, fn, [n])

Searches for regex within text, and for each matching substring calls fn, passing the whole matching substring as the first argument, and a table (array) of captures as the second. Provides some of the functionality of the string.gsub function included with Lua, but uses the more standard POSIX notation. If the same regular expression is used repeatedly, compile it in advance with rex.comp and use regex:gmatch instead.

Note: Despite the name, this function does not do substitution. It does not accept a string as the third argument, and it does not return a modified string. The supplied function is also called with different parameters to string.gsub.

Parameters:
     text : string  

The text to search.

     regex : string  

The regular expression to find.

     fn : function  

The function to call when a matching substring is found.

     n : number  

The maximum number of matches to process (optional).

Returns:
     number  

The number of matches found.
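
Because rex.gsub only reports matches, any result has to be built up inside the callback. The following sketch, which assumes a LuaCheia interpreter with rex loaded and a 1-based capture array, fills a table from key=value pairs. The settings table and the sample text are invented for the example.

```lua
cheia.load "rex"

local settings = {}

local count = rex.gsub("width=640 height=480 depth=32",
  "([a-z]+)=([0-9]+)",
  function(match, captures)
    -- Assumption: captures[1] is the key and captures[2] the value.
    settings[captures[1]] = tonumber(captures[2])
  end)

print(count, settings.width, settings.height, settings.depth)
```

The return value is the number of matches, not a modified string, so the text passed in is left unchanged.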

Compiled Regular Expressions

The following methods are available for the compiled regular expressions returned by rex.comp:

regex:match(text)

Searches for regex within text, and returns the start and end of the first matching substring followed by a table of captures.

Parameters:
     text : string  

The text to search.

Returns:
     number  

The 1-based index of the first character of the first matching substring.

     number  

The 1-based index of the last character of the first matching substring.

     table  

An array of captured strings.
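
Since the returned positions are 1-based and inclusive, they can be passed straight to string.sub. The sketch below uses them to split a string around the first separator. It assumes a LuaCheia interpreter with rex loaded, and that regex:match returns nil when there is no match; the names sep, before and after are illustrative.

```lua
cheia.load "rex"

-- A comma with optional spaces on either side.
local sep = rex.comp(" *, *")

local text = "alpha , beta"
local before, after

local start, finish = sep:match(text)
if start then
  before = string.sub(text, 1, start - 1)  -- everything left of the match
  after = string.sub(text, finish + 1)     -- everything right of the match
end

print(before, after)
```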

regex:gmatch(text, fn, [n])

Searches for regex within text, and for each matching substring calls fn, passing the whole matching substring as the first argument, and a table (array) of captures as the second.

Parameters:
     text : string  

The text to search.

     fn : function  

The function to call when a matching substring is found.

     n : number  

The maximum number of matches to process (optional).

Returns:
     number  

The number of matches found.
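
The optional n argument can be sketched as follows. This assumes a LuaCheia interpreter with rex loaded, and it assumes that n caps the number of times fn is called; the manual does not spell out the exact behaviour, so treat the example as a starting point for experiment. The variable names are illustrative.

```lua
cheia.load "rex"

local word = rex.comp("[a-z]+")
local first_two = {}

-- Pass 2 as the optional third argument to ask for at most two matches.
local count = word:gmatch("alpha beta gamma delta",
  function(match, captures)
    table.insert(first_two, match)
  end, 2)

print(count, table.concat(first_two, ", "))
```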

Issues

The differences between the rex and string functions of the same name could be confusing.

See also

The UNIX manual page regex(7).

Revision history

Added in LuaCheia 5.0.

Credits

Based on the regex library by Henry Spencer.

Lua module by Reuben Thomas.

Modifications for LuaCheia by Thatcher Ulrich.

Documentation for LuaCheia by Jamie Webb.
