Grammar
Internally, aCLImatise uses a Parsing Expression Grammar, which is a class of recursive grammar used to parse programming languages. This grammar is expressed and parsed using the PyParsing Python library. To help visualise the grammar used to parse command-line help, here is a Railroad Diagram generated using PyParsing.
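As a rough illustration of how PyParsing can both parse text and render a railroad diagram, here is a minimal sketch. The `greeting` grammar and the `write_diagram` helper are hypothetical and unrelated to aCLImatise's real grammar. The sketch assumes PyParsing 3.x, where `parse_string` and `create_diagram` are available.

```python
import pyparsing as pp

# Toy grammar (not aCLImatise's): a word, a comma, a word, an exclamation mark.
greeting = (pp.Word(pp.alphas) + "," + pp.Word(pp.alphas) + "!").set_name("Greeting")

# Parsing returns the matched tokens in order.
tokens = greeting.parse_string("Hello, World!").as_list()
print(tokens)  # ['Hello', ',', 'World', '!']


def write_diagram(path: str = "greeting.html") -> None:
    """Render the toy grammar as an HTML railroad diagram.

    Hypothetical helper. It needs PyParsing's optional diagram
    dependencies: pip install "pyparsing[diagrams]"
    """
    greeting.create_diagram(path)
```

Calling `write_diagram()` should write an HTML file containing one diagram per named expression. It is kept in a separate function so that the parsing part runs without the optional dependencies.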
The “terminal” nodes (circular) are either:
- In quotes, e.g. ':', which indicates a literal string
- In the form W:(start, body), e.g. W:(0-9@-Za-z, \--9@-Z\\_a-z|), which indicates a word where the first character comes from the start list of characters, and the remaining characters come from the body characters
- In the form Re: pattern, which indicates a regular expression pattern used to match this terminal
- Whitespace nodes, e.g. <SP><TAB><CR><LF>, which list the types of whitespace being parsed by that terminal
- Certain other special nodes like Empty and LineStart, which match based on custom code. Where possible, these are annotated with what they are designed to match, for example UnIndent matches an unindent in the input file.
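The terminal node types listed above correspond roughly to PyParsing element classes. The sketch below shows one plausible mapping, assuming PyParsing 3.x. The variable names, character sets and patterns are illustrative and are not the ones aCLImatise actually uses.

```python
import pyparsing as pp

# A quoted node such as ':' corresponds to a literal string.
colon = pp.Literal(":")

# A W:(start, body) node corresponds to a Word: the first character comes
# from the first set, every later character from the second set.
flag_name = pp.Word(pp.alphanums, pp.alphanums + "-_")

# A "Re: pattern" node corresponds to a regular expression.
long_flag = pp.Regex(r"--[a-z][a-z-]*")

# A whitespace node lists the whitespace characters it accepts
# (here only space and tab).
spaces = pp.White(" \t")

# Empty always matches without consuming input; LineStart only matches
# at the beginning of a line.
nothing = pp.Empty()
line_start = pp.LineStart()

print(colon.parse_string(":").as_list())                 # [':']
print(flag_name.parse_string("out-dir rest").as_list())  # ['out-dir']
print(long_flag.parse_string("--verbose").as_list())     # ['--verbose']
print(nothing.parse_string("abc").as_list())             # []
```

Note that `flag_name` stops at the first character outside its body set, which is why only `out-dir` is returned from `"out-dir rest"`.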
The “non-terminal” nodes (square) refer to subsections of the diagram, which are spelled-out under the subheading with the same name.
To read the diagram, start with FlagList, the start node, and from there follow the lines along any branch of the path that goes forward (although some paths end up turning backwards to indicate loops). Any string that matches the sequence of tokens you encounter along that path will be parsed by the grammar.
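To make the idea of following a path concrete, here is a heavily simplified sketch of a flag grammar, assuming PyParsing 3.x. Everything except the name FlagList is hypothetical, and the real FlagList in aCLImatise is considerably more involved. The `OneOrMore` plays the role of a backwards-turning loop, and the `Optional` is a branch that can be skipped.

```python
import string

import pyparsing as pp

# Terminals (illustrative patterns only).
flag_name = pp.Regex(r"--?[A-Za-z][A-Za-z0-9-]*")
argument = pp.Word(string.ascii_uppercase, string.ascii_uppercase + "_")

# Non-terminal: a flag name, optionally followed by an argument placeholder.
flag = pp.Group(flag_name + pp.Optional(argument)).set_name("Flag")

# Start node: one or more flags (the loop in the diagram).
flag_list = pp.OneOrMore(flag).set_name("FlagList")

# This string follows a valid path: Flag, Flag with argument, Flag with argument.
result = flag_list.parse_string("-v --output FILE --threads N", parse_all=True)
print(result.as_list())  # [['-v'], ['--output', 'FILE'], ['--threads', 'N']]

# A string that matches no path through the grammar is rejected.
try:
    flag_list.parse_string("output", parse_all=True)
    rejected = False
except pp.ParseException:
    rejected = True
print(rejected)  # True
```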