Parsers

.NET 4.0 Silverlight 4 Silverlight 5 Phone 7.0 Phone 7.1
S.png S.png S.png S.png S.png

Overview

Parsers make it easy to define Context-Free Grammars (CFG) and Context-Sensitive Grammars (CSG) using LINQ for performing pattern matching on a sequence, ultimately projecting data into a new sequence. LINQ allows grammars to be similar in appearance to Backus–Naur Form (BNF).

Essentially, parsers are implementations of Parser Combinators based on Rx. They offer new monads for defining grammars and utilize IObservable<T> and IEnumerable<T> for pushing and pulling input through the grammars.

Example

The following C# code snippet was taken from Rxx Labs, which is available on the Downloads tab. The lab defines an observable stock ticker using the Rx operators: Interval, Select, Scan and Publish. Then it calls the Rxx Parse operator and defines an in-line grammar that matches particular reversal trends, projecting them into an observable sequence of alerts.

static void Main()
{
	var random = new Random();

	var values = Observable
		.Interval(TimeSpan.FromSeconds(1))
		.Select(_ => random.Next(1, 50));

	var ticks = values
		.Scan(StockTick.Empty, (acc, cur) => new StockTick(cur, cur - acc.Value))
		.Publish();

	var alerts = ticks.Parse(parser =>
		from next in parser
		let ups = next.Where(tick => tick.Change > 0)
		let downs = next.Where(tick => tick.Change < 0)
		let downAlert = from manyUps in ups.AtLeast(2).ToList()
			       from reversalDown in downs.NonGreedy()
			       where reversalDown.Change <= -11
			       select new StockAlert(manyUps, reversalDown)
		let upAlert = from manyDowns in downs.AtLeast(2).ToList()
			    from reversalUp in ups.NonGreedy()
			    where reversalUp.Change >= 21
			    select new StockAlert(manyDowns, reversalUp)
		select downAlert.Or(upAlert).Ambiguous(untilCount: 1));

	using (ticks.Subscribe(WriteTick))
	using (alerts.Subscribe(WriteAlert))
	using (ticks.Connect())
	{
		Console.ReadKey();
	}
}

Guidelines

The parser API and the behavior of individual operators can be somewhat confusing if you haven't experimented with them yet. Here is a small set of terms and guidelines to help you get started. Keep these in mind while reading the reference documentation, experimenting and creating your own operators.

Glossary

  • Required / Optional
Indicates whether failure to match throws an exception. Optional grammars simply skip elements that do not match, while required grammars throw a ParseException when no match is found at the current position in the sequence.
  • Ambiguous / Unambiguous
Indicates whether matches can overlap in the sequence. An ambiguous grammar looks for overlapping matches, while an unambiguous grammar matches consecutively.
  • Greedy / Non-greedy
Indicates whether a grammar consumes the elements that it matches. A non-greedy grammar does not consume any elements regardless of whether it matches or not, while a greedy grammar consumes all elements that it matches. Essentially, a non-greedy grammar allows subsequent matches to start at the same element in the sequence as the previous match. Non-greediness is another kind of ambiguity, which doesn't make any forward progress. Note that this definition of non-greediness does not apply to quantifiers.
  • Quantifiers
Operators that apply a quantity to a grammar. Examples include Exactly, Maybe, None, NoneOrMore, OneOrMore and AtLeast. Non-greedy quantifiers match as few times as possible, while greedy quantifiers match as many times as possible.
  • Rule
A named sub-grammar from which a parser's grammar is defined. Typically, individual rules of a grammar are defined as fields on a custom parser class or as let statements in an in-line parser query. The stock alert example above defines four rules: ups, downs, downAlert and upAlert. Rules allow you to organize grammars semantically and to reuse common patterns.

Principals

  • A parser's grammar is optional, unambiguous and greedy by default.
    • To change the first two behaviors use the Required and Ambiguous operators, respectively. The greediness of an entire grammar should never be altered.
  • A grammar's individual rules are optional, unambiguous and greedy by default.
    • To change these behaviors use the Required, Ambiguous and NonGreedy operators, respectively, on any individual grammar rule. The Ambiguous and NonGreedy operators should be used with mutual exclusion; otherwise, the ambiguous behavior overtakes non-greediness. The NonGreedy operator does not enable non-greedy quantification when applied to greedy quantifiers.
    • Quantifiers are greedy by default unless named with a NonGreedy suffix, with the exception of the Exactly operator, which is non-greedy by default.
  • Parser combinators should not be stateful. Some grammars will call the Parse method multiple times with different sources, on the same instance of the object returned by a parser combinator.

Creating Operators

Here are some guidelines to follow when creating new query operators.
  1. Yield new results from previous results.
    1. Use the Yield, Add or Concat extension methods on the previous result to create a new result. When there is more than one previous result, yield a new result from the last result in the list.
    2. Exception: When there are no previous results from which to yield a new result, create a new result.
    3. Exception: Some operators may require previous results to be ignored.
  2. Propagate look-ahead results.
    1. Yield a look-ahead result if the previous result or the last in a collection of results is a look-ahead result. This happens automatically if you follow guideline #1.

Platform Support

The entire parser API is available in the Rxx library for the full .NET Framework 4.0, Silverlight 4/5 and Windows Phone 7.0/7.1 platforms.

Background

The original parser implementation was developed in response to a post in the Rx forums:

Reactive Parser Combinators
http://social.msdn.microsoft.com/Forums/en-US/rx/thread/0f72e5c0-1476-4969-92da-633000346d0d

The original design used delegates and was based largely on this blog post by Luke Hoban. It has since evolved into a more fully-featured, class-based parser platform.

Resources

Last edited Mar 11, 2012 at 7:13 PM by davedev, version 12