This project has moved and is read-only. For the latest updates, please go here.

Rxx Parsers: Quantifiers with composed parsers

Topics: Parsers
Oct 21, 2014 at 5:07 AM
Hi Dave, all:

Are Rxx Parsers intended to allow for quantifiers to be specified on composed parsers? Here's an example of the code I'd like to write, which is attempting to recognize three instances of "letter followed by digit" followed by "two letters", somewhat like this regex: ([a-zA-Z]\d){3}[a-zA-Z]{2}.

(I've got a real use case as well, but it's more complicated, involving parsing binary data off a serial link. The binary data is comprised of 4-DWORD packets with optional "idle" filler DWORDS mixed in to keep the link toggling during a delay mid-packet.)
[Description("Matches three instances of {a letter followed by an integer} followed by one instance of {two letters}")]
public void ComposedExactlyExperiment()
    IObservable<char> input = "ab1c2d3efg4".ToObservable();

    IObservable<string> query = input.ParseString(parser =>
        from next in parser
        let letter = parser.Character(char.IsLetter)
        let number = parser.Character(char.IsNumber)
        let letterThenNumber = letter.And(number)
        let naiveParser = letterThenNumber.And(letterThenNumber).And(letterThenNumber).And(letter).And(letter)
        let desiredParser = letterThenNumber.Exactly(3).And(letter.Exactly(2))
        select desiredParser.Join().Ambiguous()

Running this code with the naiveParser produces the desired output, while running it as written with the desiredParser produces a pile of ToString()-ed IObservables.

I believe this is ultimately due to the fact that the only available overrides of the quantification extension methods take template parameters TSource and TResult but always "amplify" to a return type of IObservable<TResult>. This is the case for Exactly in my usage, but also appears to be true for at least:
  • None
  • AtEndOfSequence
  • Exactly
  • NoneOrMore
  • NoneOrMoreNonGreedy
  • OneOrMore
  • OneOrMoreNonGreedy
  • AtLeast
  • AtLeastNonGreedy
I contrast this to the available overrides for Maybe and other non-quantification extension methods like And, which do provide "non-amplifying" IObservable<TResult> specializations.

I'm wondering if this is just an oversight in the current parsers that can be addressed by adding the relevant non-amplifying specializations, or if this is actually a by-design behavior, and there's some other usage paradigm or trick for how to properly assemble complex parser combinations (especially talking about quantifiers) that I've just completely missed.

(Prototyping a new Exactly that extends a this IObservableParser<TSource, IObservable<TResult>> parser does seem to do what I expect, but again: not sure if that's actually how this should work or if i'm Doing It Wrong(tm)...)

Thanks for your help!
Oct 21, 2014 at 6:29 AM
Hi Colin,

Good question. Quantifiers return a sequence because they potentially return more than one result. It's up to you to decide how the result sequence is used when composing parser rules. ToList is a commonly used operator.

However, if you apply a quantifier to the result of another quantifier, such as And, then you end up with nested observables as your result. Again, it's up to you to decide how to use that sequence. In most cases you'll probably just want to flatten it, which I've illustrated below using Merge.

And is used to compose parsers, though it only works on homogenously-typed parsers, because internally it has some performance optimizations. I'd actually recommend this in cases where performance is of concern, though if it's really of concern than consider using the IEnumerable<T> parsers instead because they perform much better.

SelectMany has two purposes:
  1. It's required to compose heterogeneously-typed parsers, but as a result it can't do the performance optimizations that And can do.
  2. It can be used to compose the results of quantifiers into a regular Rx query within a parser rule!
Here's a solution to your query using SelectMany. Let me know if you don't understand it or if it doesn't produce your desired output.
IObservable<string> query = input.ParseString(parser =>
    from next in parser
    let letter = parser.Character(char.IsLetter)
    let number = parser.Character(char.IsNumber)
    let letterThenNumber = letter.And(number)
    let letterThenLetter = letter.And(letter)
    let desiredParser = from prefix in letterThenNumber.Exactly(3)
                        from suffix in letterThenLetter
                        select prefix.Merge().Concat(suffix)  // Normal Rx query
    select desiredParser.Join().Ambiguous()
- Dave
Marked as answer by ccmcgeek on 10/23/2014 at 9:17 PM
Oct 23, 2014 at 6:44 AM
Thanks Dave,

The paradigm is starting to come together more for me now! Your SelectMany explanation and example helped tremendously with one of the roots of my misunderstanding... from previous snippets and examples I had read, I did not fully grok the transition back to regular Rx there.

I've applied those lessons to the serial data parsing use case that I've got, and am indeed getting back the output I was hoping for. Excellent :-)

Thanks also for the pointer on performance between Rx and Ix parsers... I expected that this may be the case, but decided to implement and profile the reactive approach first as it offers such a clean extension into live parsing. Sounds like I may want to be a little more proactive with respect to profiling both methods.

On the semantics of Exactly vs And, I'm still a little bit confused as to why the semantics are different. (Referring back to the parsers from our previous snippets), I would have expected all of the following to yield a combined parser outputting the same type.
// ...
    let parser1 = letterThenNumber.And(letterThenNumber).And(letterThenNumber)
    let parser2 = new[] { letterThenNumber, letterThenNumber, letterThenNumber}.All()
    let parser3 = letterThenNumber.Exactly(3)
// ...
I note that And and All both yield the less-nested output I had originally expected, so I'm still working on understanding why Exactly behaves differently... Is it more accurate to say that Exactly behaves as a typical quantifier does, and it is actually And and All that are the exceptional cases (due to the performance optimizations you mentioned)? Put another way, should I understand that all quantifiers should yield a sequence of sequences, but for performance reasons, And and All are enhanced to support the common use case of automatically merging the sequence of sequences back down?

Thanks again for your time... I see elsewhere on the forum here that you're busy shipping a different product at the moment; I really appreciate your dedication to helping folks out on this 'backburner' one!

Oct 23, 2014 at 7:41 AM
Hi Colin,

I think you're right that I had originally decided to give Exactly quantifier semantics while And is a simple concatenation and All is simply a generalization of And, though it was a while ago so it's hard to remember my exact reasons.

However, looking at the code comments again I'm reminded that Exactly actually performs better than All for large values of N.



Also of interest is the annoying code duplication between the models. Sure, the duality between IE<T> and IO<T> is great, I just wish the compiler would've allowed me to take advantage of it! :p

- Dave
Oct 23, 2014 at 7:46 AM
In addition, returning a sequence from a quantifier makes it easy to do this:
let triple = next.Exactly(3).ToList()
Oct 23, 2014 at 7:47 AM
Edited Oct 23, 2014 at 7:50 AM
And of course, an observable sequence remains reactive, thus you can do other things to it, such as Throttle.

(NOTE: I'm pretty sure making quantifiers truly reactive is actually still a work item, but you get the idea...)
Oct 24, 2014 at 5:17 AM
Got it... it's making more and more sense as I go :)

I don't know Haskell, but I might take a pass at the monadic parser papers from Dr. Hutton and Dr. Meijer as well, and see if that inspires me to assemble the pieces in a more natural order... it's a very cool concept you've got here, and it's been tremendously helpful to find that someone has already gone and done all the hard work to write and test the framework.

Really appreciate your time and feedback, Dave, and all the work on Rxx! I'm off and running now on my project!
Oct 25, 2014 at 7:51 AM
Edited Oct 25, 2014 at 7:54 AM
I've been reading that paper myself on and off for like 2 years now ;-)

Erik also showed me in an email a while ago that my implementation isn't ideal because it can't represent FSMs. I reminded him recently and he responded:

Though, frankly, I'm having a hard time seeing why Parsec in C# should just be "better". Each time I think about how to do it and look at the code examples, I can't help but draw the conclusion that defining an FSM-compatible parser is not going to be as elegant as I've done it (Edit: the non-FSM part, I mean) with Rxx parser's in-line LINQ syntax. I'm probably wrong and just haven't read into it deep enough yet.

I'm interested to hear if you make any real progress on it.
Oct 25, 2014 at 7:53 AM
Although, I have always wondered whether the choice to represent parser results as sequences themselves was a good one. IIRC, it was based on Luke Hoban's approach, but I think it may have led me astray a bit.
Oct 25, 2014 at 7:56 AM
Edited Oct 25, 2014 at 8:01 AM
Here are the open work items, in case you're interested in my old plans for the library. Based on Erik's reply, I'm assuming it's probably not worth updating this library (parsers) going forward (except to fix bugs, of course).
Oct 26, 2014 at 12:37 AM
Hm, so I should read Erik's reply as implying that someone ought to tackle a Reactive port of Parsec to C#? It wasn't immediately clear where that description of the design constraints might be found off of the parsec homepage. If Erik's reply is just meant to say "parse event streams using Haskell instead", that would be a somewhat unsatisfying answer :-)

I browsed through Sprache as well, but noted that it is both iterative and char-only. Looks like some work going on in the abstraction of IInput to move toward a more flexible design, but it's not there yet.

Still strongly hoping to keep my program entirely reactive until forced to explicitly buffer and iterate!

Speaking of bug fixes: does the current revision (74914) build cleanly by you? I'm having issues with the Portable-4.5 and Universal libraries missing Errors and Text resource files. Dropping in the lastest copy (deleted from Portable-4.5\Properties with r74570) got me past that, but I'll admit that was a quick hack and I really didn't read the files in detail :-/
Oct 26, 2014 at 3:44 AM
I should read Erik's reply as implying that someone ought to tackle a Reactive port of Parsec to C#?
That was my understanding, though of course it's already been done, sort of. I'm not sure, though I don't think this implementation is reactive:

Haven't really looked at it yet.
It wasn't immediately clear where that description of the design constraints might be found off of the parsec homepage
I think he was referring to his papers, because I didn't see anything about design constraints either.
If Erik's reply is just meant to say "parse event streams using Haskell instead", that would be a somewhat unsatisfying answer :-)
I browsed through Sprache [snip]
Thanks for the link, I hadn't heard of that project. I'm pretty sure the char-only design stems from the fact that it was based on Erik's papers and Luke Hoban's blog, just like my implementation was at first.
does the current revision (74914) build cleanly by you?
Well, I assume it did because I always test it before checking it in. But it's certainly possible that I have files present locally that aren't in source control. It's probably just due to an incomplete refactoring that I didn't notice because my local build was successful. I'll check again and upload any missing files.