Parsing
Parser combinators are a technique for parsing.
megaparsec
(and a more restrictive but faster library, attoparsec
) are industrial strength parser combinator libraries in Haskell.
External resources¶
This tutorial explains how to use megaparsec
in detail.
This tutorial explains the design and implementation of parser combinators.
What are parsers?¶
import Data.Char (digitToInt, isDigit)
type Parser a = [Char] -> Maybe (a, [Char]) -- (1)!
digit :: Parser Int -- (2)!
digit (x:xs)
| isDigit x = Just (digitToInt x, xs)
| otherwise = Nothing
digit [] = Nothing
-
A
Parser
(parameterized by some typea
), takes a sequence of characters to be parsed, and either fails or returns a parse result from some initial segment of the sequence and the remaining sequence with that segment removed. -
digit
is aParser
parameterized byInt
(which is the type of its result). It will try to strip a numeral (1
,2
, etc..) from the input sequence. If the sequence does not begin with a digit, it will fail (i.e., returnNothing
). If it succeeds, it will return anInt
, not aChar
.
Libraries like parsec
, megaparsec
and attoparsec
provide more sophisticated versions of this idea:
> import Text.Megaparsec
> import Text.Megaparsec.Char
> :set -XOverloadedStrings
> type Parser = Parsec Void T.Text
-- (1)!
> parseWith parser = putStrLn . (either errorBundlePretty T.unpack ) . parse parser ""
-- (3)!
> aParser = "a" :: Parser T.Text
> parseWith aParser "ab"
a
> parseWith aParser "ba" -- (2)!
1:1:
|
1 | ba
| ^
unexpected 'b'
expecting 'a'
abParser = "ab" :: Parser T.Text
> parseWith abParser "ab"
ab
> parseWith abParser "ba"
1:1:
|
1 | ba
| ^^
unexpected "ba"
expecting "ab"
- A helper function for parsing.
megaparsec
generates pretty error messages when it fails.Parser T.Text
is the type of a parser that returns a result of typeT.Text
What are combinators¶
Parser combinator libraries provide not just simple parsers like the above, but the ability to combine parsers to build more complex ones.
Here are examples from megaparsec
:
{-# LANGUAGE OverloadedStrings #-}
import Text.Megaparsec
import Data.Text ( Text, unpack )
import Data.Void (Void)
import Text.Megaparsec.Char (space)
type Parser = Parsec Void Text
-- helper function to run parser
parseAndPrint :: Show b => Parsec Void Text b -> Text -> IO ()
parseAndPrint parser line =
either
(putStrLn . errorBundlePretty)
print
(runParser parser "" line)
-- parse 'a' then 'b'
abParser :: Parser Text
abParser = "ab" -- (2)!
-- parse 'b' then 'a'
baParser :: Parser Text
baParser = "ba"
-- parse 'ab' then 'ba'
abbaParser :: Parser Text -- (1)!
abbaParser = do
ab <- abParser -- (3)!
ba <- baParser -- (4)!
return (ab <> ba) -- (5)!
-- parse either 'ba' or 'ab'
baOrabParser :: Parser Text
baOrabParser = baParser <|> abParser -- (6)!
main :: IO ()
main = parseAndPrint baOrabParser "ab"
This runs as follows:
> parseAndPrint baParser "ba"
"ba"
> parseAndPrint baParser "ab"
1:1:
|
1 | ab
| ^^
unexpected "ab"
expecting "ba"
> parseAndPrint baOrabParser "ab"
"ab"
> parseAndPrint abbaParser "abba"
"abba"
> parseAndPrint abbaParser "baab"
1:1:
|
1 | baab
| ^^
unexpected "ba"
expecting "ab"
Parser
is a monad, so we can use do-notation, which is very convenient for building complex parsers out of simpler ones.megaparsec
takes advantage of OverloadedStrings, so that"ab"
is actually a parser.- This means: run
abParser
and name the resultab
. - This means: run
baParser
(on what remains after runningabParser
previously) and name the resultba
. - This means: the result of the parser is the value
ab <> ba
. - This means: try first
baParser
and if it fails, tryabParser
.
With custom types¶
One of the main appeals of Haskell's parser combinators is that the output of your parser can be a custom type:
data PieceType = Bishop | Rook deriving Show
data Color = White | Black deriving Show
data Piece = Piece PieceType Color deriving Show
pieceTypeParser :: Parser PieceType -- (3)!
pieceTypeParser = do
str <- "bishop" <|> "rook"
return $ case str of
"bishop" -> Bishop
_ -> Rook
colorParser :: Parser Color
colorParser = do
str <- "white" <|> "black"
return $ case str of
"white" -> White
_ -> Black
pieceParser :: Parser Piece
pieceParser = do
color <- colorParser
space -- (1)!
pieceType <- pieceTypeParser
return (Piece pieceType color)
optionalColorParser :: Parser Piece
optionalColorParser = do
maybeColor <- optional colorParser -- (2)!
pieceType <- pieceTypeParser
return $ case maybeColor of
Just color -> Piece pieceType color
Nothing -> Piece pieceType White
- 0 or more whitespace.
optional
takescolorParser
and returns a new parser that either parses nothing orcolorParser
. The result, appropriately, is aMaybe Color
.- The result type is the custom type just defined above.
Examples:
> parseAndPrint optionalColorParser "white rook"
Piece Rook White
> parseAndPrint optionalColorParser "rook"
Piece Rook White
> parseAndPrint pieceParser "white rook"
Piece Rook White'
> parseAndPrint pieceParser "rook"
1:1:
|
1 | rook
| ^^^^
unexpected "rook"
expecting "black" or "white"
Created: January 12, 2023