「Derivatives of Regular Expressions with Lookahead」

Derivatives of Regular Expressions with Lookahead

[Journal of Information Processing Vol.27, pp.422-430]
[情報処理学会論文誌 プログラミング Vol.12 No.2, Preprint掲載]

[Abstract]

Lookahead is an extension of regular expressions that has been adopted in many implementations and is widely used. Lookahead represents what is allowed as the rest of input. Morihata developed a conversion from regular expressions with lookahead (REwLA) to deterministic finite automata by extending Thompson's construction. In this paper, we develop a conversion from REwLA to deterministic finite automata by extending derivatives of regular expressions. First, we formalize the semantics of REwLA. An REwLA has information about the rest of the input, so the definition of the semantics of REwLA is not languages but structures different from those of regular expressions. Thus, we introduce languages with lookahead as sets of pairs of strings with several operations and define the semantics of REwLA as languages with lookahead. Next, we define two kinds of left quotient for languages with lookahead and give corresponding derivatives. Then, we show that the types of expressions obtained by repeatedly applying derivatives are finite under some equivalence relation and give a conversion to deterministic finite automata. We also show that the semantics of REwLA is a finite union of sets of the form A * B, where A and B are regular languages.

[Reasons for the award]

This paper introduces the notion of derivatives for regular expressions extended by lookahead and investigates their properties. The derivative for a regular expression is a language following a given prefix and is mainly used for transformations to a corresponding finite automaton. The authors not only define this notion of derivative for regular expressions with lookahead and apply it to a new transformation to a finite automaton but also carefully discuss its algebraic properties. Specifically, the authors use a set of pairs of strings rather than a set of strings as the semantics of a language with lookahead. It is very interesting that they show that the semantics of a regular expression with lookahead can be expressed in terms of a finite union of pairs of regular languages. For these reasons, this paper is worthy of the Best Paper Award, and we hereby nominate it for the award.

Takayuki Miyazaki

Takayuki Miyazaki received his M.Sc. and Ph.D. degrees from the Tokyo Institute of Technology in 2020 and 2023, respectively. His research interests include formal language theory.

Yasuhiko Minamide

Yasuhiko Minamide received his M.Sc. and Ph.D. degrees from Kyoto University in 1993 and 1997, respectively. Since 2015, he has been a professor at the De partment of Mathematical and Computing Science, Tokyo Institute of Technology. His research interests focus on software verification and programming languages. He is also interested in the theory and applications of automata and formal languages. He is a member of ACM, IPSJ, and JSSST.