Freitag, 27.04.2018, 16ct, V3-204: Jens Stoye (Bielefeld University)
On Indexing for Jumbled Pattern Matching
Jumbled pattern matching is a variant of string matching where the order
of characters in the matched region is ignored. It has applications in
various areas of bioinformatics and beyond. Like in ordinary string
matching, indexing allows to speed-up the search of multiple patterns in
the same text.
Recently, the problem of indexing binary strings for jumbled pattern
matching has received quite some attention. While it seems unlikely to
develop faster algorithms in terms of input size, we can still make
progress on the problem by considering other natural parameters. One
such parameter is the number of runs of 1s in the input, i.e. the length
of the run length encoding (RLE) of the input string.
We present a new and very simple algorithm matching the best known
running times. This algorithm can be extended, either so that the index
returns the position of a match (if there is one), or so that the
algorithm uses only $n$ bits of space instead of $n$ words.
To better understand the behaviour of the latter variant, we also study
some combinatorial problems arising from comparing the RLE of the input
string to the RLE of the binary output string. Some of these problems
are still open, and any idea from the audience will be highly appreciated.
This is joint work with Luis Kowada, Luís Felipe Cunha, Roland Wittler,
Travis Gagie, Szymon Grabowski and maybe others.