Daniel Barnes

a blog

Tag 'Computer Science':



By Daniel, on July 19, 2018, 12:42 pm

Regular Expression building based on user input

Sometimes a program has to interpret data whose format varies from user to user, so the user needs a way to configure that format in our program. However, it's unreasonable to expect somebody to give you a full-blown regular expression with capturing groups that you can interpret.

I've come up with a small solution which builds a regular expression around named capturing groups. It's very basic at the moment, with some features missing (such as the ability to have a literal \Q, \E, or curly bracket in the format, or angle brackets in capture group names) -- but for basic uses, where users give you data in a consistent but arbitrary layout, this is a solution which allows that layout to be configured.

So, for example, if somebody has a bunch of files on their system which are named similarly to "Look what you made me do - Taylor Swift.mp3", this program allows those users to specify where the details in that filename are: {title} - {artist}.mp3.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormatInterpreter {

    private Pattern regex;

    public FormatInterpreter(String format){
        // Formats look like this:
        // "{lastname}, {firstname} lives at {number} {street} in {city}, {state} {zip}"
        // This is a user-readable format that may be entered into a config file.
        // We convert it into a regex with one named group per {placeholder}.
        StringBuilder pattern = new StringBuilder("^");
        char[] chars = format.toCharArray();
        int i = 0;
        while(i < chars.length){
            if(chars[i] == '{'){
                // Read the placeholder name up to the closing bracket.
                StringBuilder capture = new StringBuilder();
                i++;
                while(i < chars.length && chars[i] != '}'){
                    capture.append(chars[i++]);
                }
                if(i == chars.length){
                    throw new IllegalArgumentException("Format string is malformed (unbalanced curly brackets).");
                }
                pattern.append("(?<").append(capture).append(">.*)");
                i++; // skip the closing '}'
            } else {
                // Everything up to the next placeholder is literal text, so quote it.
                pattern.append("\\Q");
                while(i < chars.length && chars[i] != '{'){
                    pattern.append(chars[i++]);
                }
                pattern.append("\\E");
            }
        }
        pattern.append("$");
        this.regex = Pattern.compile(pattern.toString());
    }

    public FormatInterpretation read(String s){
        return new FormatInterpretation(regex.matcher(s));
    }

    public static void main(String[] args){
        FormatInterpreter f = new FormatInterpreter("{lastname}, {firstname} lives at {address} in {city}, {state} {zip}");
        System.out.println(f.regex.toString());
        FormatInterpretation fi = f.read("Barnes, Daniel lives at 665 Candyland Dr. in Basalt, CO 81621");
        System.out.print(fi.matched());
        if(fi.matched()) {
            System.out.println(": " + fi.get("firstname") + " in " + fi.get("city"));
        }
    }
}

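class FormatInterpretation {

    private final Matcher matcher;
    private final boolean matched;

    public FormatInterpretation(Matcher matcher){
        this.matcher = matcher;
        // Run the match once up front: group() throws IllegalStateException
        // unless a successful match has already been made.
        this.matched = matcher.matches();
    }

    public String get(String s){
        // Returns null when the input did not match the format.
        return matched ? matcher.group(s) : null;
    }

    public boolean matched(){
        return matched;
    }
}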

We build a regular expression which ends up looking something like:

^(?<lastname>.*)\Q, \E(?<firstname>.*)\Q lives at \E(?<address>.*)\Q in \E(?<city>.*)\Q, \E(?<state>.*)\Q \E(?<zip>.*)$
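One of the gaps mentioned earlier, a literal \E inside the format, could probably be closed by letting Pattern.quote do the quoting instead of writing \Q and \E by hand. The following is only a sketch of that idea, not the code from this post: the class name QuotedFormatDemo, the helper compile, and the PLACEHOLDER pattern are made up for illustration, and unlike the version above it treats anything that is not a well-formed {name} as literal text instead of throwing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedFormatDemo {

    // Assumed placeholder syntax: a letter followed by letters or digits, in braces.
    // This is the same shape Java accepts for a named-group name.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{([A-Za-z][A-Za-z0-9]*)\\}");

    // Hypothetical helper: turns "{title} - {artist}.mp3" into an anchored regex.
    static Pattern compile(String format) {
        StringBuilder regex = new StringBuilder("^");
        Matcher m = PLACEHOLDER.matcher(format);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                // Pattern.quote also copes with literals that contain \E themselves.
                regex.append(Pattern.quote(format.substring(last, m.start())));
            }
            regex.append("(?<").append(m.group(1)).append(">.*)");
            last = m.end();
        }
        if (last < format.length()) {
            regex.append(Pattern.quote(format.substring(last)));
        }
        return Pattern.compile(regex.append("$").toString());
    }

    public static void main(String[] args) {
        Pattern p = compile("{title} - {artist}.mp3");
        Matcher m = p.matcher("Look what you made me do - Taylor Swift.mp3");
        if (m.matches()) {
            System.out.println(m.group("title") + " by " + m.group("artist"));
        }
    }
}
```

For formats without a literal \E, Pattern.quote produces the same \Q...\E wrapping shown above, so the resulting regex should look the same.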

Notice that the regex uses .* for every field -- this creates ambiguity whenever a field's value can contain the separator that follows it:

{number} {street} => 123 Candyland Dr.

With the greedy .* used above, {number} captures 123 Candyland and {street} captures Dr.; with a reluctant .*? instead, {number} would capture just 123.
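To see the difference concretely, here is a small sketch that runs the same input through a greedy and a reluctant version of the two-field regex. The class and method names are made up for this example; the two regex strings are written by hand in the shape the builder produces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

    // Returns what the "number" group captures, or null when the input does not match.
    static String numberField(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group("number") : null;
    }

    public static void main(String[] args) {
        String input = "123 Candyland Dr.";
        // Same shape as the generated regex for "{number} {street}".
        String greedy = "^(?<number>.*)\\Q \\E(?<street>.*)$";
        // Identical, except the first group is reluctant (.*?).
        String reluctant = "^(?<number>.*?)\\Q \\E(?<street>.*)$";

        System.out.println(numberField(greedy, input));    // 123 Candyland
        System.out.println(numberField(reluctant, input)); // 123
    }
}
```

Neither choice is right in general; which one fits depends on which field is more likely to contain the separator.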

However, everything is anchored, so if you have several fields and unique separators that never appear inside the captured values, the match is unambiguous:

{firstname}&{lastname}|{phone}

This is a great way to allow the user to specify a format and easily use that formatting information as a regex for collecting user data.
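As a rough illustration of the unique-separator case, the sketch below parses one line with the regex that the {firstname}&{lastname}|{phone} format would turn into, and collects the fields into a map. The class name, the hand-tracked FIELDS list, and the sample contact are all invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UniqueSeparators {

    // Hand-written in the shape generated for "{firstname}&{lastname}|{phone}".
    static final Pattern CONTACT = Pattern.compile(
            "^(?<firstname>.*)\\Q&\\E(?<lastname>.*)\\Q|\\E(?<phone>.*)$");

    // Field names in format order, tracked by hand for this sketch.
    static final List<String> FIELDS = List.of("firstname", "lastname", "phone");

    static Map<String, String> parse(String line) {
        Matcher m = CONTACT.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Line does not fit the format: " + line);
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (String field : FIELDS) {
            out.put(field, m.group(field));
        }
        return out;
    }

    public static void main(String[] args) {
        // Spaces inside the values are harmless because they are not separators.
        System.out.println(parse("Mary Ann&Van Dyke|555 0100"));
        // {firstname=Mary Ann, lastname=Van Dyke, phone=555 0100}
    }
}
```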


By Daniel, on July 17, 2018, 12:05 pm

Regolf, the Regular Expression game!

I think I first learned about regular expressions through xkcd comics.

Meanwhile, I like to spend time on EsperNet (IRC chat), where I am currently an operator under the name nasonfish.

Some people make bots in order to make social games possible over IRC. For example, I used to like to play an implementation of a game called "Werewolf", the social detective game where the werewolves are trying to eat the people and the people are trying to kill the werewolves, and the only clue you have is who gets eaten by the wolves each night. There was a great bot set up which made the game really fun, and with enough people, other roles like the seer, harlot, and detective added some spice to the game.

So, a few years ago, I made Regolf -- an IRC bot which runs a social regex-golf game.

The bot comes up with two sets of words, and players compete to come up with the shortest regular expression that matches all the words in one set, but none of the words in the other. An example of this is the title-text of xkcd comic 1313:

/bu|[rn]t|[coy]e|[mtg]a|j|iso|n[hl]|[ae]d|lev|sh|[lnd]i|[po]o|ls/ matches the last names of elected US presidents but not their opponents.

The golf part comes in as you try to find the shortest possible solution to a problem (much like code golf, another fun activity!).

When you trigger the bot, it comes up with something like this:

[22:49:51] <@nasonfish> !start
[22:49:52] <regolf> Beginning new regex golf game.
[22:49:52] <regolf> Please match: Wheaties, cellulars, zippers, overseers, misrepresented, mindlessness, newsletters
[22:49:52] <regolf> Do not match: Dagwood, Weinberg, chairperson, hookworm, mummer, Fatimid, Bernini
[22:49:52] <regolf> You have 105 seconds; Private message me your regular expression(s) using /msg regolf expression!

A player would then message the bot something like this (the reply below appears to come from a different round, hence the different words):

[22:52:16] <nasonfish> V|T|D|zz|ps|nn
[22:52:16] -regolf- V|T|D|zz|ps|nn (11/6/6): Positive: Velveeta, leprechauns, Triangulum, Diaghilev, fuzziness, sups, gunrunners | Negative: imprint, scrolled, deform, encapsulations, Algerian, unit, saxophonists

Once time is up, players may no longer edit their regexes, and points are awarded based on accuracy and the length of the expression. Once you reach a certain number of points, you win the game, so it's a race to consistently come up with the best regexes.
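The accuracy half of that can be sketched in a few lines. This is not the bot's code: it assumes the expression only has to be found somewhere inside a word (find() rather than a full match), and it simply counts hits on the two word lists from the reply above. Under that assumption the counts come out as 6 and 6, which seems consistent with the last two numbers in (11/6/6), though that reading is a guess.

```java
import java.util.List;
import java.util.regex.Pattern;

class GolfCheck {

    // Counts the words in which the expression is found anywhere.
    static long hits(Pattern p, List<String> words) {
        return words.stream().filter(w -> p.matcher(w).find()).count();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("V|T|D|zz|ps|nn");
        List<String> positive = List.of("Velveeta", "leprechauns", "Triangulum",
                "Diaghilev", "fuzziness", "sups", "gunrunners");
        List<String> negative = List.of("imprint", "scrolled", "deform",
                "encapsulations", "Algerian", "unit", "saxophonists");

        long matched = hits(p, positive);                   // wanted words that were hit
        long avoided = negative.size() - hits(p, negative); // unwanted words that were missed

        System.out.println(matched + " of " + positive.size() + " matched, "
                + avoided + " of " + negative.size() + " avoided");
        // 6 of 7 matched, 6 of 7 avoided
    }
}
```

Here "leprechauns" is the positive word that slips through, and "encapsulations" is the negative word caught by the ps alternative.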

This bot has been out of service for quite a while, but on a whim, I brought it back online. So, feel free to join me in #regolf on the IRC network EsperNet for a game of regex golf sometime!


By Daniel, on June 15, 2018, 11:11 pm