Daniel Barnes

a blog


Heartbleed, and What's Stored in RAM?

I'm in a class that covers computer architecture. Here's a post I wrote for the class forum about why it matters to keep your computer from "catching on fire" by reading information from RAM that it shouldn't be reading.

A cool/scary real-world example I wanted to share which illustrates a memory misuse error is a vulnerability in a popular encryption library, OpenSSL, called “Heartbleed”.

Heartbleed is an example of C code being tricked into reading past the end of an allocated memory block, which is "undefined behavior" in C. A TLS heartbeat request carries a small payload together with a field stating how long that payload is, and the server is supposed to echo the payload back. The vulnerable versions of OpenSSL trusted that stated length without checking it against the payload that actually arrived, so a client could send a few bytes while claiming up to about 64 KB. The server would then reply with the payload plus whatever happened to sit in memory right after it. The result is that the software hands out "raw" data from RAM that it was never meant to expose. (This issue has since been patched.)

This data in memory can be anything from stale leftovers of memory that has already been freed (free() does not "scrub" or zero out the contents of a block) to data that is still important and in use, such as passwords, private keys, or other vital data that should never reach unauthorized users.
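Java bounds-checks its arrays, so it can't reproduce the bug directly, but a toy model makes the shape of it concrete. In this minimal sketch, a single byte array stands in for RAM, and the class name, the "PING" payload, and the neighbouring "secret" are all hypothetical. The vulnerable handler trusts the length the client claims; the patched one discards any request whose claimed length exceeds what was actually received.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Toy model only: one byte array plays the role of process memory.
public class HeartbleedSim {

    // Hypothetical "RAM": a 4-byte heartbeat payload ("PING") stored
    // directly in front of unrelated, sensitive data.
    static final byte[] MEMORY =
            "PINGpassword=hunter2".getBytes(StandardCharsets.US_ASCII);
    static final int PAYLOAD_OFFSET = 0;
    static final int ACTUAL_PAYLOAD_LENGTH = 4;

    // Vulnerable: echoes back as many bytes as the client *claims* it sent,
    // without comparing that claim to the real payload length.
    // (copyOfRange zero-pads past the end of MEMORY, so this toy can never
    // leak more than the 20 bytes above, unlike real C code.)
    static byte[] echoVulnerable(int claimedLength) {
        return Arrays.copyOfRange(MEMORY, PAYLOAD_OFFSET, PAYLOAD_OFFSET + claimedLength);
    }

    // Patched: an oversized claim is discarded instead of answered.
    static byte[] echoPatched(int claimedLength) {
        if (claimedLength > ACTUAL_PAYLOAD_LENGTH) {
            return new byte[0];
        }
        return Arrays.copyOfRange(MEMORY, PAYLOAD_OFFSET, PAYLOAD_OFFSET + claimedLength);
    }

    public static void main(String[] args) {
        // Honest request: claims 4 bytes and gets "PING" back.
        System.out.println(new String(echoVulnerable(4), StandardCharsets.US_ASCII));
        // Malicious request: claims 20 bytes and also gets the neighbouring secret.
        System.out.println(new String(echoVulnerable(20), StandardCharsets.US_ASCII));
        // The patched handler refuses the oversized request.
        System.out.println(echoPatched(20).length);
    }
}
```

Running it prints `PING`, then `PINGpassword=hunter2`, then `0`: the only difference between the two handlers is one comparison against the real payload length.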

This is also a neat illustration of how data only means something in the context of how it is interpreted. Much of what is stored in RAM looks like total garbage in one context and carries real information in another. If the block of data you retrieved came from the middle of an image and you tried to read it as UTF-8 encoded text, it might look like:

hIJ?.N?q??ʲ????-j??Q?F?r?/\`g?^????S?6?4}B?=b?

ix'L??B??˞?t??  ???ocxբl???/52?Ï?IJ?'?_t???_???Wdό?<DZ,:??sX??Ws?/%??~???~Ts?\??5???e?|m4?mռ?vc?C??F?Ƽ?U??E??`??"w?m?d?(??Q޺??v??ũ]nj?1??l?l??1#?;?&W???????+??SBnK?????-?͖VFnZ6lc?t?^k?Dmk????$W?oe2?(??\ܒpB???????7??N???i%?????)om,d?9

Or, if part of the block was a string stored in RAM, you might find something more meaningful to the human eye when decoding it as UTF-8:

?d5#KY???????P??XPg? ?ˬ, op??0??C??=?+I(??ε??^??=?:????[?73??M?i??r?]?_??%?U?M]

?b??q?GSU??/A????p??LE~LkP?A??tb                      ?!.t?<

                               ?A?3???0X?Z2?h???es?g^e?I?v?3e?w??-??z0?v0U?0U?0?0U+?iG?v ??k?.@??G^0U#0?+?iG?v ??k?.@??G^0?U 0?0? *?H??cd0??0+https://www.apple.com/appleca/0?+0????Reliance on this certificate by any party assumes acceptance of the then applicable standard terms and conditions of use, certificate pol?\6?L-x?팛??w??v?w0O????=G7?@?,Ա?ؾ?s???d?yO4آ>?

Given the multitude of ways data is stored on a computer, most information read directly from memory will look like random gibberish if you try to read it as text, because chances are it isn't text at all: it could be machine instructions, images, objects, numbers, or structures.
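To make the "it depends on the interpretation" point concrete, here is a small sketch that reads the same four bytes back as UTF-8 text, as a big-endian integer, as a little-endian integer, and as a floating-point number. The byte values and the class name are arbitrary choices for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// The same four bytes, read back under four different interpretations.
public class SameBytes {

    static final byte[] BYTES = {0x48, 0x65, 0x6C, 0x6F};

    static String asUtf8Text(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    static int asBigEndianInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt(); // ByteBuffer defaults to big-endian
    }

    static int asLittleEndianInt(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    static float asBigEndianFloat(byte[] b) {
        return ByteBuffer.wrap(b).getFloat(); // same bits read as an IEEE 754 float
    }

    public static void main(String[] args) {
        System.out.println("text:         " + asUtf8Text(BYTES));        // Helo
        System.out.println("int (big):    " + asBigEndianInt(BYTES));    // 1214606447
        System.out.println("int (little): " + asLittleEndianInt(BYTES)); // 1869374792
        System.out.println("float (big):  " + asBigEndianFloat(BYTES));
    }
}
```

Nothing in the bytes themselves says which reading is "right"; only the code that wrote them knows.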

I thought this was a fun example of why memory misuse errors can be particularly bad (they can reveal passwords or other private data), and of what it means for memory to only mean something in a specific context: the ones and zeros can be interpreted in many ways, and should really only be interpreted in the context they were created for.


By Daniel, on November 4, 2018, 4:27 pm

Contact Me

I am available for hire as a freelance web developer/software engineer.

My most thorough experience is in PHP and Python web development, and my favorite projects have focused on managing large sets of data.

Examples of my work include:

If you'd like to get in touch, I can be reached via e-mail at: daniel -at- danielbarnes -dot- me.


By Daniel, on July 19, 2018, 12:42 pm

Regular Expression building based on user input

Sometimes a program has to parse input whose format is defined by the user, so the user needs a way to configure that format. However, it's unreasonable to expect somebody to hand you a full-blown regular expression with named capturing groups that your program can interpret.

I've come up with a small solution that builds a regular expression out of named capturing groups. It's very basic at the moment and some features are missing (such as support for a literal \Q, \E, or curly bracket, or for angle brackets in capture group names), but for basic cases where users supply data in a consistent but arbitrary format, it lets them configure that format themselves.

So, for example, if somebody has a bunch of files on their system which are named similarly to "Look what you made me do - Taylor Swift.mp3", this program allows those users to specify where the details in that filename are: {title} - {artist}.mp3.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FormatInterpreter {

    private Pattern regex;

    public FormatInterpreter(String format){
        // formats look like this:
        // "{lastname}, {firstname} lives at {number} {street} in {city}, {state} {zip}"
        // this is a user-readable format that may be entered into a config file
        // we convert this into a regex in order to return this data.
        StringBuilder regex = new StringBuilder("^");
        char[] chars = format.toCharArray();
        int i = 0;
        while(i < chars.length){
            if(chars[i] == '{'){
                StringBuilder capture = new StringBuilder();
                try {
                    while (chars[++i] != '}') {
                        capture.append(chars[i]);
                    }
                } catch(IndexOutOfBoundsException ex){
                    throw new IllegalArgumentException("Formatting string was malformatted. (Unbalanced curly brackets).");
                }
                regex.append("(?<" + capture.toString() + ">.*)");
                i++;
            } else {
                regex.append("\\Q");
                while(i < chars.length && chars[i] != '{'){
                    regex.append(chars[i++]);
                }
                regex.append("\\E");
            }
        }
        regex.append("$");
        this.regex = Pattern.compile(regex.toString());
    }

    public FormatInterpretation read(String s){
        return new FormatInterpretation(regex.matcher(s));
    }

    public static void main(String[] args){
        FormatInterpreter f = new FormatInterpreter("{lastname}, {firstname} lives at {address} in {city}, {state} {zip}");
        System.out.println(f.regex.toString());
        FormatInterpretation fi = f.read("Barnes, Daniel lives at 665 Candyland Dr. in Basalt, CO 81621");
        System.out.print(fi.matched());
        if(fi.matched()) {
            System.out.println(": " + fi.get("firstname") + " in " + fi.get("city"));
        }
    }
}

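class FormatInterpretation {

    private final Matcher matcher;
    private final boolean matched;

    public FormatInterpretation(Matcher matcher){
        this.matcher = matcher;
        // Attempt the match once, up front: Matcher.group() throws
        // IllegalStateException if no successful match has been made yet.
        this.matched = matcher.matches();
    }

    // Returns the text captured by the named field, or null if the input
    // did not match the format at all.
    public String get(String s){
        return matched ? matcher.group(s) : null;
    }

    public boolean matched(){
        return matched;
    }
}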

We build a regular expression which ends up looking something like:

^(?<lastname>.*)\Q, \E(?<firstname>.*)\Q lives at \E(?<address>.*)\Q in \E(?<city>.*)\Q, \E(?<state>.*)\Q \E(?<zip>.*)$
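One possible variation on the same builder, sketched below under my own naming (QuotedFormat is hypothetical), lets `Pattern.quote()` produce the \Q...\E wrapping. `Pattern.quote()` also copes with a literal \E inside the text, which the hand-written wrapping can't represent; a literal curly bracket is still unsupported. The example applies the "{title} - {artist}.mp3" format from earlier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFormat {

    // {name} becomes a named group matching .*; everything else is literal.
    static Pattern compile(String format) {
        StringBuilder sb = new StringBuilder("^");
        int i = 0;
        while (i < format.length()) {
            int open = format.indexOf('{', i);
            if (open < 0) {
                // No more fields: the rest of the format is literal text.
                sb.append(Pattern.quote(format.substring(i)));
                break;
            }
            if (open > i) {
                sb.append(Pattern.quote(format.substring(i, open)));
            }
            int close = format.indexOf('}', open);
            if (close < 0) {
                throw new IllegalArgumentException("Unbalanced curly brackets in format: " + format);
            }
            sb.append("(?<").append(format, open + 1, close).append(">.*)");
            i = close + 1;
        }
        return Pattern.compile(sb.append('$').toString());
    }

    public static void main(String[] args) {
        Matcher m = compile("{title} - {artist}.mp3")
                .matcher("Look what you made me do - Taylor Swift.mp3");
        if (m.matches()) {
            System.out.println(m.group("title") + " / " + m.group("artist"));
        }
    }
}
```

This prints `Look what you made me do / Taylor Swift`.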

Notice that every capture group in the regex uses .*, which is risky when a field can contain the same character as its separator:

{number} {street} => 123 Candyland Dr.

Here {number} could capture either 123 Candyland or 123, depending on whether the quantifier is greedy (the .* used above, which gives 123 Candyland) or lazy (.*?, which gives 123).
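The greedy/lazy difference is easy to see by matching "123 Candyland Dr." against the regex the builder would produce for "{number} {street}" and against a hand-edited lazy version (the lazy variant is my own illustration, not something the builder generates).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {

    // What the builder emits for "{number} {street}", and a lazy variant.
    static final String GREEDY = "^(?<number>.*)\\Q \\E(?<street>.*)$";
    static final String LAZY = "^(?<number>.*?)\\Q \\E(?<street>.*)$";

    // Returns {number, street}, or null if the input doesn't match.
    static String[] split(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group("number"), m.group("street")};
    }

    public static void main(String[] args) {
        String input = "123 Candyland Dr.";
        String[] greedy = split(GREEDY, input); // .* backs off only to the LAST space
        String[] lazy = split(LAZY, input);     // .*? stops at the FIRST space
        System.out.println(greedy[0] + " | " + greedy[1]); // 123 Candyland | Dr.
        System.out.println(lazy[0] + " | " + lazy[1]);     // 123 | Candyland Dr.
    }
}
```

Neither answer is wrong as far as the regex engine is concerned, which is exactly why ambiguous separators are a problem.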

However, everything is anchored, so if the format has several fields joined by unique separators that never appear inside the fields themselves, such as:

{firstname}&{lastname}|{phone}

then this is a great way to let the user specify a format and easily turn it into a regex for collecting their data.
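As a quick check of that claim, the sketch below uses the regex the builder would produce for "{firstname}&{lastname}|{phone}", alongside a hand-edited lazy variant. The input strings (including the 555 number) are made-up sample data. With clean input the two agree; once a field contains '&', they diverge.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UniqueSeparators {

    // Builder output for "{firstname}&{lastname}|{phone}", plus a lazy variant.
    static final String GREEDY = "^(?<firstname>.*)\\Q&\\E(?<lastname>.*)\\Q|\\E(?<phone>.*)$";
    static final String LAZY = "^(?<firstname>.*?)\\Q&\\E(?<lastname>.*?)\\Q|\\E(?<phone>.*?)$";

    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        return m.group("firstname") + " / " + m.group("lastname") + " / " + m.group("phone");
    }

    public static void main(String[] args) {
        // Separators never appear inside fields: greedy and lazy agree.
        String clean = "Daniel&Barnes|555-0100";
        System.out.println(describe(GREEDY, clean)); // Daniel / Barnes / 555-0100
        System.out.println(describe(LAZY, clean));   // Daniel / Barnes / 555-0100
        // A field containing '&' breaks the assumption: now they disagree.
        String messy = "Daniel&Barnes&Co|555-0100";
        System.out.println(describe(GREEDY, messy)); // Daniel&Barnes / Co / 555-0100
        System.out.println(describe(LAZY, messy));   // Daniel / Barnes&Co / 555-0100
    }
}
```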


By Daniel, on July 17, 2018, 12:05 pm

Regolf, the Regular Expression game!

I think I first learned about regular expressions through xkcd comics.

Meanwhile, I like to spend time on EsperNet (an IRC network), where I am currently an operator under the name nasonfish.

Some people make bots to make social games possible over IRC. For example, I used to enjoy an implementation of "Werewolf", the social detective game where the werewolves try to eat the people, the people try to kill the werewolves, and the only clue you get is who the wolves eat each night. There was a great bot for it that made the game really fun, and with enough players it spiced things up further with extra roles like the seer, harlot, and detective.

So, a few years ago, I made Regolf -- an IRC bot which runs a social regex-golf game.

The bot comes up with a set of words, and players compete to come up with the shortest regular expression to match all the words in one set of words, but none of the words in another set. An example of this is the title-text of xkcd comic 1313:

/bu|[rn]t|[coy]e|[mtg]a|j|iso|n[hl]|[ae]d|lev|sh|[lnd]i|[po]o|ls/ matches the last names of elected US presidents but not their opponents.

The golf part comes in as you try to find the shortest possible solution for a problem (much like code golf, another fun activity!)
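The core check behind a round can be sketched in a few lines. This is not Regolf's actual code: it assumes case-sensitive matching and that a word counts as matched if the regex matches anywhere inside it (`find()` rather than `matches()`), as in the xkcd example, and it doesn't attempt the bot's point system. The word lists come from the sample game shown below; "s$|nt" is just one solution I worked out for them.

```java
import java.util.List;
import java.util.regex.Pattern;

public class RegolfCheck {

    // Word lists from the sample round.
    static final List<String> MATCH = List.of("Wheaties", "cellulars", "zippers",
            "overseers", "misrepresented", "mindlessness", "newsletters");
    static final List<String> AVOID = List.of("Dagwood", "Weinberg", "chairperson",
            "hookworm", "mummer", "Fatimid", "Bernini");

    // Counts words the regex matches anywhere inside (substring match).
    static long countMatched(Pattern p, List<String> words) {
        return words.stream().filter(w -> p.matcher(w).find()).count();
    }

    // Solved = every word in MATCH is hit and no word in AVOID is.
    static boolean solves(String regex) {
        Pattern p = Pattern.compile(regex);
        return countMatched(p, MATCH) == MATCH.size() && countMatched(p, AVOID) == 0;
    }

    public static void main(String[] args) {
        for (String regex : List.of("s", "s$|nt")) {
            Pattern p = Pattern.compile(regex);
            System.out.printf("%s (length %d): matched %d/%d, wrongly matched %d, solved: %b%n",
                    regex, regex.length(), countMatched(p, MATCH), MATCH.size(),
                    countMatched(p, AVOID), solves(regex));
        }
    }
}
```

A bare "s" hits every word it should but also "chairperson"; anchoring it as "s$" and adding "nt" to catch "misrepresented" solves the round in five characters.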

When you trigger the bot, it comes up with something like this:

[22:49:51] <@nasonfish> !start
[22:49:52] <regolf> Beginning new regex golf game.
[22:49:52] <regolf> Please match: Wheaties, cellulars, zippers, overseers, misrepresented, mindlessness, newsletters
[22:49:52] <regolf> Do not match: Dagwood, Weinberg, chairperson, hookworm, mummer, Fatimid, Bernini
[22:49:52] <regolf> You have 105 seconds; Private message me your regular expression(s) using /msg regolf expression!

The user would message the bot something like:

[22:52:16] <nasonfish> V|T|D|zz|ps|nn
[22:52:16] -regolf- V|T|D|zz|ps|nn (11/6/6): Positive: Velveeta, leprechauns, Triangulum, Diaghilev, fuzziness, sups, gunrunners | Negative: imprint, scrolled, deform, encapsulations, Algerian, unit, saxophonists

Once time is up, players can no longer change their regexes, and points are awarded based on accuracy and the length of the expression. Once you reach a certain number of points, you win the game, so it's a race to consistently come up with the best regexes.

This bot has been out of service for quite a while, but on a whim, I brought it back online. So, feel free to join me in #regolf on the IRC network EsperNet for a game of regex golf sometime!


By Daniel, on June 15, 2018, 11:11 pm