Sniffen Packets

Sometime in 2011, I built a little toy program called Zampolit. Its job is to support stories about who’s contributing to a shared project, and how the team is making progress towards a shared goal. I learned a bunch of my collaborative writing habits from the MIT Assassins’ Guild, a group of LARPers. A typical 60-person 10-day game is about half a million words, produced by a team of about five people over the course of about a year. Because most of the teams producing these works are producing their first such work, most of them dramatically mis-estimate both their future rate of production and their chances of producing differently in the future than they have in the recent past. But because they’re MIT LARPers, they produce their work using an object-oriented variant of TeX¹ and coordinate using git.

The people who make the schedule need to know whether a game will actually be ready. They could ask the team, but the team’s probably wrong—and has identity-based reasons to support their own confusion. The schedulers therefore appoint a zampolit, named for the Soviet political officers of the later twentieth century. The zampolit is responsible for learning as much about the game as one of its authors, and then for telling the schedulers honestly how it’s going. We do have stories of zampolits “going native” and joining the group delusion of the authors that the work will be done on time, of course. So there are a handful of reasons for tools to support their work:

To help the zampolit be honest with themselves;
To help clearly communicate to the team, so they can understand an honest assessment of progress;
To show who’s doing the work, and who would have to change how much they do to make a different contribution;
To inform the schedulers about not only whether the team will likely make its deadlines, but how plausible that is.

What could be better for this than a detailed graph of all the existing contributions? The old Zampolit was an attempt to provide this, but if you look at the code, you’ll see that it’s really weird. It’s not just that it’s in Haskell: it uses Haskell as a kind of shell script, via the HSH library. Is it enough to say that the author of HSH maintains the largest remaining Gopher Hole on the Internet? HSH doesn’t build cleanly with any modern version of Haskell. The old Zampolit is also slow: it iterates through the git history of a project, checking out every revision, so it has to interact with the file system a lot.

I’ve reconsidered how to approach a program like this in the last ten years, so today I rewrote Zampolit to run in a fraction of the time, by not touching the file system, and in less than a hundred lines of mixed Shell, Awk, and GNUplot. I haven’t yet figured out a great way to distribute this; for now you can just unpack a tar file into your git repository and then commit & track it along with your work.

For now, let’s look at the program. The top level is a shell program. It figures out where we’re working, then produces a list of every commit. For each commit, it emits the author email, timestamp, commit hash (not used thereafter, but helpful for debugging), and size of the word diff.

# Keep everything under <repo root>/zampolit.
MYDIR="$(git rev-parse --show-toplevel)/zampolit"

# One line per commit: author email, timestamp, hash, changed-word count.
git rev-list --pretty="%ae %at %H" master \
    | grep -v "^commit" \
    | while read -r a t c; do
        printf '%s %s %s ' "$a" "$t" "$c"
        git show -p --word-diff=porcelain "$c" \
            | grep -e '^[-+][^-+]' \
            | wc -w
    done \
    >| "$MYDIR/zampolit.data"

gnuplot -e "mydir='$MYDIR'" "$MYDIR/zampolit.gnuplot" \
    && rm "$MYDIR/zampolit.data" \
    && evince zampolit.pdf
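
To see what the grep and wc stage counts, here is a minimal sketch that runs the same filter over a hand-written sample of porcelain word-diff output. The sample text and the helper names sample_diff and count_changed_words are hypothetical, made up for illustration; only the grep pattern and wc -w come from the script above.

```shell
# Hypothetical sample of `git show --word-diff=porcelain` output.
# Context lines start with a space, removed words with "-",
# added words with "+", and "~" marks a line break in the source.
sample_diff() {
    printf '%s\n' \
        'diff --git a/game.tex b/game.tex' \
        '--- a/game.tex' \
        '+++ b/game.tex' \
        '@@ -1 +1 @@' \
        ' The spy' \
        '-walks' \
        '+runs quickly' \
        ' to the door.' \
        '~'
}

# Same filter as the main script: keep +/- lines, but skip the
# "---" and "+++" file headers, then count words.
# tr strips the padding some wc implementations add.
count_changed_words() {
    grep -e '^[-+][^-+]' | wc -w | tr -d ' '
}

count=$(sample_diff | count_changed_words)
echo "$count"
```

Note that this counts removed words as well as added ones, so the number is a measure of churn rather than net growth.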

Then it invokes GNUplot. GNUplot reads the data file twice, each time using an Awk script to interpret it. This version smooths the data with smooth cumulative, which turns each author’s per-commit word counts into a running total over time, but there’s no reason you couldn’t do away with that. The noenhanced bit is necessary to keep GNUplot from mis-interpreting @ in email addresses as some sort of negative space character. GNUplot is weird software too!

set datafile separator ","
stats '<awk -f '.mydir.'/zampolit.awk '.mydir.'/zampolit.data' using 1 nooutput


set xdata time
set timefmt "%s"
set format x "%m/%d/%Y"
set terminal pdfcairo size 10in,7.5in
set output "zampolit.pdf"
set xlabel "date"
set ylabel "words"
set title "word counts by author"
set datafile missing "?"
set key autotitle columnhead noenhanced
plot for [i=2:STATS_columns] '<awk -f '.mydir.'/zampolit.awk '.mydir.'/zampolit.data' \
    using 1:i smooth cumulative with linespoints
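
As a rough cross-check on what the cumulative plot should show, the same running totals can be computed by hand. This is only a sketch of the idea, not how GNUplot implements it: the sample rows and the helper names sample_data and running_totals are hypothetical, and the input is assumed to use the same "email timestamp hash words" layout as zampolit.data.

```shell
# Hypothetical rows in the zampolit.data layout:
# author email, Unix timestamp, commit hash, changed-word count.
sample_data() {
    printf '%s\n' \
        'alice@example.org 300 c3 40' \
        'bob@example.org 200 c2 25' \
        'alice@example.org 100 c1 10'
}

# Sort by timestamp, then keep a running sum per author.
# Prints: timestamp, author, cumulative words so far.
running_totals() {
    sort -k2,2n | awk '{ total[$1] += $4; print $2, $1, total[$1] }'
}

totals=$(sample_data | running_totals)
printf '%s\n' "$totals"
```

If the last value on an author’s line in the PDF disagrees with the last total printed for that author here, something in the pipeline is off.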

The Awk program the plot calls, zampolit.awk, is mostly a gift from Tom Fenech at Stack Overflow. Thanks!

# repeat comma n times
function r(n,    s, j) {
    # s and j are declared as extra parameters to keep them local
    s = ""
    for (j = 0; j < n; ++j)
        s = s ","
    return s
}

BEGIN {
    header = "time"
}

# add each element to array, indexed on name & timestamp
# drop $3, the hash value
{ a[$1,$2] = $4 }

# register any new names seen in column one
!seen[$1] { 
    seen[$1] = ++c
    header = header "," $1
}

END {
    print header
    for (i in a) {
        # the index $1,$2 from above is separated by 
        # the built-in variable SUBSEP                  
        split(i, b, SUBSEP)
        # now b[1] is the name (the author email),
        # which determines how many commas to emit
        # b[2] is the x value
        # a[i] is the y value
        printf "%s%s%s\n", b[2], r(seen[b[1]]), a[i] 
    }                   
}
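
To make the reshaping concrete, here is a condensed sketch of the same idea run over a few made-up rows. It is not the zampolit.awk file itself: it keeps rows in input order and does not index on name and timestamp, and the names sample_data and to_wide_csv are hypothetical. What it shares with the real script is the output shape, one column per author with the value pushed into place by a run of commas.

```shell
# Hypothetical rows in the zampolit.data layout:
# author email, Unix timestamp, commit hash, changed-word count.
sample_data() {
    printf '%s\n' \
        'alice@example.org 300 c3 40' \
        'bob@example.org 200 c2 25' \
        'alice@example.org 100 c1 10'
}

# Condensed restatement of the reshaping step (an illustration only).
to_wide_csv() {
    awk '
        # first sighting of an author claims the next column
        !($1 in col) { col[$1] = ++n; header = header "," $1 }
        {
            # one comma per column to skip, then the word count
            pad = ""
            for (j = 0; j < col[$1]; j++) pad = pad ","
            rows[++m] = $2 pad $4
        }
        END {
            print "time" header
            for (k = 1; k <= m; k++) print rows[k]
        }
    '
}

csv=$(sample_data | to_wide_csv)
printf '%s\n' "$csv"
```

Each data row carries a value in exactly one author column, which is why the plot loops over columns 2 through STATS_columns and draws one line per author.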

Anyway, it looks like all that’s needed now is to drop those files into project/zampolit/ and run zampolit.sh, and you should have a quick assessment of who’s working and how. Please let me know if it’s helpful to you, or if some feature would make this better.

¹ Object systems seem invented for adventure games, after all.

Zampolit revisited