Sniffen Packets

With a name like Sniffen, it's got to smell good.

Confusing Arc delays in Rust

Last month, I wrote a little about using a toy problem to explore Python’s async/await concurrency model. I’ve been meaning to learn Rust for ages, and this seemed a nice size to experiment with. I ported that program over to Rust—and the whole thing is up on GitHub for your enjoyment.

But I’ve run into an interesting problem: the version using the standard library’s Arc/Mutex concurrency is incredibly slow. A rewrite using the crossbeam library’s atomic data types is much faster—even though they should be using the same primitives underneath. Not only is the Arc/Mutex version slow, but it gets much slower over time—as if something’s leaking badly. I welcome insight into what I could be missing. After all, this is my very first Rust program; perhaps I’m using Arc in a very improper way.

Let’s look at some examples of how the standard Arc/Mutex code differs from crossbeam.

initialization

Initialization is pretty similar between them. The Arc/Mutex version imports some standard libraries and sets up the struct that will be shared between the computer and inspector threads:

// Arc/Mutex version

use std::f64::consts::PI;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, SystemTime};

const TARGET: u32 = 10;
const GUESS: u32 = 100;

struct Leibniz {
    x: f64,
    d: f64,
    ticks: u32,
    tocks: u32,
}

And the crossbeam version does much the same—but it can’t have a central struct:

// crossbeam version

extern crate crossbeam;

use crossbeam::{atomic::AtomicCell, thread};
use std::f64::consts::PI;
use std::thread as sthread;
use std::time::{Duration, SystemTime};

const TARGET: u32 = 10;
const GUESS: u32 = 2500000;

computer

The Arc/Mutex computer is as simple as you could ask for:

// Arc/Mutex version

fn computer(state: Arc<Mutex<Leibniz>>) {
    loop {
        let mut s = state.lock().unwrap();
        for _ in 1..s.ticks {
            s.x += 1.0 / s.d;
            s.d += 2.0;
            s.x -= 1.0 / s.d;
            s.d += 2.0;
        }
        s.tocks += 1;
    }
}

while the crossbeam computer has to mess around with loads and stores more:

// crossbeam version

fn computer(
    xr: &AtomicCell<f64>,
    dr: &AtomicCell<f64>,
    ticks: &AtomicCell<u32>,
    tocks: &AtomicCell<u32>,
) {
    loop {
        let mut x = xr.load();
        let mut d = dr.load();
        let ticks = ticks.load();
        for _ in 1..ticks {
            x += 1.0 / d;
            d += 2.0;
            x -= 1.0 / d;
            d += 2.0;
        }
        tocks.fetch_add(1);
        xr.store(x);
        dr.store(d);
    }
}

inspector

The same is true for the inspector. The Arc/Mutex version is clean to write:

// Arc/Mutex version

fn inspector(state: Arc<Mutex<Leibniz>>) {
    let mut old_d = 1.0;
    let now = SystemTime::now();
    loop {
        thread::sleep(Duration::from_secs(1));
        let mut s = state.lock().unwrap();
        if s.tocks <= TARGET {
            s.ticks /= 2;
        } else {
            s.ticks = s.ticks + s.ticks / 10;
        }
        println!("{:?} {} {} {} {}", now.elapsed().unwrap(), s.ticks, s.tocks, s.d - old_d, PI - 4.0 * s.x);
        old_d = s.d;
        s.tocks = 0;
    }
}

while the crossbeam version has a third of its lines on loads or stores:

// crossbeam version

fn inspector(
    xr: &AtomicCell<f64>,
    dr: &AtomicCell<f64>,
    ticksr: &AtomicCell<u32>,
    tocksr: &AtomicCell<u32>,
) {
    let mut old_d = 1.0;
    let now = SystemTime::now();
    loop {
        sthread::sleep(Duration::from_secs(1));
        let x = xr.load();
        let d = dr.load();
        let tocks = tocksr.load();
        let ticks = ticksr.load();
        if tocks <= TARGET {
            ticksr.store(ticks / 2);
        } else {
            ticksr.store(ticks + ticks / 10);
        }
        println!("{:?} {} {} {} {}", now.elapsed().unwrap(), ticks, tocks, d - old_d, PI - 4.0 * x);
        tocksr.store(0);
    }
}

setup

The last bit of setup is different: Arc/Mutex requires us to clone the state Arc, and move one copy into a new thread:

// Arc/Mutex version

fn main() {
    println!("Arc std version");
    let state = Arc::new(Mutex::new(Leibniz {
        x: 0.0,
        d: 1.0,
        ticks: GUESS,
        tocks: 0,
    }));
    let state_i = state.clone();
    thread::spawn(move || computer(state));
    inspector(state_i);
}

while crossbeam has us use scoped threads, borrowing each of the atomic values by shared reference (though of course they can still be stored to):

// crossbeam version

fn main() {
    println!("Atomic crossbeam version");
    let x = AtomicCell::new(0.0);
    let d = AtomicCell::new(1.0);
    let ticks = AtomicCell::new(GUESS);
    let tocks = AtomicCell::new(0);

    thread::scope(|s| {
        s.spawn(|_| {
            computer(&x, &d, &ticks, &tocks);
        });
        inspector(&x, &d, &ticks, &tocks);
    })
    .unwrap();
}

Having to thread separate AtomicCell values through everywhere is a pain—and it’s not much relieved by putting them in a struct, because you still need extra loads and stores everywhere you access them. I’m consistently worried that I’ll get the ordering between atomic values wrong and cause either a deadlock or a logic error—even though these are almost one-way channels (tocks is incremented by computer and reset to 0 by inspector).
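One concrete spot where that ordering worry is real even for a one-way channel: reading tocks with a load() and then storing 0, as the inspector above does, can lose an increment that lands between the two operations. A swap reads and resets in one step (AtomicCell has a swap too; this sketch uses std atomics, with a five-increment thread standing in for computer):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// tocks as a one-way counter: the computer side only increments,
// the inspector side reads *and* resets in one atomic step.
// A separate load() followed by store(0) could lose an increment
// that lands between the two; swap() cannot.
static TOCKS: AtomicU32 = AtomicU32::new(0);

fn main() {
    let h = thread::spawn(|| {
        for _ in 0..5 {
            TOCKS.fetch_add(1, Ordering::Relaxed); // computer side
        }
    });
    h.join().unwrap();

    let seen = TOCKS.swap(0, Ordering::Relaxed); // inspector side
    assert_eq!(seen, 5);
    assert_eq!(TOCKS.load(Ordering::Relaxed), 0);
    println!("observed {} tocks", seen);
}
```

The same shape works in the other direction for ticks, where only the inspector stores and only the computer loads.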

I’ve adjusted everything I can find in the Arc/Mutex version, including the initial and target constants. I am pretty sure I’m not supposed to be taking a reference to the Arc, but a clone. I’d love to find a set of profiling tools that would let me see what the Arc/Mutex version is spending its time on. I’d love to have advice from a competent Rustacean on how this code could or should look. Thanks!
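For anyone who wants to poke at this, one unverified guess: std::sync::Mutex makes no fairness promise, so the computer can reacquire the lock the instant it releases it and starve the inspector; and once the inspector finally does run, tocks is far above TARGET, so ticks grows by ten percent and the next burst holds the lock even longer. A minimal sketch of one mitigation, with an explicit drop and a yield between bursts (computer_once and the (x, d) pair here are simplified stand-ins for the real loop):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical restructuring of computer(): do one bounded burst per
// lock acquisition, drop the guard explicitly, and yield before the
// next burst so the inspector has a chance to take the lock.
// The tuple (x, d) stands in for the Leibniz struct.
fn computer_once(state: &Arc<Mutex<(f64, f64)>>, ticks: u32) {
    let mut s = state.lock().unwrap();
    for _ in 1..ticks {
        s.0 += 1.0 / s.1;
        s.1 += 2.0;
        s.0 -= 1.0 / s.1;
        s.1 += 2.0;
    }
    drop(s);             // release the lock before...
    thread::yield_now(); // ...offering the CPU to the other thread
}

fn main() {
    let state = Arc::new(Mutex::new((0.0_f64, 1.0_f64)));
    for _ in 0..3 {
        computer_once(&state, 100);
    }
    let s = state.lock().unwrap();
    // d advances by 4.0 per inner iteration: 1 + 3 * 99 * 4 = 1189
    assert_eq!(s.1, 1189.0);
    println!("x = {}, d = {}", s.0, s.1);
}
```

Capping how large the inspector lets ticks grow, or sleeping briefly instead of yielding, would be other knobs to try.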

tech

A demonstration of asyncio in Python 3.6

This is an experiment in cooperating coroutines to converge on timing behavior. This is my first program using coroutines in Python; be kind.

The idea is to have a process doing some hard computational work, but about which we want regular progress updates. So we write the computational process as the usual Leibniz process for approximating Pi,

\[ 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots = \frac{\pi}{4}, \]

and then write a separate coroutine to watch it and print out some running statistics. Because this is a tiny demo, they’ll communicate through global1 variables. This “inspector” coroutine can wake up once per second to print out current progress. But since asyncio uses cooperative multitasking, we have a hard question: how often should the “computer” process pause?

We could have the computer pause every time through the loop, but that costs a lot of performance—it means you can’t have a tight numeric loop. It’s much better to plan a few hundred thousand iterations, do those, and then check to see if there’s other work. If we plan a whole second’s worth of work, we might come in just under the time for the inspector to run, then plan another second of work—so the inspector might end up running only every two seconds or worse.

Instead, we should plan to pause the computer about ten times every second. To do that, we build a little controller: it plans to run through ticks iterations of the tight computation loop. Every time it does so, it pauses for other work, and increments tocks. When it’s been a whole second, the inspector can compare tocks to target and plan how many ticks to run next time.

I’m told that this gets easier in Python 3.7, but so far this does seem to work. The Pythonista environment is a little weird—one long-running backend interpreter—so closing the event loop can get you in trouble.

This prints one line per second, more or less. The last value on each line is an approximation of pi. It converges pretty quickly! The three values before that show the way the coroutines cooperate to converge on timing, which is the real point of what I’m exploring here. They are: 1) ticks: how many times did the inner loop of the computation process execute per yield? 2) tocks: how many times did the inner loop of the computation process yield per second? 3) d: how big have the parameters of the Leibniz process gotten? You can see performance collapse when Python switches math backends. I was worried it would get near the limits of 32-bit integers, but we’re nowhere close.

On my iPad, the system sawtooths from 16 to 10 tocks per line printed, and the lines continue to come at about 1 Hz. If I set the target to 1, of course, the lines get printed at < 0.5 Hz.

I’m not quite sure what’s going on in a few parts of this:

  • What would change if I made computer an async def rather than a generator-based coroutine, and then used await asyncio.sleep(0) or similar instead of yield? I tried it and saw no performance difference. But what’s the difference in semantics between async functions and generator-based coroutines?

  • What’s a reasonable way to kill off threads after an exception has interrupted your event loop? Everything I’ve come up with leaves a thread that’s been killed by signal (KeyboardInterrupt) with its exception unread. I’ve tried canceling them, then scheduling a 0.2s pause. I’ve tried set_done(). I’ve tried closing the whole event loop and making a new one. All of those produce an exception. I’ve even tried canceling them, then run_until_complete each of them—but that runs forever.

Anyway, the coroutine paradigm is beautiful and I look forward to using it more. That last bit about killing threads seems unique to the Pythonista environment; in most places that’ll either interrupt the program, or you’ll intend to resume the event loop & may have even handled the exception inside it.

import asyncio

limit = 10 ** -6

# starting conditions for Leibniz's approximation of pi
x = 0
d = 1

# a starting guess at how many runs of the computer() inner loop
# will work out to `target` yields per second
ticks=100000

# how many yields actually happened?
tocks=0

# how many yields should computer() do per run of inspector()?
# that is, per second?
target=10


async def inspector():
	global ticks, tocks
	while True:
		await asyncio.sleep(1)
		if tocks < target:
			ticks //= 2
		elif tocks == target:
			pass
		else:
			ticks = int(ticks * 1.1)
		print(ticks, tocks, d, 4 * x)
		tocks = 0

## TODO: set a target number of digits, and when that's stable, exit cleanly.

@asyncio.coroutine
def computer():
	global d,x,ticks,tocks
	clock=0
	while True:
		x += 1/d
		d += 2
		x -= 1/d
		d += 2
		if clock > ticks:
			tocks += 1
			clock=0
			yield
		else:
			clock+=1

async def cleanup():
	await asyncio.sleep(0.2)

# This is an excessive amount of work on cleanup.  It's a mix of an attempt to be
# careful, cancelling exactly those tasks that need to go---this didn't
# work---and a simple process of making a new event loop and closing off
# the old one.

# I'd love to understand more about where these "task exception was never
# retrieved" errors come from, and how to run the task long enough to process
# the exception.

async def main():
	try:
		c=computer()
		i=inspector()
		await asyncio.gather(c,i)
	finally:
		#c.cancel()
		#i.cancel()
		await cleanup()

asyncio.set_event_loop(asyncio.new_event_loop())
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

Incidentally, this post was written on an iPad: the program was written and tested in Pythonista, the post was edited with Textastic and managed through the git client Working Copy, and the server and static site generator were manipulated with Prompt.


  1. If this offends you, you may refer to them as process-local variables.↩︎

tech

Emailing graphics from Emacs

I’ve used Emacs as my mail client since the 1990s—first with Gnus, now with Notmuch. One of the features that most keeps me there is message-mode: it’s the best way I know to write, edit, and create words for others to read. But for all its excellence, it’s a tool from the 1980s. It assumes a world of plain text email. That’s often fine for a techie—most of my world is plain text email.

Emacs message-mode still thinks of ideas like attachments as a feature worth mentioning. But I work with people who treat not only attachments, but rich HTML text, tables, and graphics as an essential part of communication. I see the benefits of those too—especially in a world where orders of magnitude more people use smartphones for e-mail than use terminal windows with fixed-width fonts! I’ve been using a little script called mimedown for ages. I originally started with a 2008 blog post:

(defun mimedown ()
  (interactive)
  (save-excursion
    (message-goto-body)
    (let* ((sig-point (save-excursion (message-goto-signature) (forward-line -1) (point)))
           (orig-txt (buffer-substring-no-properties (point) sig-point)))
      (shell-command-on-region (point) sig-point "Markdown.pl" nil t)
      (insert "<#multipart type=alternative>\n")
      (insert orig-txt)
      (insert "<#part type=text/html>\n<html>\n<head>\n<title>HTML version of email</title>\n</head>\n<body>")
      (exchange-point-and-mark)
      (insert "\n</body>\n</html>\n<#/multipart>\n"))))

But who uses Markdown.pl these days? So that got changed pretty fast to:

(defun mimedown ()
  (interactive)
  (save-excursion
    (message-goto-body)
    (mml-unsecure-message)
    (let* ((sig-point (save-excursion (message-goto-signature) (forward-line -1) (point)))
           (orig-txt (buffer-substring-no-properties (point) sig-point)))
      (shell-command-on-region (point) sig-point "/usr/local/bin/pandoc --from markdown --to html --smart --standalone" nil t)
      (insert "<#multipart type=alternative>\n")
      (insert orig-txt)
      (insert "<#part type=text/html>\n")
      (exchange-point-and-mark)
      (insert "\n<#/multipart>\n"))))

That works with the lovely table support and footnotes available in Pandoc. But now I’ve learned about the Lua filters available in recent Pandoc—and I can’t count the number of times I’ve wished I could show a diagram to a colleague but couldn’t rely on ASCII art to get there. Therefore:

(defun mimedown ()
    "For writing pretty mail"
    (interactive)
    (save-excursion
      (message-goto-body)
      (mml-unsecure-message)
      (let* ((sig-point (save-excursion (message-goto-signature) (forward-line -1) (point)))
             (orig-txt (buffer-substring-no-properties (point) sig-point)))
        (shell-command-on-region (point) sig-point "/Users/bts/bin/pandoc --from markdown+smart --to html --self-contained --lua-filter=diagram-generator.lua --metadata pagetitle=Mail --metadata plantumlPath=/usr/local/Cellar/plantuml/1.2019.9/libexec/plantuml.jar --template mimedown.html" nil t "*mimedown-errors*")
        (insert "<#multipart type=alternative>\n")
        (insert orig-txt)
        (insert "<#part type=text/html>\n")
        (exchange-point-and-mark)
        (insert "\n<#/multipart>\n"))))

Now I can write mail that uses sequence diagrams, ditaa, graphviz, tikz, even matplotlib, and has the images both properly processed and included directly—as data URLs, not attachments—so they forward reliably. Even better, the text/plain version is crystal clear (and the preferred form for modifying the diagram!). Here, take a look:

I write

```{.plantuml caption="This is an image, created by **PlantUML**."}
@startuml
Alice -> Bob: Authentication Request
Bob --> Alice: Authentication Response
Alice -> Bob: Another authentication Request
Alice <-- Bob: another Response
@enduml
```

The text/plain version shows that directly. And the text/html part says:

<img
src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8+PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiBjb250ZW50U2NyaXB0VHlwZT0iYXBwbGljYXRpb24vZWNtYXNjcmlwdCIgY29udGVudFN0eWxlVHlwZT0idGV4dC9jc3MiIGhlaWdodD0iMjE2cHgiIHByZXNlcnZlQXNwZWN0UmF0aW89Im5vbmUiIHN0eWxlPSJ3aWR0aDoyODVweDtoZWlnaHQ6MjE2cHg7IiB2ZXJzaW9uPSIxLjEiIHZpZXdCb3g9IjAgMCAyODUgMjE2IiB3aWR0aD0iMjg1cHgiIHpvb21BbmRQYW49Im1hZ25pZnkiPjxkZWZzPjxmaWx0ZXIgaGVpZ2h0PSIzMDAlIiBpZD0iZnR3NGY5..." alt  />

But shows up to most people as:

This is so cool to me—look at the source of that image!

The hard part is getting Java/PlantUML, Graphviz, Tikz, Inkscape, Python, and all the rest installed and accessible to each other. To make this work with a Mac and Homebrew, I have some modifications at https://github.com/briansniffen/lua-filters; it’s just a few lines to deal with Inkscape having a non-FHS app bundle. The only other detail is the trivial template to throw away the HTML header, ~/.pandoc/templates/mimedown.html:

$for(include-before)$
$include-before$
$endfor$
$body$
$for(include-after)$
$include-after$
$endfor$

I’m happy to chat about how to make this work well for you, and I look forward to getting more beautiful—but still usable—mail!

tech

A systems argument for the Electoral College

In casual conversation about politics, I see a growing assumption that all Coastal Liberals will reject the Electoral College: after all, our candidates win the popular vote. The “Hamilton Elector” attempts in 2016 frightened us. We remember Civics teachers telling us that the EC is there to prevent a demagogue from taking over the country. But we’re actually more scared about a demagogue using the EC, by influencing the smaller states to put a reactionary minority in charge.

I value the Electoral College as an anti-fragile element of our Republic’s design. This note is to explain why, and then to use that light to illuminate new angles on the standard arguments for and against the Electoral College. What are the important features of anti-fragility?

  • Simple rules
  • Decentralized evaluation
  • Failure-damping layering
  • Convey local failure signals up—drive adaptation
  • Experience local failure locally
  • Redundancy and overcompensation
  • Trust local operators

Why have elections?

It is widely understood that the purpose of an election is to pick a winner—to pick someone to occupy an office. This is not so. Were it so, we would have many easier ways to do it. The key contribution of an election is that it persuades everyone who is not the winner to sit down: that they have lost and are not in power. After an election, everyone—most especially the losing candidates and their supporters—agrees on the victor. This is called legitimacy. Legitimacy prevents uprisings and civil wars.

Therefore, the most important threat to worry about to an election—the loss we clearly cannot accept—is a lack of legitimacy. We lose legitimacy when people believe that fraud has happened.

What else can go wrong? Someone could actually steal the election—there could be fraud that affects an outcome. Let’s focus most on the Presidency of the United States, the most powerful elected office in the world. Lots of people in the United States and outside it want to affect who has that office. But those inside the US don’t want to risk that legitimacy—it’s no good having the office if everyone knows you stole it! They’re only going to commit certain sorts of frauds. For some of those outside the country, breaking the legitimacy of our elections is a win all on its own.

How do our elections work?

Most of our states conduct an internal uniform popular vote. It’s organized by districts, but each district reports up a total number of votes for each candidate, and a total number of votes cast (as an error check). These totals are summed. All the electors go to whoever gets the most votes. Ties are handled comically: by a hand of poker, a coin flip, or letting some distinguished person choose. If the election fails to return a result, it is rescheduled and held again at moderate expense.

Some of our states vary a bit: they give some electors to the winner of the state popular vote, but distribute others to other candidates in proportion to the number of votes they received. Maine is notable here—but as it has only four electors, and leans strongly Democratic, just one vote is really up for grabs: the Democrats have a lock on three, and the Republicans might win one with an intensive campaign. This reduces the national incentive to direct campaign resources toward Maine.

A few states have discussed assigning two electors to the winner of the state popular vote, and one elector to the winner of a vote in the heavily gerrymandered congressional districts. This is a terrible idea for all the reasons gerrymandering is terrible.

Evaluating the Electoral College

Against the criteria above:

The rules are simple: we can conduct an election in 2020 with pencil and paper carried on horseback. We’d rather not—but it would basically work. And unlike a Robert’s Rules system, ours doesn’t need a parliamentarian to interpret the rules and help us understand their interactions.

Elections are operated principally by volunteer retirees coordinated at the level of school districts. They handle questions like “can this person vote” with only delayed and sparse review. States manage their own machine and process selection in a nicely decentralized way.

The layers damp failures: a fraud in Oklahoma doesn’t affect the total, because we know Oklahoma’s electors will be Republican. Moreover, someone who does improperly win an election has little opportunity to turn that one win into others—you can’t gerrymander state borders.

But the failure is both signalled upwards by news media reporting, and experienced locally in terms of political fallout. Here in Massachusetts, William Galvin has been repeatedly re-elected Secretary of State running principally on a record of competence and integrity. States whose elections are poorly run replace those responsible quickly—even when run by governors with otherwise awful policies.

We use the on-the-ground system not only for Presidential elections, but for dog catchers and governors and the critical school board. It can support a primary in the summer of a year, a Presidential election that November, and a recall election that February. That part has moderate redundancy; we could do better, and if we ever have an illegitimate election we’ll have to do better. I suspect that will involve pencils and paper, then phone calls and hand delivery. Then and now, we’ll trust those local retirees to handle local problems: if a Russian agent shows up to mess with the election, they’re going to call local police, deal with the problem, and the rest of us will find out only much later.

What if there is election fraud?

In most years, very few states are close (see Wikipedia’s list of records). Minnesota will always vote Democratic. Utah will always vote Republican. In years where that’s not true, it’s also not close the other way—in 1964 and in 1984, a candidate won nearly every state.

A fraudster who wishes to steal an election has no reason to try to commit fraud in Minnesota or Utah: they’d need to create very many fraudulent ballots—and so are more likely to get caught—and assign them to many districts—and still nobody would look at exit polls and believe this outcome. Because of this, most states have no need for extraordinary election security. They have to worry about local actors committing local frauds in the race for governor or for school board—but we know who their electors are going to be.

If we remove the Electoral College, every state will have to defend against fraud the way Ohio and Pennsylvania do now. Both parties will have to ramp up defenses and observations—and based on those observations, we should expect to see Bush v. Gore-style litigation in tens of states, every time. What plagued Florida will visit any state where some local municipality used poor practices, because that municipality is contributing a key ten thousand votes to the totals.

Maybe it’s undemocratic to have representative electors rather than directly elect the President. But the electors do a pretty good job of modeling the outcome of a direct election. Illiterate news organizations run stories about who won the popular vote every four years—but we have not conducted a popular vote. News organizations saying we have badly misunderstand the facts on which they’re reporting. The best part is that these stories run right next to stories about low voter turnout, especially in states with predictable results.

In Massachusetts, most people are registered Democrats. Some are registered Republicans, and some are not enrolled. An inconsequential fraction are registered Green or Libertarian. The Green and Libertarian parties do not have meaningful primaries—at most one candidate runs—and do not affect election outcomes. Both the Republicans and the Democrats have incentive to vote in their party Primary. The Republicans can nudge the consensus of the party, and their delegates have been very effective at floor negotiations. The Democrats have a real say in who’s going to be on the ballot.

But when it’s time for the real election in November, both Republicans and Democrats know what the outcome is ahead of time. They can choose to stay home—and if the Governorship and both Senators are predictable, they overwhelmingly do stay home. That’s a few hundred thousand Republicans and millions of Democrats. The reverse is true in Oklahoma or South Dakota.

Across the country, tens of millions of people stay home who would plausibly come out to vote if their vote really mattered. In a real popular election, we should expect to see turnouts double our ordinary turnout of the last half century. When we try that experiment, then we can talk about who won the popular vote.

Until then, part of the duty of every citizen is to understand the arithmetic here and not mistakenly damage the legitimacy of our election process.

A modest proposal

On all the bases above, we should keep the Electoral College for the Presidency. We should strongly consider using it for state-wide offices, starting with the Governor and the Senators, using existing Congressional districts.

We might add a layer of hierarchy: states vote by internal ECs, one per congressional district. In Wyoming (1 rep), nothing changes. But now we have significant extra insulation against overvoting fraud.

As part of this bargain, of course, we would have to fix district boundary calculation to avoid gerrymandering. My hope is that Republicans want better election of state-level offices, and all statesmen want to fix gerrymandering—but the Republicans think it’s more expensive than the Democrats do, so need a feature they can take home.

politics

Juvenilia: Upright

I’m trying out some of the practices of Marie Kondō. It’s not all fitting perfectly into my life, but it is helping me get through piles of accumulated “someday I’ll need.” That pile was so congested that I could find nothing in it, review nothing—well, we’ve been making progress. 95% of it has left the house. Yay. And the 5% I’m really glad to have has turned up. From that, anything I can digitize I am digitizing.

One delight is a memory of a theatre class at nerd camp from 1991. I remember none of the surnames of the people involved here. The lead was Noah, RJ was probably an advisor, and I think I played Peter the daffy psychic. But the poem stuck with me for the last 27 years, and perhaps someone else will find it by looking for this text:

The Upright Man stands straight and proud
He knows his mind and speaks out loud
His iron will, not tarnished or stained
But iron rusted the day it rained
His noble thoughts fall on deaf ears
Again he speaks, though no one hears.
Confronted by indifference, he knocks on other doors
And should a soul be listening, it draws back and ignores
He preaches words of honesty, of bravery, and fire
But finds himself a cast-away upon a sea of ire
Yet once again he shouts and screams, as angry storm clouds grow
Then lies there, bloody, broken, tamed, the Upright Man layed low
The man now lies beneath our feet
Trodden on, trampled, and finally beat
His iron soul a leaded weight
That sank him in a sea of hate
Rusted to his very core
The Upright Man, upright no more

You can have a nicely typeset copy of Upright in PDF or LaTeX, or the partial scan. And if you were involved, get in touch!

art