Welcome to the Fall
2018 Paul G. Allen Center for Computer Science
and Engineering Colloquium. So we’re rebooting the
colloquium to a model that we used to have
in the department where on non-distinguished
lecturer weeks, we hear about work
going on here at home. And hopefully this will help
as we grow the department, keep that sort of small,
close-knit family kind of feeling. So today to kick
things off, I wanted to introduce folks
who are new or remind folks who’ve been here for a
while about the Programming Languages and Software
Engineering group. And as a research field,
we study a lot of topics that you all are familiar with. These are the sorts
of things that, as an engineer in computer science, you deal with and try to tackle day-to-day. And why do we study
these things, right? What’s so interesting
about these mechanics? Well, I don’t know about
you, but for me, programming is still often a huge challenge. And this is a bummer,
because it turns out that software is this sort of miraculous artifact that everybody uses to solve all of the hardest problems. So if programming
is hard but software is essential to solving the
world’s toughest problems, this is sort of a worthy cause,
something worth investigating. And so our goal is to
move from this world where programming is a tedious
and error-prone task to one where it’s more effective,
more enjoyable, and more fun. And we do have a
lot of fun doing it. So the PLSE group has a great culture, a lot of goofy stuff going on. And we’d love to
hang out with folks, you’re always welcome
to come swing by the lab and see what’s going on. So the format today
is we’re going to have the graduating students
from the PLSE group each give you sort of a
quick elevator-ish pitch of their thesis results
and what they’ve been up to for the past several years. So you can all read and you’re
going to hear from these folks. One thing I do want to mention,
though, unfortunately Sarah is traveling and wasn’t able to
make it to the session today. But you should absolutely
check out her work on end-user programming
with the Helena tool that enables sort of
non-programmers, non-experts to do really sophisticated
web scraping automatically. It’s super slick stuff. So without further ado,
we will kick it off with James Bornholt
who’s going to tell us about some amazing
synthesis results. Thanks. So yeah, my name
is James Bornholt, and I want to tell you about
some really exciting advances in the way that we build
software that we can rely on. So I want to start by telling
you about an application that we all rely on, that
we all trust implicitly, which is Snapchat. Now I don’t really
understand Snapchat, but I understand it’s
something with like photos or something like that. And so it probably
has this function called takePhoto, right? Now, takePhoto takes as input the size of the photo we want to take, it does some stuff with photos, and then it returns your photo. Now I don’t know how hard
this function is to write, but it’s probably pretty hard. So what do we do
to build functions that are hard to write? We probably know the answer. The answer is we
test these functions. So if we don’t know
how good takePhoto is, maybe we’ll test it
with some inputs. Maybe we’ll see what happens
when you give it 0 as a size. See what happens when you give
it 1 as a size, and 10 and 100. And once we have a
version of takePhoto that works for these test
cases, we’re totally happy. We’re going to
ship it out and people are going to be happy with it. And let’s say that maybe later on we discover that actually we made a mistake: we should have tested the case where size was 7.
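As a sketch of that workflow (takePhoto, its behavior, and the bug at size 7 are all invented here purely for illustration):

```python
# Hypothetical stand-in for Snapchat's takePhoto; the failure at
# size == 7 is invented to illustrate the story.
def take_photo(size):
    if size == 7:
        raise ValueError("oops: untested size")
    return f"photo of size {size}"

# The test suite we were happy with: it passes, so we ship.
for size in [0, 1, 10, 100]:
    assert take_photo(size) == f"photo of size {size}"

# The case we should have tested. Once discovered, it just gets
# added to the suite and the bug gets fixed.
try:
    take_photo(7)
except ValueError as bug:
    print(f"found a bug: {bug}")
```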
Now, for Snapchat this isn’t a big deal, right? They just add this test case to the suite, they fix that bug, they ship a new version of Snapchat, and everybody’s happy. This works fine for apps like Snapchat. There are some
applications where this doesn’t work so well. The autopilot that
flies my airplane, I don’t want to test it
on four test cases, right? I want it to be correct
for every single input to this function. I don’t want people to
be discovering later on, oh yeah, by the way,
when attitude is 7, something bad happens. So we need to do better
for this kind of software. What can we do for
software that’s really critical to systems, to life? How do we build better software
for those applications? So the programming
languages community actually has a
solution for this, and has had it for a long time. And it’s two things. The first one is verification. Now verification
is something you’re going to hear a lot about
in the next few talks. But basically you can
think of verification as a form of exhaustive
testing, right? So verification
actually is a way of constructing
evidence or a proof that a function works for
every input we give it, right? No matter what input
we give to our program, we get the right output. Synthesis is kind of
an extension of that. So if you know what the
function is supposed to do, if you actually have a
specification of what you want a function to do,
then using synthesis we actually just generate
the program automatically. So it saves you from the effort
of writing the program, just generate it automatically
from the thing that you wanted. And these are two really
powerful techniques in programming languages. So what I want to show
today is two things. The first thing I
want to show you is an example of a verification
and a synthesis tool. And for me it’s going to
be an example that comes from the file systems domain. But the reason I’m
showing you this example is not because I think
file systems are especially compelling for
everybody, but rather because I want to show
you a second thing, which is that it’s getting
easier and easier to build verification
and synthesis tools for X– for the domains that you care about, for the programming-languages problems that you have in your work. So those are two things
I want to show you. So let’s start talking a
little bit about file systems. So file systems are this
really interesting domain that probably everyone
is familiar with, right? So here’s a really common
example of something you have probably
written code for before and that most
applications have to do. Let’s imagine that you have
a file in your file system and it’s called foo. And what you want
to do is you want to update the contents of that
file to be the string hello. There’s a variety
of ways to do this, but this is a
pretty common task. Something like
SQLite has to do this when you write your database. Microsoft Word has to do this
when you write your documents. A very common programming task. I want to show you one solution
to this programming task. So here’s the program. You can ignore the
first line here, but I want to point out
the second line and so on. So what we’re
going to do here is that first I’m going to
actually open a temporary file, a file called foo.tmp. We’re going to write a string
to the temporary file, close the temporary file,
and then finally we’re going to rename the
temporary file over the top of the original one. So this is a way of
accomplishing our goal, right? Because at the end of this
program what will have happened is we’ll have a file
called foo, and it will have the contents “hello”
that we want it to have. At least that’s the
intention, right? Now we’ve got a tool
that we call Ferrite. And Ferrite’s a tool
for verifying programs like this one. For checking that
actually this program does what I just told you it does. And so the way Ferrite works
is that we write an assertion at the end of this program,
and the assertion is just saying what I just told you. That after the program runs,
when we read from the file foo, we should get the
string hello, right? So I can run Ferrite on this
program for you right now. And it’s very quickly
going to tell you that this program is wrong. So what’s going on here is that
Ferrite’s actually pointing out a really subtle thing, which
is that this program is correct when it runs to
completion, but it’s wrong if the program crashes
halfway through. And in particular,
the output is saying that if this program
crashes halfway through, you can end up in a situation
where your file foo is empty, right? Its content is the empty string. Which clearly isn’t
what we wanted, right? Now Ferrite is also capable
of taking this program and fixing it
automatically for you. I don’t have time
to show you that, so I’m just going to
fix it for you manually. The missing thing here is a little bit of synchronization, right? So we’re going to add some synchronization saying that we have to flush the contents of the file to disk before we do the renaming. If we run Ferrite on this version of the program, it’s going to come back and say that it’s happy.
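In code, the fixed pattern looks roughly like this (a Python sketch of the general POSIX write-temp, fsync, rename idiom, not Ferrite’s actual output; the file names follow the demo):

```python
import os

# The pattern from the demo: write foo's new contents to a
# temporary file, then rename it over the original.
def atomic_update(path, data):
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        # The missing synchronization: flush the temp file to disk
        # *before* the rename. Without this fsync, a crash between
        # the write and the rename can leave foo empty.
        os.fsync(fd)
    finally:
        os.close(fd)
    # On POSIX, rename atomically replaces foo, so readers see
    # either the old contents or the new ones, never a partial file.
    os.rename(tmp, path)

atomic_update("foo", b"hello")
```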
So this is a kind of powerful tool that we can write for any particular domain that you’re interested in. Now, why am I showing you this? File systems are really cool, and this is a really important problem. But the reason that I’m
showing you this demo is not because file
systems are something that you need to work on, but
because this tool is something that you might want to
build for your domain. So let me tell
you about how it’s getting easier to build tools
like Ferrite for your domain now. So what’s hard about
building a tool like Ferrite? Well let me give you an example. Here’s a really simple
program, it’s very silly. What this program does is it
takes two inputs, two numbers, and it gives you back the
largest one of those numbers that’s even, right? So if you give it 5
and 6, you get back 6. If you give it 6 and
7, you get back 6. And it does it in kind
of a silly way, right? What it does is it builds a list [x0, x1], and it filters that list down to
have only the even elements, and it returns you the
maximum of that list. Now when you give this
function concrete inputs, it’s kind of obvious
what it does. But in tools like Ferrite
and verification tools and synthesis tools
more generally, we don’t give functions concrete inputs like these. We’re actually giving them unknown inputs, unknown numbers. For verification tools, we’re giving them unknown numbers because we’re
searching for an input that breaks this program. So we don’t know
what an input is. For synthesis
tools, we don’t know what the program is, right? We’re trying to
find a program that works for any
value of x0 and x1, so again, we don’t know
the values of these inputs. Now the problem
is, when we don’t know the values of these
inputs, the way this program runs is really weird, right? So for example, when we go
through this filter predicate here, we’re trying to take this list [x0, x1]
and filter it down only to the elements that are even. But we don’t know if the
elements are even, right? We don’t know if
x0 and x1 are even. So what we have to
do is fork, right? We have to create two
paths for this program– one where we assume
x0 was even, one where we assume that x0 was not even. Then do the same
thing for x1, right? So now rather than just
having one return value from this filter
function, we have to have four return values,
the four possibilities depending on the assumptions
we made about the inputs. Then only once
we’ve done this can we actually think about
running the second line of this program, right? The max. So we can’t just
run it once anymore, because there are
four paths now. So you have to run this once
for every path we created. So this is the thing that makes
building tools like Ferrite really hard, right? It has this kind of unusual execution mode you probably haven’t seen before. And you can also probably see that this is kind of an exponential blowup, right? The number of paths is exponential in the number of branches in your program. So programming with this kind of model is really difficult.
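To see the blowup concretely, here is a toy, purely concrete simulation of that forking behavior (a real symbolic engine tracks these assumptions in a solver formula rather than enumerating them, so this is illustration only):

```python
from itertools import product

# Symbolically "run" max(filter(even, [x0, x1])) without knowing
# x0 and x1: fork on every unknown even/odd test.
def symbolic_filter_max(num_unknowns):
    paths = []
    for parities in product([True, False], repeat=num_unknowns):
        # Keep the names of the inputs assumed even on this path.
        evens = [f"x{i}" for i, even in enumerate(parities) if even]
        # The max on the program's second line now runs once per
        # path; with no even elements, this path is an error path.
        outcome = f"max({evens})" if evens else "empty-list error"
        paths.append((parities, outcome))
    return paths

for parities, outcome in symbolic_filter_max(2):
    print(parities, "->", outcome)

# The blowup: paths double with every unknown branch.
assert len(symbolic_filter_max(2)) == 4
assert len(symbolic_filter_max(10)) == 1024
```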
So we’ve been working on a lot of things to try to make this kind of programming model easier for people to use. I want to show you
just one example. It’s a tool we built
called a symbolic profiler. Now, a symbolic profiler is a lot like a regular profiler. It’s designed to
identify bottlenecks in your code, the parts where
your code is not scaling well or not performing well. But the difference with
a symbolic profiler is it understands that really
weird all-path execution model of tools like Ferrite
and verification tools and synthesis tools. We don’t have time to go into
details about how it works, so I want to show you first
that it’s really fun to watch. And secondly and
more importantly, we actually have some
really interesting results using a symbolic profiler. So we took a variety of real
world automated reasoning tools. These are verification
tools, synthesis tools that people are actually
using in the real world. And we were able to make them
anywhere from six times faster to 300 times faster. So for example, there’s a really
cool verification tool being used down at UW Medical Center. It was actually built by folks
here in the Allen School. And what it does, it
verifies the software for a radiotherapy
system they have there. So we were able to make
that tool 300 times faster. That’s really exciting for
the people who use that tool, right? That tool is now
basically instantaneous, so the development
model is much better. So this is an
example of the kinds of tools we’ve been
building to make it easier to build your own verification
and synthesis tools. So really, the
bottom line here is that I think it’s getting
a lot easier to build these kinds of tools, right? It used to be back
in the good old days that building a
tool like Ferrite, or a verification or synthesis tool, for the domains that you were interested in
took multiple PhDs, right? This was a really
difficult task. That’s gotten a lot
easier recently. To the point where basically
you can do what I did and get a PhD in
formal methods– well I hope I did anyway. And you only need one PhD
to do this now, right? I built Ferrite, built a
couple of tools like that. But actually, we’re getting to
a more mature stage now, right? We have things like profilers,
we have debugging tools. We have abstractions
for building these kinds of verification
and synthesis tools, to the point where I think now
it’s actually possible for you to build one without having
a PhD in formal methods. So take this domain
X that you’re interested in, that
you have expertise in, and build a tool like this. I’m always really
happy to talk to folks who are interested in
working on things like this or have domains
that they’re really interested in
finding solutions to, and thanks very
much for your time. [APPLAUSE] All right. So today I’m going to tell you
about why static analysis is awful, but before
I do that, I want to first tell you why static
analysis is actually awesome. So what static analysis lets us do is take some program that we’re interested in– here
is a simple program, but it could be
anything, we’re just iterating over an
array of numbers and writing them to the output. And then a set of
properties that we might be interested
in– for example, we want to make sure that
all of our array accesses are within bounds, or that
there are no memory leaks, or that we never
de-reference null. And static analysis lets
us build an analysis that if it ever comes
back true for a given property in a program, then
if we ever run that program, that property is
guaranteed to hold. So for example, we might
have the property NoLeaks, and if our analysis says that there are going to be no memory leaks in the program, then when we run the program, there will, by gosh, be no memory leaks. We can do this also for null pointer exceptions. And what static
analysis lets us do is say that, although I’ve been talking about these specific properties in this specific program, we can talk about some arbitrary program and some arbitrary property, and build these analyses in a principled way such that this implication holds. And this lets us get computable analyses even when the property in question might be undecidable. So the foundational
result in static analysis is abstract interpretation,
which goes back to the 1970s. Abstract interpretation uses this beautiful theory of lattices and monotone functions, and has given us a huge array of properties that can be verified using static analysis.
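To give a tiny flavor of this, here is a toy abstract interpreter over the classic sign domain (illustrative only; real analyzers use much richer lattices):

```python
# Toy sign-domain abstract interpretation: each abstract value
# over-approximates a set of integers, and the abstract operations
# are monotone functions on this little lattice.
NEG, ZERO, POS, TOP = "neg", "zero", "pos", "top"

def alpha(n):
    """Abstract a concrete integer into the sign domain."""
    return ZERO if n == 0 else (POS if n > 0 else NEG)

def abs_mul(a, b):
    if ZERO in (a, b):
        return ZERO
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG  # signs multiply

def abs_add(a, b):
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    # pos + neg could be any sign: go to top, losing precision
    # but staying sound.
    return a if a == b else TOP

# Soundness in action: the abstract result covers the concrete one.
# x*x + 1 is analyzed as pos for any nonzero x.
x = -3
assert alpha(x * x + 1) == abs_add(abs_mul(alpha(x), alpha(x)), POS)
print(abs_add(abs_mul(NEG, NEG), POS))  # → pos
```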
And further advances have allowed abstract interpretation to be applied to programs that support features like loops and arrays and linear arithmetic. So, what’s the problem? This sounds great. We have a way for programmers
to automatically get guarantees about their
program’s behavior even when those properties
might in general be undecidable. So the problem is
that programmers today do not just write simple array-manipulation programs. They are often using
what we call frameworks which are these
scaffoldings onto which they can build the applications
that they’re trying to build. And these things handle the
boilerplate and boring parts of application construction. And this is just a small
sampling of Java frameworks that I found while searching
GitHub for maybe 10 minutes. Because these frameworks
are the common building block for a lot of different types of applications, this also means that they have
to be as flexible as possible. And this means that they have an
enormous amount of complexity. So this means a lot of reflection use, enormous libraries, a lot of layers of abstraction. And to give an idea about
what this complexity looks like, I want to show you
my personal white whale when it comes to this– Spring, which is a Java
web application framework. And this is a stack trace that
I encountered while analyzing a Java application. And now this looks really
bad, but it’s actually worse because I lied to you, this
is not the stack trace. This is actually
the stack trace. And if you look
closely, which you can’t because the font
is just too small, you’ll see things like a
CachedIntrospectionResults, an AbstractAutowire
CapableBeanFactory, and a
DefaultSingletonBeanRegistry. So this looks very complex. These are all
ridiculous names, we must be doing
something very complex. So maybe this is an unfair comparison– maybe this is some complex distributed application. Well, what the stack trace is actually doing is instantiating this one class. There are more bits of
information in the stack trace than in the class
definition itself. So the point I’m
trying to make here is that even the most
trivial applications when they use frameworks will
encounter the full complexity of frameworks. And even the most simple
framework-based applications will eventually reach
this stack trace. And this is going to be very
bad news for anyone trying to do naive abstract
interpretation on framework implementations in
the applications built on top of them. So we have this gap here between
this wonderful mathematical theory that tells how to reason
about these programs’ behavior without having to run them
in a finite amount of time, and then on the
other hand, we have the actual practice as performed
by the software engineers today. So how can we bridge this gap? So obviously static
analysis is actually used today on real software. So what are people doing? Well, they might be writing models– this is the state-of-the-art. And these are executable
specifications of the framework’s behavior. So the way you write one of
these models is that you read the Spring reference
documentation, which is a 1,000-page tome,
and then you write models that describe how the framework
behaves so you don’t have to actually directly
analyze the implementation. It may not surprise
you to learn that given how long this
reference documentation is, that these models
go on and on and on. And again, I want to stress that
this is the state-of-the-art for analyzing framework-based
applications this year in 2018. And this is just
untenable, especially given the number of frameworks out
there, the new ones that are popping up every single day. And so I claim that, at least in the context of context-, flow-, and heap-sensitive whole-program analyses, we must do better. Especially if we want to bring
this wonderful promise of static analysis
to programmers today. We can’t say, we have this
excellent tool you can use to verify your programs, but
first read this 1,000-page document, write pages and pages
and pages and pages of models, and then you can prove that your
Tinder for Dogs doesn’t access out of bounds array indices. So I will say that I
think we can do better, and that’s what my research
has been focused on. In particular, I’ve been working
on a project called Concerto, which is a hybrid
analysis framework that enables the precise analysis of
framework-based applications without any manual modeling. The high level
idea is as follows. We start by analyzing
application code using our old friend
abstract interpretation. But the minute the
application calls into the framework, the
abstract interpreter yields to a concrete interpreter. When I say concrete interpreter, I do mean we just run the framework code. And then when the framework code calls back into the application, we yield back into the abstract interpreter. And this process can repeat through nested calls and returns.
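Here is a cartoon of that hand-off (all names, the config table, and the "any-int" summary are invented for illustration; Concerto itself targets Java applications and real abstract domains):

```python
# Cartoon of Concerto's hybrid execution: framework calls run
# concretely, application code is summarized abstractly, and the
# two interpreters yield to each other through nested calls.
FRAMEWORK_FNS = {"dispatch"}            # invented framework entry point
CONFIG = {"/photos": "photo_handler"}   # stand-in for a static config file

def analyze_call(fn, arg):
    if fn in FRAMEWORK_FNS:
        return concrete_eval(fn, arg)   # yield to the concrete side
    return abstract_eval(fn, arg)       # yield to the abstract side

def concrete_eval(fn, arg):
    # Framework behavior is determined by the static config, so we
    # just run it: look up the handler and call back into the app.
    handler = CONFIG[arg]
    return analyze_call(handler, "request")

def abstract_eval(fn, arg):
    # The abstract interpreter returns a summary of all possible
    # concrete results (here, a trivial "any integer" summary).
    return "any-int"

print(analyze_call("dispatch", "/photos"))  # framework -> app -> "any-int"
```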
So, obviously, running the framework is a very precise way to reason about its behavior, but how can we possibly do this and still retain that key computability property? Well, there are two key observations that we made about framework behavior. The first is that framework
behavior is almost entirely determined by the contents of
static configuration files. So if we have these configuration files and we plug them
into the framework, they just take a single path
of execution and then stop. They do what they’re
going to do and then stop. And further, it often
happens that these files are available to us
at analysis time. So when we run these frameworks, they can just open up
these configuration files, parse their autowire abstract
bean factories or what have you, compute what
they’re going to compute, and then return a
single result back to the abstract interpreter. This is obviously quite
tricky in practice, and there are a lot
of technical details I simply don’t have
time to go into. But for example, how do
we exchange the values between these interpreters? On one hand, we have
these abstract values that are summaries of
possibly infinite sets of concrete values,
and on the other hand, we have a concrete
interpreter that expects honest-to-God integers. So how can we actually
exchange these values in a way that everything sort
of works and so that we don’t have to have a tight
coupling between these two components? I said that framework
behavior is mostly determined by
configuration files, but that may not
necessarily be the case. The framework may
read from the network. It might access nondeterministic user inputs, so how can we handle that in a way that still lets us retain this computability? Can we have the
framework– excuse me, the concrete interpreter
and the abstract interpreter cooperate in some way and share
information about their shared values? And there’s a lot more. I hope
that you’ll trust me that we’ve actually
found a sound solution to these problems
that lets us retain the soundness of the
overall analysis. So in summary, Concerto is
a principled combination of concrete and
abstract interpretation that precisely analyzes the
framework-based applications. We actually could reuse
the foundational results from 40 years of abstract
interpretation research to prove that this is sound,
that it terminates– again, making sure that we have an analysis that runs in a finite amount of time– and also that we have provably improved precision over naive static analysis, that is, compared to just running the analysis on the framework
and the application together. OK. Thank you very much. [APPLAUSE] Any of you have a
question for John? Yeah, Hal. So if you’re relying on the framework as sort of an oracle to tell you things, how robust is this when they keep updating the framework? Do the little deltas over there cause everything to collapse and to start from scratch, or how does that work? So the question was, how do we handle it if the framework changes, if we’re relying on it as an oracle? Is that right? Relying on it as an oracle. Right. So well, we assume
that the application uses a specific version
of the framework. And if the framework
is updated and the application is updated to run
against that new version, we just run whatever version
the application is using. We’re not tied to a specific
version of each framework, we use whatever version
the application is using. OK. So, more verification up next. We’re going to hear about
distributed systems from James. Other James. James W. Yeah. So there will be a test at the
end of this talk about which James gave which talk. So I want to share– since the purpose
of this colloquium is kind of to teach the
department about the PLSE Lab, I want to first share a
very deep hobby of the PLSE Lab, which is sort of
irresponsibly unlabelled graphs, OK? So this is a graph with two axes
that have some labels on them, but there is no scale,
it’s kind of nonsense. But the point that I’m trying
to make with this graph is kind of similar to the one that the other James made, namely that it used to be
really hard to build stuff and I’ve spent a lot of
time building hard stuff, and it’s now getting easier to
build verified applications. And so it definitely
correlates with the time that I’ve spent in grad school. I can’t guarantee that it’s
fully caused by my PhD, but at least maybe
partially caused by it. So what makes the top left
corner of this diagram so high up on the effort axis? I claim that the key
challenge in verifying things like distributed systems
is inductive invariants. And inductive
invariants are required because these systems have
infinite state spaces, as I’ll explain in a minute. So through a lot of my work,
I’ve tried different solutions to this problem. Early in grad school, I tried
a fairly straightforward solution, which I
like to call proof by grad student, which involved
a lot of manual effort. And since then I
decided I don’t really want to put myself
through that every time I want to build a new distributed
system, so I’ve been working on more automated techniques,
both on the proof side and on inductive
invariant inference. OK, so that’s kind
of the high level, but maybe I should
start by telling you what a distributed system is. This is a very accurate picture
of a distributed system, OK? So they look like this. If you see them in the
wild, maybe don’t approach. And what are they there for? Well, what they do is
they power all the little icons on your phone, OK? So behind each one of these
apps is a distributed system that’s managing databases and
user data and so on in order to make the app work. These are also used for
important actual infrastructure that may or may not be
running on our phones, like planes, trains,
and automobiles. And the problem with
these distributed systems is that they’re
hard to get right. So to give you an example,
one summer early in my PhD I went on the internet
and read the news and I found three examples
of distributed systems bugs. So the first one was
that the New York Stock Exchange had to halt trading. The Wall Street Journal’s
website went down and United grounded all
flights, and these three distributed systems problems
happened on the same day. So what’s going on here? Well, these systems execute
in unreliable environments where they’re executing
concurrently with other nodes, and they’re exchanging
messages that can be dropped and
duplicated, and they’re running on machines that
can crash and reboot. And under all of
these circumstances the system is supposed to
maintain its guarantees. OK. And so in order to try to
do that, you might think, well, let’s test the
system extensively. And as I’m about
to argue, that’s basically impossible because
there are too many behaviors. And in fact, to make
a bad pun, there are not just too many
behaviors, but infinitely many bad behaviors. So it’s really
impossible to test all of the executions of the system. So I’d like to bring
back Dijkstra here, and he tells us, “When
exhaustive testing is impossible, our trust can
only be based on proof.” OK, so what I’ve
been doing in my PhD is applying proof
techniques to reason about these infinite
state systems. And what makes these
particularly challenging is that these days,
we’re actually able to apply proof to the
implementation code itself, not to the more abstract
models, and that makes things both more exciting
and also more challenging. OK, so how do these
proofs actually work? Well, we’re going to start
by modeling the system. And now we’re going to
get into a little bit of math because I want to show you
some of the cool ideas. So we model a system
by constructing a mathematical
model of its state. So we start with
some– let’s say a single initial state, which is
how the system is initialized. And then we have a
mathematical model for how the system makes
progress, a little bit– one step at a time. And we sort of unroll
this step relation to get the state
space of the system. And so each of these
states is a reachable state that you can get to by
starting from the initial state and then unrolling
the step relation. And our goal is to prove
that some property is true in all of these states. And again, there are
potentially infinitely many of these states. OK. So we want to show that all
reachable states are OK. Let’s try to write that in
slightly more mathy terms. Here’s a formula that says: if state s is reachable in zero or more steps from the initial state, then s satisfies this property ok.
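In symbols– writing init for the initial state, step for the transition relation, and step* for zero or more steps (notation chosen here for exposition):

```latex
\[
  \mathit{step}^{*}(\mathit{init}, s) \;\implies\; \mathit{ok}(s)
\]
```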
OK, so this is the goal that we’re trying to prove. And because there’s sort of
an unbounded number of steps on the left-hand side
of this implication, direct approaches like
enumerating states or testing-based approaches
are very difficult to apply. So instead what I’m going
to do is apply maybe my favorite idea, full stop,
which is induction, OK? So the idea with induction
is to find a property I such that three things are true. First, I is true in the initial state. Second, I implies this safety property ok. And third, and most importantly perhaps, I is inductive, by which I mean whenever I holds in a state s, and s steps to s-prime, then I also holds in s-prime, OK?
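Spelled out– writing init for the initial state and step for the transition relation (notation chosen for exposition)– the three conditions on I are:

```latex
\begin{align*}
  &\text{(initiation)}  && I(\mathit{init}) \\
  &\text{(safety)}      && I(s) \implies \mathit{ok}(s) \\
  &\text{(consecution)} && I(s) \land \mathit{step}(s, s') \implies I(s')
\end{align*}
```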
And this looks perhaps relatively simple, but it is actually quite deep. For example, you might
notice that there’s no hypothesis over here that
s or s-prime actually be reachable from the
initial state, right? All I know is that they satisfy
this inductive invariant, OK? And the way you should
think about these inductive invariants is that
they accurately summarize all reachable states
in sort of a finite way. And in particular,
they allow us to reduce this unbounded reasoning
at the bottom– under the line– to reasoning
about only a single step above the line. OK, this sounds great. The catch is that these
inductive invariants are really hard to find,
because like I said, they have to accurately
summarize all reachable states of the system, OK? So one guess as a place
to start on your search for an inductive
invariant might be to say, can I make the
inductive invariant the safety property itself, OK? And if you try that,
you’d be trying to prove this formula that says,
if the safety property is true and then I take a step, the
safety property is still true. And that will typically
not be something that you can prove because the
specifications are typically not inductive. They don’t talk
about every aspect of the state in
sufficiently precise detail to make this provable. So what you have to do is find
a bunch of additional facts that you need to maintain
in your inductive invariant such that this comes
out to be the case. OK. So I said that I summarizes
all reachable states. What are all the reachable
states of a distributed system, and what is this step relation
that I’ve been talking about? So here’s a fancy video to sort
of show you what a distributed system is supposed to be doing. So these are all nodes
in a distributed system– it doesn’t really matter what
the distributed system is. And what you just saw
happen was a timeout fired, and then some node decided
to send some messages which are then being delivered. And the state over on
the right-hand side is changing as these
messages are being delivered and so on, OK? And so as the system progresses,
some nodes might fail, and then they’ll stop
responding to messages. And in order for the system
to continue correctly, it’s going to need to maintain
its safety properties even in the presence
of those failures. OK. So we need to capture all
of those kind of events– message delivery, timeout,
failure, and so on– in this step relation. And that’s sort of the basic
idea of modeling these systems. So to give you an
example of a system, together with my
co-authors, we have analyzed something called the
Raft consensus protocol, which is a distributed system
that allows you to replicate a single node system onto
multiple machines in such a way that you get high availability. And this is the part of
my grad school experience that I was describing earlier
that was a lot of work. So we did this in the
Coq Proof Assistant. It was about 50,000
lines of proof. And like I said, it was
kind of fun at the time, but I don’t want to do it again. So when we did this,
we actually were talking to the designer
of the Raft algorithm who had tried to produce a
proof himself during his PhD and decided that
he could graduate without finishing the proof. But he was excited to
hear that we had done it. So sort of the takeaway
message from this slide is– it was fun once, but not again. And so what makes
this thing so hard? Well, really it’s
that we had to come up with the inductive invariant,
and then we had to prove it. So in the last minute
or so of my talk, I want to tell you about
some more automated solutions both to proving
inductive invariants and then even to finding
inductive invariants automatically. OK. So the first technique
I’ll tell you about is just some proof automation. So we had these
three goals that we needed to prove about
I. Suppose we had I and we just wanted to
automate the proof. Then what we would do is try to
express both the system model and I in a decidable
fragment of logic, such as one that might be
decidable by the Z3 SMT Solver. So the logic that I like to
use for this kind of thing is called effectively
propositional reasoning, but other possibilities
exist as well. And this is not always
possible in theory because there exist
undecidable formulas, and so you may not be able
to express your invariant in any decidable fragment. But in practice, we have
found the surprising fact that it seems like
there’s always a trick to modeling your
system in such a way that you can express it in
a decidable fragment, which I find quite surprising. OK, so that’s a little bit
about proof automation. What if we want
to find I itself? Well, what we can
do then is kind of similar to what
James was describing in program synthesis. We kind of want to synthesize
this inductive invariant I. So you can think of this as a
search problem over formulas that we want to plug
in for I. And we have some work under
submission right now that restricts the
logic of I even further in order to make this possible. And sort of in parallel
with the last slide, I would say this is not always
possible in theory, but also in practice not always
possible at this time. So this is much
more difficult, but even if it
only finds part of your inductive
invariant, it can still save quite a lot of work. OK, so just to summarize,
we have unlabelled graphs, we have inductive invariants,
we have proof by grad student, and we have proof automation and
inductive invariant inference. And I’d be happy to
take any questions. [APPLAUSE] Any questions for James about
logic, inductive invariants, distributed systems
verification? So when you’re
writing these proofs about a specific
implementation protocol and you’re stuck
on something, how do you figure out whether
the reason you’re stuck is because there’s
actually something wrong with the implementation
or because you haven’t quite got the right invariant yet? That’s a fantastic question. So it is often the
case that the reason is because your inductive
invariant is not right yet. But sometimes you do find
bugs in the implementation. One thing that can help
is if your proof technique can generate concrete
counterexamples for you, which is definitely not the case in
our Coq Proof Assistant work, but in some of the more
automated techniques we actually have counterexample
generation working. And so when the proof
doesn’t go through, the system tells you here’s a concrete
example state where if you take this step, you
will violate something, and that can– if that violation is
of the safety property, then that’s a sort
of constructive proof that your implementation is wrong. Another thing that can
help you figure it out is if you can visualize the
execution of the distributed system and sort of play with it. Which– oh wait– Doug’s up next! Maybe we’ll learn
something about that. That was a great
segue, Zac, thank you. I’m Doug, and I’m
going to be talking about a graphical interactive
debugger for distributed systems. So first, I want to
tell a little story– and please brace yourselves,
because it’s a very sad story. So James was talking about the
formal verification of the Raft consensus protocol. The other person drinking
the beer and eating the pizza and working on Raft was me. And since we did
this verification, we’ve been sort of lurking
on the Raft dev mailing list. And this is sort of a typical
email sent to the Raft dev list. And so we’ve got Raft as this
distributed consensus protocol, and it’s sort of complicated
and tricky to reason about. And so a lot of people end
up sending these emails about like proposed
optimizations to Raft. So the details of
this aren’t important, but this is some kind of
proposed optimization. And it’s wrong because all of
these proposed optimizations are universally wrong. Modifying a distributed
system in a way that actually maintains
its correctness is for most programmers
essentially impossible. And I sort of figured out why
it was wrong and then replied to this person, and I think it
was pretty polite and possibly a little condescending. But this is the email that
I would have wanted to send, which reads, this won’t work. These things never work. If you try to get
it to work, you’ll end up deploying
something to production that will lose
your data and cause huge headaches for your
users and your sysadmins. And this makes me really sad. I think the programmer
should be able to experiment with the systems
that they use and be able to make modifications
without sort of worrying that it’s going
to break and it’s going to lose literally
all of their data. So this is a huge bummer. So distributed systems,
as James mentioned, are really important,
like they’re used in Snapchat. They’re really difficult, and
developers really need as much help as they can
possibly get in order to implement them correctly. So most of my work is
really focused on this. I’ve been working on sort
of helping developers to build more reliable
distributed systems. And previously I’ve done a lot
of work on formal verification. As James was talking about, I
was part of the sort of proof by grad student strategy mostly. But more recently I’ve
been working on other tools for helping developers,
and especially very novice developers– helping
students to understand bugs in distributed systems. So understanding bugs, right? There are two main
questions when you’re trying to understand a bug. The first is why is
my system misbehaving? And the second is, what can
I do in order to fix it? And the baseline that I’m
sure most of us or all of us have encountered is just staring
at the output of your program in a terminal, right? Staring at log files produced
by like printf statements or whatever. A better solution
to this problem would be graphical
interactive debuggers like those included in Visual
Studio or Eclipse or IntelliJ. So what these graphical
interactive debuggers let you do is step
through the system’s execution, sort of line-by-line examine
the state of the system as it evolves, and you can
also sort of run the program until some conditions are met
and then examine its state once it’s there. So this is a screen
shot of Visual Studio, which is one of these IDEs
that includes a debugger. So there are a few features
here that correspond to those that I mentioned. So the program is
currently paused here, and you can step through
it by clicking buttons. You can look at the
variables in scope here. So this shows you the
current state of the system, and as you step
through it, you can see how those variables
change over time. And then this specifies like
a breakpoint or a watchpoint. So a breakpoint means that if
you just run the program, then the next time it gets to
that line, it will pause. And then a watchpoint allows
you to specify some conditions on the program
state such that it will pause once it
reaches a state matching those conditions. So can we use this kind
of graphical debugger to debug a distributed system? So here’s a distributed
system on these slides. It’s a very simple
distributed system, it’s just a client and a server. So what the client is doing
is first setting the timer such that after 10 seconds, it
will send the server a message that just says A.
Then it’s sending the server a
message that says C, and then a message
that says D and then B. And then all the server is
doing is just running a loop, receiving these messages,
and then appending them to its state. So let’s say that we’re trying
to debug this distributed system using a graphical
debugger like Visual Studio, and we’re trying to
answer the question, is the server state
ever ABC, right? So we can say that like if
the server state is ever ABC, this is a bug for
whatever reason, right? We don’t want to
get to that state. So the answer turns
out to be yes. And how we would
get to that state is first we would wait
a while and then deliver this A message. Then we would delay
this C message. Then we would drop the
D message entirely, then deliver B, and then
finally deliver C, right? And this is because the
network can sort of arbitrarily reorder the messages
that you send. So my claim is that using a
graphical interactive debugger like Distributed Studio– sorry, Visual Studio,
there’s really no way of getting
to this state just by stepping through the
program line by line. And so as a result,
you’re left with sort of the baseline case of
just staring at log files, except now instead
of just one log file, you’ve got as many
as you have servers. So this is sort of
a bad situation. As James mentioned, the
James immediately before me, distributed systems are
difficult to develop because of the combination
of concurrency and failure. And for exactly
the same reasons, they’re really not amenable
to traditional debugging. And that’s because in order
to debug this program, the debugger really needs
to be able to control the order of all of the
events in the system, including messages and
timers and failures. So I’ve been working on
this system called Oddity, which is a graphical interactive
debugger for distributed systems. And what it does is gives
the users control over the ordering of these events. And currently it can be
used on any system that’s specified as a set of
atomic event handlers. So Oddity sort of
gives you the features of these graphical
interactive debuggers on distributed systems. So it lets you step through
the system’s execution by controlling exactly
which event happens next. It lets you examine the state
of the system, so messages and timers as well as
the state of each node. And then I’m working
on an implementation of watchpoints, which I’m
hoping to finish for my PhD. So now to an extremely
risky live demo, and this is going to be Oddity running
on a real distributed system. And it’s actually going
to match the distributed system from the example. So this is the same system
that we saw earlier. So we have the client
and the server. The client has sent– has scheduled this timer for
it to send the A message, and then it sends a B,
D, and C to the server. So in order to end
up with the state that we’re trying
to end up with, we can first deliver
this A message– the A timer, rather, then
deliver the A message, then deliver B.
And then I’m going to make this very
obvious mistake. So OK, we’re going to
deliver D by accident. So this is bad, right? Now we’re not in the state
that we were going for. Luckily, Oddity
implements time travel. So because there’s so frequently
so much non-determinism in distributed
systems, you really want to look at multiple
executions at a time. So we can look at the
history of events down here, and then just click
on the one before we made that mistake, right? So now D is waiting in
the server’s inbox again. We can instead
deliver C, and then if we examine the
server state, we can see that it has state ABC. OK. So in summary, Oddity lets
you control the ordering that events happen in, and to
inspect the node state, message contents, and timer contents. It also lets you
navigate this branching event history, including time
travel between various states. So we’ve evaluated
Oddity by deploying to a couple of
classes here at UW. A professional master’s
distributed systems class, and then the most
recent iteration of the undergraduate
distributed systems class. We’ve integrated it into
this new labs framework for distributed systems
classes that Alice and I have been working on. The framework includes
a model-checker, and Oddity can be
used for visualization of the counter-examples
generated by the model-checker. And students were encouraged but
not explicitly required to use Oddity. So in general, students who
did use it found it useful. Please talk to me about like
giving research software to undergrads. It’s really interesting and has
some sort of unique challenges. Again, students are really
enthusiastic about using it, but it can be difficult
to actually get them to. But they were able to use it
to diagnose liveness bugs– so cases where the system was
failing to make progress, and then it turned out that
it wasn’t replying to clients at
all, and this was really easy for them to see in the debugger. They were able to use it to
diagnose performance bugs. So there is a case
where the system was sending sort of totally
unnecessary and redundant messages. This is totally
invisible to the testing, but it was really obvious that
it was happening in Oddity– they could just see
that the messages weren’t being delivered. And finally, they
were able to use it to visualize the
counter-examples generated by the model-checker. And in particular,
students found it useful to explore alternative
execution paths using the time travel feature. So in the future, both
during my PhD and after that, I want to work on
extending Oddity by integrating a
model-checker and making it applicable to more systems by
interposing on the system calls that systems make. I’m interested in making
other similar tools, so you can imagine like a
graphical interactive debugger for operating systems. There are a lot of sort of
naturally visualizable features in operating systems, like the
page table or the scheduler. And you can imagine a
graphical debugger for those. And you can imagine other sort
of education-focused graphical tools. I’d also like to continue the
work that James and I were doing on formal verification. So you can get really
strong guarantees for these particular
classes of system. And I really envision a world in
which we have better developer tools and better developers. So therefore, all systems
will be less buggy. Formal verification can ensure
that a lot of classes of system actually have zero bugs,
and maybe eventually we can actually trust
the software that sort of powers our world. So thank you, and I’m happy
to take any questions. [APPLAUSE] Any questions for Doug
about deploying tools and distributed systems? Yes. Doug, this really seems
like a wonderful piece of– I guess a wonderful
tool for a developer to have at his disposal. My question to you
is, in your opinion, do you think it’s a little
weird, strange, or unexpected that people put up
with the command-line form of debugging for so long
or what seems to be a long time? I mean, I think it’s
really a result– I’m sort of speculating here,
but it’s what we’re used to and it’s what we’re
taught, right? So like, that’s what
we use in classes, right? We’re mostly using
printf debugging on like short programs, and so
therefore we just sort of get used to it and we tolerate it. But yeah, I mean, it’s terrible. It is bizarre that like this
sort of continues to be more or less the state-of-the-art in
many if not most domains, yeah. OK. Sorry, we’re going
to do– there you go. Keep it moving. Because up next, Pavel’s going
to talk to us about verifying the visual layout of web pages. All right, great. So as you might have noticed,
through these previous talks, there’s sort of been
a theme going on. Which is taking these
classic programming languages and software engineering ideas,
like verification, language semantics, static analysis,
and trying to extend them to new domains, whether
that’s frameworks for web applications,
distributed systems, file systems, and so on. And my work is
also in this vein. You see, I’ve been working
for the last few years on taking these classic
programming languages and software
engineering concepts and extending them
to web page layout. Now OK. Web pages are like programs
in a variety of ways, and I mostly mean not those. So web pages contain
a backend service that’s usually written in
something like Python or PHP, and also some
frontend code written in something like JavaScript. And these are basically
general-purpose programming languages. So applying the standard PL
techniques like verification and semantics to them is at least
comparatively straightforward. But there’s a third
underappreciated component of web pages, as you
can tell from the fact that the right-hand third
of the screen is empty, which is maybe less
obviously a program, and that’s web page layout. And that’s what this talk
is going to be about. Now, I’d like to get into
the details of how we apply PL concepts to web page
layout, but first I want to convince you that web
page layouts are programs. And I do have a
demo video of this, but I think it would
actually be better if I showed it to you live. So this is
Healthcare.gov, which is a website used by many people
to get health insurance. And you can know
that it’s a program because the output of this
layout– the image you see here– depends on the inputs to
that rendering process. And those inputs aren’t
command line flags or files on your file system, they’re
things like the width of your browser window. So you can see that as I change
that, the layout of the page changes as well. And eventually, if I make
the page small enough, it’ll switch over to an
entirely different website. So that’s one of the inputs
that we can pass to a website. There are others. For example, I can increase
the default font size on the computer. And you can see that as
I increase the font size, not only does the layout
change, but it also reveals bugs, like
this search box that’s now floating outside
the toolbar it was supposed to be positioned in. If web page layouts are programs
and they can contain bugs, we want to apply verification
and static analysis to them. So if we’re thinking
of web pages as programs, the first
step is to think about, what specifications,
what properties do we want to prove
about those web pages? And for web pages, these
are visual properties. Things like no text
overlapping other text, all of the buttons
being on screen, or the headings forming some
sort of consistent hierarchy. And we want to know that these
properties hold because we’re good web developers, because
we want to make good web pages, but there’s also legal
reasons, like satisfying the requirements
of the Americans with Disabilities Act. So, there are a lot of reasons to
verify that these properties hold of our web pages. But it’s not quite clear how
you apply traditional PL concepts to this new domain. And that’s really what
my work has been about. Roughly speaking, there’s
been two big challenges we’ve had to surmount in talking
about web pages as programs. The first is that
we needed a way to describe how web pages
behave, what a browser does. And the second is that we’ve
had to have a way of describing what web pages should
do, the visual properties that they ought to satisfy. So I’m going to walk through
both of these really quick to give you some sense of
what this work has contained. And the first is formalizing
how web pages behave. Now the behavior
of web browsers is described by this extremely
long English language document put out by the World
Wide Web Consortium. And like any other extremely
long English language document, it is both ambiguous
and incomplete. Now in contrast to this
English language text, there is also the source code
of actual extant web browsers. It’s also very long–
millions of lines of C++. And while it’s
complete in the sense that it tells you exactly
what the browser will do, there are also disagreements
between browsers and between browser versions. So to get some
coherent understanding of how web pages behave, we
had to combine the benefits of these two approaches. We did that by taking from
the World Wide Web Consortium not the text of the standard, but actually a collection
but actually a collection of conformance
tests that browsers use to ensure that
they’re implementing the standard correctly. For these conformance tests, we
can find the desired behavior– the thing browsers
actually need to do– by running existing
web browsers. And this combination of
conformance tests and desired behavior covers the
standard and allows us to ensure that every edge
case is handled correctly. And by doing that, we’ve
been able to describe the behavior of fairly
complex web pages in ways that are consistent
with existing web browsers while also shrinking these
pages and pages of text to a fairly short and
clear 1,000 lines of code. So that tells us how
web pages behave. But we’d like to know
whether they behave the way they should behave. To do that, we need
a way of describing the visual properties
of web pages. So I’m going to just do
this directly by example. Here’s how in our
system you describe that all the links on the
page are on screen. In this logical formula,
I’ve given a property that describes all boxes– oh no– all boxes on the screen. And that’s that
for all b in box. And a box is just
some rectangular part of the page– text or
perhaps an HTML element. I’ve then asserted
that if that page– if that box is a button,
then it is on screen. In other words, all the buttons
on the page are on-screen. And you can see that
this assertion contains sort of three different
kinds of statements. First, I have a
property that talks about all
of the HTML elements, all the boxes on the screen. Then I say something
about the HTML that defines that element,
like here that it’s a button. And finally, I give
some geometric property that element has to satisfy,
like here that it’s on screen. So we’ve built a language that
can do basically these three things. And it’s expressive
enough to describe a variety of
accessibility guidelines, including things like from
the Department of Justice implementation guide on the ADA. At the same time we’ve made it
possible to automatically check these assertions on a website
for all possible combinations of widths, heights,
font sizes, and even some of the weird parameters
like the width of your OS scrollbars. So, that’s been sort of
our technical approach, and now I just want to give
you a brief flavor of how we’ve evaluated and tested this tool. Now the first thing
I told you about was capturing the behavior of
web browsers on real web pages. And so here I have
a collection– a table of the unit
tests that we’ve run. These conformance tests that
we’ve gotten from the World Wide Web Consortium. And you can see
that we’ve actually evaluated something like
3,500 of these tests across 45 different
sections of the standard. And across those
unit tests we found that consistently
our formalization either matches the browser or
matches a bug in that browser’s bug tracker. And these covered things
like rounding errors or an incorrect implementation
of floating layout. And now this tells us that
our formalization is correct, but to make sure that
our formalization is useful for determining the
properties of web pages, we’ve gone further and
collected 62 different web pages from common web
templates and ran them against accessibility
properties that we’ve collected from things like
the Department of Justice or the Apple Human
Interface Guidelines. And so you can see here that of
these accessibility guidelines, we generally verify web
pages, and when we don’t, we produce relatively
few false positives. So here, we found 65 bugs
across these 62 web pages and only 13 false positives. That’s been our approach
to verifying web pages and ensuring that accessibility
properties and potentially other visual properties
like mobile friendliness or usability are valid
across all possible inputs, like width and height and
font size, to a web page. Thank you. [APPLAUSE] Any questions for Pavel about
verifying web page layout? Did you say it is
possible also to formalize the interactive properties? Right. So the question is
whether it would be possible to formalize the
interactive properties of web pages? And I assume you’re talking
here about JavaScript and the way JavaScript
modifies web pages. There is. So we’ve been working in
the last few months on trying to extend
our tool to talk not about particular
components of the HTML, but actually about components
with generic content. So we might call this
client-server reasoning, and what we’d like to do is
to go from just layout to both the interactive component
in JavaScript and also this server-side component. To do that, we’d like
to instead of talking about particular
bits of HTML, talk about HTML that contains
some unknown parameter, like the icon to show or
the title of the page. And also potentially
some side conditions that need to hold for those
visual properties to be true. Like for example, if
the label is too long, perhaps that visual
property won’t work because the text will
wrap outside the button. And so we’ve got some work in
progress to apply this approach and try to verify
the web page’s visual layout both before and
after some JavaScript runs. All right, folks. If you have any more questions,
please find one of the speakers after we’re done. Thanks, everyone,
so much for coming to this kickoff colloquium. Make sure you’re here
at 3:30 next Thursday. We’ll be hearing some talks
from the Theory group. All right. See you later– let’s thank
all the speakers one last time. [APPLAUSE]

