The Code Doesn't Lie — with Mike Bostock

Transcript

This transcript was generated automatically and may contain errors.

Welcome to The Test Set. Here we talk with some of the brightest thinkers and tinkerers in statistical analysis, scientific computing, and machine learning. Dig into what makes them tick, plus the insights, experiments, and OMG moments that shape the field.

In this episode, we talk with Mike Bostock, creator of the visualization library D3, one of the top three starred GitHub repos through much of the 2010s. He was graphics editor at the New York Times, and he founded Observable, whose Reactive Notebooks handle the whole of his process from end to end. We talk about his journey into visualization, which was heavily influenced by his early time on the Google search quality team, where feature decisions often came down to a single number that was heavily debated, and how notebooks are sort of an attempt to crack open all the computation that goes into visualization.

And of course, we end talking about AI agents and the future of notebooks. I'm so excited to bring this interview to folks. And so with that, Mike Bostock.

All right, Mike, welcome to The Test Set. We're so excited to have you on. So you're Mike Bostock, creator of D3, and you were the graphics editor at the New York Times, and now a founder at Observable, where you build powerful tools for visualizing data through code, UI, and AI. Yeah, thank you so much for coming on.

It's my pleasure. It's great to be here.

And I'm joined by co-hosts Hadley Wickham, who's chief scientist at Posit, and Isabel, who's just an incredible software engineer at Posit, and graciously agreed to come.

As people can see, especially the people listening via podcast, we're in beautiful San Francisco. If you can't see the video, we're actually on the Golden Gate Bridge, and we're surrounded by birds. It's a beautiful day. So we're so happy to have you.

Origins of D3

Mike, I feel like you've done so much. And as I talk to colleagues about D3, I feel like there's this incredible history of D3. So I'd love to talk a bit about how you got there and built D3. And then I know there's a lot you've talked about on open source and AI and your work with Observable. I was curious if you could just catch us up a bit on some of the history of D3 and your work there.

Sure. I do like to build tools and have been doing it for a while. And I think a lot of my ideas for tools come out of frustrations using existing tools, or in some ways a desire to understand how those existing tools are built or why they were built the way they were built. For D3 specifically, I was doing a lot of work in browsers, in SVG, using JavaScript, using the DOM API. And in particular, the DOM API is very verbose. And if you're doing stuff with SVG, there's this namespace URL that you have to remember to create an SVG element or to set the xlink:href attribute and stuff like that. And so it was just difficult to remember this specific URL.
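To make the namespace point concrete, here is a minimal sketch of how a library could take that burden off the user. The `NAMESPACES` table and the `qualify` helper are hypothetical names chosen for illustration, not D3's actual source. Only the two namespace URIs are standard.

```typescript
// Hypothetical helper (an illustration, not D3's actual source): resolve a
// prefixed name such as "xlink:href" to its namespace URI and local name,
// so callers never have to remember the URIs themselves.
const NAMESPACES: Record<string, string> = {
  svg: "http://www.w3.org/2000/svg",
  xlink: "http://www.w3.org/1999/xlink",
};

interface QualifiedName {
  space: string | null; // namespace URI, or null when no known prefix applies
  local: string;        // the name with any known prefix removed
}

function qualify(name: string): QualifiedName {
  const i = name.indexOf(":");
  if (i < 0) return { space: null, local: name };
  const space = NAMESPACES[name.slice(0, i)];
  // Unknown prefixes are left untouched rather than guessed at.
  return space === undefined
    ? { space: null, local: name }
    : { space, local: name.slice(i + 1) };
}

// In a browser, the result would feed the namespace-aware DOM calls, e.g.
//   document.createElementNS("http://www.w3.org/2000/svg", "circle")
//   element.setAttributeNS(q.space, q.local, "#target")
const q = qualify("xlink:href");
console.log(q.space, q.local);
```

The lookup itself is a pure function, so it can be tried outside a browser. The DOM calls in the comment are where it would actually be used.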

So the goal with D3, I mean, and there are other libraries that I worked on that kind of predate that. But I think D3 specifically was focused on kind of interaction and animation and transitions and performance. And I think in many ways, like a lot of its success came out of being at the right place at the right time kind of thing. So it really builds on web standards. So these standards exist already or existed already, SVG and Canvas, and of course, the DOM API, as I mentioned. But in many ways, they were very tedious to use. And so the goal for D3 was to build on those technologies, like let you leverage all of those capabilities, but make it much easier for you to get started, make it much more, you know, performant, I guess, like maintain those performance capabilities, but make it easier for you to use them.

There's also an element to D3 that's about kind of all of the visualization techniques and just kind of packaging those up in a reusable way. So like the tree map, squarified tree map algorithm, for example, like that's fairly tedious to write yourself, like to read the paper and to implement it yourself. And there were some existing implementations of that that predate D3. But what I wanted to do with D3 is to try to think about like, what is the kind of purest encapsulation of that algorithm in a way that's independent of how you display it. So like the layout algorithms in D3 are all like data space. They're just like data in data out. They don't dictate how you display it, like whether you're using SVG or Canvas or WebGL or anything else, or even like React, whatever you want to do. So I want to try to like decompose it into these composable pieces. So that, and I think that helps contribute to its longevity.
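As a rough illustration of the "data in, data out" idea, the sketch below computes rectangles for a small hierarchy and returns them as plain objects, leaving rendering to the caller. It uses slice-and-dice partitioning, which is much simpler than the squarified algorithm mentioned above. The `TreeNode`, `Rect`, `total`, and `layout` names are assumptions for this example and are not D3's API.

```typescript
// Illustrative types (assumptions for this sketch, not D3's API).
interface TreeNode {
  name: string;
  value?: number;        // weight of a leaf
  children?: TreeNode[]; // present on internal nodes
}

interface Rect {
  name: string;
  x0: number; y0: number; x1: number; y1: number;
}

// Sum of leaf values beneath a node.
function total(node: TreeNode): number {
  return node.children
    ? node.children.reduce((sum, child) => sum + total(child), 0)
    : node.value ?? 0;
}

// Slice-and-dice layout: split the parent's rectangle among its children in
// proportion to their totals, alternating direction at each depth.
function layout(
  node: TreeNode,
  x0: number, y0: number, x1: number, y1: number,
  depth = 0,
  out: Rect[] = [],
): Rect[] {
  out.push({ name: node.name, x0, y0, x1, y1 });
  const sum = total(node);
  if (!node.children || sum === 0) return out;
  const horizontal = depth % 2 === 0;
  const extent = horizontal ? x1 - x0 : y1 - y0;
  let offset = horizontal ? x0 : y0;
  for (const child of node.children) {
    const size = extent * (total(child) / sum);
    if (horizontal) layout(child, offset, y0, offset + size, y1, depth + 1, out);
    else layout(child, x0, offset, x1, offset + size, depth + 1, out);
    offset += size;
  }
  return out;
}

// The output is only data; drawing with SVG, Canvas, or anything else is a
// separate step left to the caller.
const rects = layout(
  { name: "root", children: [{ name: "a", value: 3 }, { name: "b", value: 1 }] },
  0, 0, 400, 100,
);
console.log(rects);
```

Because the function never touches a rendering API, the same rectangles could be drawn in any environment, which is the kind of separation being described.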

I mean, maybe you had the same experience as me. It's like reading these viz papers would be like a cool visualization. And they provide software, but the software does that visualization and like nothing else. And then you're like, well, I'd like to combine it with this other visualization. And there's just like no like connection.

And they're really fun to work on as well. Like the circle packing algorithm, I think, has been the most fun one that I worked on. And they have these fun diagrams that show how they work, how they kind of build out these layouts progressively. And you can work on them and you get these really satisfying animations as it's iterating over the layout. And yeah, it's about packaging them up so that it's easy for people to reuse them, so that they're not tied to any implementation artifact. What's in a paper has all of these somewhat arbitrary choices of what language it's in, or what other parameters or UI are around it. And so it is fun as a kind of software engineering puzzle to think about, what is the most reusable version of this implementation? And in a way, that is kind of the art of open source: how do you take a complex problem and pare it down to its essence so that people can then use your solution or use your tool in as many cases as possible?


When you ask the agent to do something and, you know, it generates a bunch of code or whatever, it doesn't just assume that that code did exactly what it expected. Like it can actually inspect any of the declared top-level variables, as well as anything that was displayed, and it really changes the behavior of the agent.

And there's so many other examples like this where, you know, it's working with some dataset and it's just slightly different than it expected. And the fact that it can actually verify that and then correct for it makes it much more robust.
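The kind of check described here can be sketched in a few lines. This is only an illustration of verifying a dataset against expectations, not how any particular agent is implemented. The `checkColumns` function, the `SchemaReport` type, and the column names are all hypothetical.

```typescript
type Row = Record<string, unknown>;

interface SchemaReport {
  missing: string[];    // expected columns that are absent from the data
  unexpected: string[]; // columns present in the data but not expected
}

// Compare the columns actually present in the rows with the expected ones,
// instead of assuming the dataset has the anticipated shape.
function checkColumns(rows: Row[], expected: string[]): SchemaReport {
  const actual = new Set<string>();
  for (const row of rows) {
    for (const key of Object.keys(row)) actual.add(key);
  }
  const wanted = new Set(expected);
  return {
    missing: expected.filter((column) => !actual.has(column)),
    unexpected: Array.from(actual).filter((column) => !wanted.has(column)),
  };
}

// Hypothetical data: the code expected a "displ" column, but the dataset
// calls it "displacement".
const report = checkColumns([{ mpg: 21, displacement: 2.0 }], ["mpg", "displ"]);
console.log(report);
```

A report like this gives the caller something concrete to correct, such as renaming a column, before any chart is drawn from the data.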

Have you seen our BluffBench? Simon Couch and Sarah Altman did this for us, where we kind of noticed that the LLMs are pretty lazy at reading plots. Often they'll effectively just read the axis labels. So if you do a plot of fuel economy and then ask what's going on, it just tells you the expected relationship between engine size and fuel economy without actually looking at the plot. It can be very dangerous because it has that kind of innate understanding. And I think they also tend to be too optimistic, in the sense that either you wrote the code and it assumes you know what you're doing, or it wrote the code and it assumes it knows what it's doing. And so yeah, you do have to nudge it to be a little more discerning in looking at that output, because we found that even when it could see the output, if we didn't tell it that it really needed to review it and make sure that it did what it expected, it wouldn't do well.