This is me.

Hello! I’m a student / mathematician / computer scientist doing research on the intersections between geometry and artificial intelligence. I am currently doing a PhD at Oxford; previously, I was the entrepreneurial lead for Categorical Informatics, a math+data company spun out of MIT.

To contact me, send me an email at joshua dot z dot tan at gmail dot com (remember the “z” in the middle, otherwise you’ll get someone different!).

Base change and entropy

Tom Leinster recently posed an interesting question in a talk at CLAP: “how do I generalize a theorem about objects into a theorem about maps?” The general idea comes from Grothendieck’s relative point of view, and to implement this point of view, one has to overcome certain technical hurdles related to “base change.” I thought I’d spend some time trying to lay out what it means to have a change of base in algebraic geometry, and then how that idea shows up in Tom’s project: turning entropy into a functor.

You can read about Tom’s project (joint with John Baez and Tobias Fritz) directly here: https://ncatlab.org/johnbaez/show/Entropy+as+a+functor

(Currently writing this up, so excuse the notes below!)

The 57th Venice Biennale

Notwithstanding the petulant performers and the alcohol-fires, the German pavilion was pretty fun. Anne Imhof’s work, a series of performances ranging under, over, and above a 3-foot raised platform of plexiglass that spanned the entire pavilion, was more impressive for its set design than its performance; more for stagecraft than for emotion. The piece was a victim of its own popularity; when I visited, all I saw were other sightseers rolling like pinballs from one end of the room to the other, obscuring any sense of the (literally) underlying action. But it was pleasant, in a way, to see the pinballs rolling around the corners of the stage, and to meditate a bit on the nature of crowds.

And in the spirit of art that lives on walls, here is a lovely piece by Maria Lai, called Geografia.

Adventures in data warehousing

Data warehousing is exactly what it sounds like: create a central storage space for large amounts of data, so that it can be accessed by many different people and applications. I had an opportunity recently to work on a data warehouse, so I thought I’d write up a bit about the experience.

Here are three practical principles for data warehousing that everyone already knows.

#1: Model and plan, because if you build a warehouse for data that doesn’t exist (or connectors for sources that have changed), the project will fail.

#2: Talk to end-users, because if you build a warehouse that no one will use, the project will fail.

#3: Involve your sponsors and stakeholders, because if you don’t have the money to finish the warehouse, the project will fail.

Side note: compare these principles to the main technical challenge of data integration: brokering the data so that it can be consumed by the maximum number of downstream applications.

According to Gartner, over 50% of data warehousing projects fail or achieve only limited acceptance. A lot of them will fail even if you enact these best practices (and resolve the technical challenge). Running a warehousing project is a bit like crossing the Atlantic in a wooden boat; you can stock your larders, swab your decks, and keep fresh lookouts, but at any time an angry, CEO-faced storm could blow you into the sea. Whether you’re a data warehousing veteran or a poor schmuck impressed into a warehousing team, best practices can only go so far. So what do you do?

A useful exercise?

Best practices are all well and good, but it’s hard, in the middle of a project, to see their relevance. So let’s go through and try to apply those three principles above in an actual example.

Currently, I’m part of a group that is installing an Internet-of-Things (IoT) testbed for the city of Boston, and my job is to create a database for the data (and metadata) we’ll be receiving from the sensors as well as any applications that will be running off the sensor data. In theory, that means I should

#1: Model and plan, but… I have no idea what sensors I’ll have, where the sensors will be located, what kind of co-dependencies (foreign keys) there might be, or what kind of data they’re going to provide. The sensors could range anywhere from light-field cameras to soil-moisture sensors. Maybe the data will be incredibly simple: one table, no foreign keys, just a big global relation. In that case, we don’t really need to worry about the structure of the warehouse too much, just the access portion [update: this was in fact the case]. On the plus side, these are sensors (as opposed to transactional database systems) so I shouldn’t have to perform complex procedural transformations on the data values before loading them into the warehouse. On the other hand, I might have to warehouse a bunch of application-generated data [update: this was not the case].

#2: Talk to end-users, but… I don’t know who’s going to be using this data; the list of potential users is extremely broad. The flip side of this question is that I don’t know if this data is useful at all, since I don’t have a list of potential users! To be fair, the main goal of this project is to figure out what that list of potential users would be. Maybe some of these “users” will be mash-ups and visualizations meant to be displayed at the testbed itself. Some of my co-organizers want the data warehouse, or some version of it, to be opened up to the public; but why does that need a warehouse, as opposed to a big flat file dump? Others want it to be tailored to “city planners”. Still others have in mind startups, or the local business community.

#3: Involve your sponsors and stakeholders, but… we’re doing data exploration, so it’s unclear who actually “has a stake” (again, part of the project is to find a list of potential sponsors and stakeholders) besides the City of Boston. But the City of Boston doesn’t know what they want to do with the data; they seem more interested in things like best practices (e.g., things like this list!).

Going through the exercise, it seems pretty clear that what I should be worrying about is not the technical aspects of building and structuring the data warehouse, but the lack of clearly-specified users!
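For what it’s worth, the “one table, no foreign keys” case from #1 is easy to sketch. The snippet below is only an illustration using Python’s built-in sqlite3 module; the table name, the column names, and the helper functions are all hypothetical, not the schema we actually deployed.

```python
import sqlite3

# Hypothetical schema: one flat "readings" relation, no foreign keys.
SCHEMA = """
CREATE TABLE IF NOT EXISTS readings (
    sensor_id   TEXT NOT NULL,  -- e.g. "soil-01" (made-up identifier)
    recorded_at TEXT NOT NULL,  -- ISO 8601 timestamp, so text order = time order
    quantity    TEXT NOT NULL,  -- what was measured, e.g. "moisture"
    value       REAL,
    unit        TEXT
)
"""

def create_warehouse(path=":memory:"):
    """Open (or create) the database and make sure the single table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    # The access pattern matters more than the structure, so index for it.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sensor_time "
        "ON readings (sensor_id, recorded_at)"
    )
    return conn

def load_readings(conn, rows):
    """Bulk-insert (sensor_id, recorded_at, quantity, value, unit) tuples."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)

def latest_reading(conn, sensor_id):
    """Return the most recent row for one sensor, or None if it has no data."""
    return conn.execute(
        "SELECT recorded_at, quantity, value, unit FROM readings "
        "WHERE sensor_id = ? ORDER BY recorded_at DESC LIMIT 1",
        (sensor_id,),
    ).fetchone()
```

With a schema this flat, almost all of the remaining design effort goes into the queries people will actually run, which is another way of saying that the users matter more than the structure.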

Discussions at CCT

We just officially ended the inaugural Computational Category Theory workshop at the National Institute of Standards and Technology (NIST). During the workshop the participants had five discussions, on

  • algorithms for category theory,
  • data structures for category theory,
  • applied category theory (ACT),
  • building the ACT community,
  • and open problems in the field.

Below, I’ve written up a partial summary of these discussions.


The 56th Venice Biennale

At the entrance to the central pavilion of the 56th Venice Biennale is a restorer’s ladder, three stories tall, made of two long staves of veined wood and girded like a construction crane with two lattices of peeled iron. The ladder (Fabio Mauri, “Macchina per fissare acquerelli”) reaches up and back to Galileo Chini’s painted dome, first erected in 1909 for the 8th Biennale. The ladder is a gesture; it feels and looks temporary, yet it has all the tart flavor of a Lichtenstein one-liner. The ladder is telescoping upward and backward to some imagined beginning.

This year’s theme is “All The World’s Futures”. What a hopeful title. The future is associated with kids (America), robots (Belgium), and chrome skinsuits (South Korea), so all the futures can only mean all the kids, all the robots, and all the metal eyeshadow. The theme, whatever it’s supposed to mean, functions as a trick. The more you stare at some icon of the future hanging or projected on the beige prop wall, the more you feel like you are being dragged relentlessly into some regressive French movie about the American 80’s, like you have opened the door to a dour, fat-faced salesman trying to sell you on the next new gospel. The future has never seemed so off-kilter, so imbecile.

Art, like bread, is meant to be consumed. Art fills you, it soaks up excess alcohol, and it makes you sick if you consume too much. The Biennale is a feast. You walk around, and there are Adrian Pipers to enjoy, Young British Artists to mock, an epically boring live reading of Das Kapital, tourists going on benders with selfie-sticks, Venetians glowering in the backlight. Some of the pavilions were atrocious; some of them were sublime. The German pavilion was a hot mess of hipster lawn art and commercials for video games I would never play. At the Japanese pavilion, Chiharu Shiota somehow both submerged and elevated the entire exhibit under a skein of red thread, keys, and sunken boats, creating another horizon where heaven meets the sea. The Norwegian pavilion was anomalous, architectural, modern, and striking. The French pavilion, like the Belgian, has robots. You can walk through the entire exhibition hall, from the Giardini through the Arsenale, and find good art, bad art, blameable art, art which is forgivable because it is, after all, only art. What you will not find is art that gives you hope for the future. Art—fine art, collected art, curated art—does not belong to the future, at least not as robots belong to the future.

Art is a sideshow to progress.

(to be continued…)

Formal concepts and natural languages

Back in January, Yiannis (Vlassopoulos) and I were talking about “quadratic relations” and higher concepts in language, for example the analogy between “(king, queen)” and “(man, woman)”. Deep neural networks can learn such relations from a set of natural language texts, called a corpus. But there are other ways of learning such relations:

Representation of corpus | How to learn / compute
n-grams                  | string matching
word embeddings          | standard algorithms, e.g. neural nets
presyntactic category    | computational category theory
bar construction         | Koszul dual
formal concept lattice   | TBD

There are some very nice connections between all five representations. In a way, they’re all struggling to get away from the raw, syntactic, “1-dimensional” data of word co-location to something higher-order, something semantic. (For example, “royal + woman = queen” is semantic; “royal + woman = royal woman” is not.) I’d like to tell a bit of that story here.
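To make the word-embedding row concrete, here is a toy sketch of how an analogy like “(king, queen)” and “(man, woman)” becomes vector arithmetic. The two-dimensional vectors are hand-built assumptions for illustration (one axis for royalty, one for gender), not embeddings learned from any corpus, and the function names are hypothetical.

```python
import math

# Hand-built toy embeddings (an assumption, not learned from a corpus).
# Axis 0 ~ "royalty"; axis 1 ~ "gender" (+1 male, -1 female).
EMBEDDINGS = {
    "king":  (1.0,  1.0),
    "queen": (1.0, -1.0),
    "man":   (0.0,  1.0),
    "woman": (0.0, -1.0),
    "royal": (1.0,  0.0),
}

def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def nearest(vec, exclude=()):
    """Vocabulary word closest to vec, ignoring the words in exclude."""
    candidates = [w for w in EMBEDDINGS if w not in exclude]
    return max(candidates, key=lambda w: cosine(vec, EMBEDDINGS[w]))

def add_words(*words):
    """Semantic 'addition': sum the vectors, then look up the nearest word."""
    total = tuple(sum(xs) for xs in zip(*(EMBEDDINGS[w] for w in words)))
    return nearest(total, exclude=words)

def analogy(a, b, c):
    """Solve 'a is to b as ? is to c' by computing a - b + c."""
    va, vb, vc = EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c]
    target = tuple(x - y + z for x, y, z in zip(va, vb, vc))
    return nearest(target, exclude=(a, b, c))
```

In this toy vocabulary, analogy("king", "man", "woman") and add_words("royal", "woman") both land on "queen". String concatenation, by contrast, could only ever produce “royal woman”.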


Operads and subsumption

After seeing David Spivak’s talk on operads for design at FMCS 2015, I immediately thought of Brooks’ subsumption architecture. The subsumption architecture was one of the first formalisms for programming mobile robots—simple, insect-like robots capable of feeling their way around without needing to plan or learn. Operads, on the other hand, are certain objects in category theory used to model “modularity”, e.g. situations where multiple things of a sort can be combined to form a single thing of the same sort.
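As a rough illustration of “things of a sort combine into a single thing of the same sort”, here is a minimal sketch of operadic composition for plain functions with a fixed number of inputs (the endomorphism operad, a standard textbook example). It is not Spivak’s wiring-diagram operad, and the names Operation and compose are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class Operation:
    """An n-ary operation: `arity` inputs, one output."""
    arity: int
    run: Callable[..., object]

    def __call__(self, *args):
        if len(args) != self.arity:
            raise ValueError(f"expected {self.arity} inputs, got {len(args)}")
        return self.run(*args)

def compose(outer: Operation, inners: Sequence[Operation]) -> Operation:
    """Plug one inner operation into each input slot of `outer`.

    The result is again an Operation, whose arity is the sum of the
    inner arities: many things of a sort become one thing of that sort.
    """
    if len(inners) != outer.arity:
        raise ValueError("need exactly one inner operation per input slot")
    total = sum(op.arity for op in inners)

    def run(*args):
        results, start = [], 0
        for op in inners:
            # Each inner operation consumes its own consecutive slice of inputs.
            results.append(op(*args[start:start + op.arity]))
            start += op.arity
        return outer(*results)

    return Operation(total, run)

# The unit of the operad: plugging it into a slot changes nothing.
identity = Operation(1, lambda x: x)
```

For example, with add = Operation(2, lambda x, y: x + y) and mul = Operation(2, lambda x, y: x * y), compose(add, [mul, mul]) is a single 4-ary operation computing (a*b) + (c*d).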

I’d like to formalize subsumption using operads.

But why would anyone want to formalize a derelict robotics architecture with high-falutin’ mathematics?

The answer is simple. It’s not that subsumption on its own is important (though it is) or that it requires formalizing (though it does). What I’d really like to understand is how operads give domain-specific languages (and probably much more) and whether categories are the right way to pose problems that involve combining and stacking many such DSLs—think of a robot that can move, plan, and learn all at the same time—which, for lack of a better term, I will call hard integration problems.

(The rest of this post is currently in process! I will come back throughout the fall and update it.)


UX, experiments, and real mathematics, part 1

Back when I was a rube just starting to learn algebraic topology, I started thinking about a unifying platform for mathematics research, a sort of dynamical Wikipedia for math that would converge, given good data, to some global “truth”. (My model: unification programs in theoretical and experimental physics.) The reason was simple—what I really wanted was a unifying platform for AI research, but AI was way, way too hard. I didn’t have a formal language, I didn’t have a type system or a consistent ontology between experiments, I didn’t have good correspondences or representation theorems between different branches of AI, and I certainly didn’t have category theory. Instinctively, I felt it would be easier to start with math. In my gut, I felt that any kind of “unifying” platform had to start with math.

Recently I met some people who have also been thinking about different variants of a “Wikipedia for math” and, more generally, about tools for mathematicians like visualizations, databases, and proof assistants. People are coming together; a context is emerging; it feels like the time is ripe for something good! So I thought I’d dust off my old notes and see if I can build some momentum around these ideas.

  • In part 1, examples, examples, examples. I will discuss type systems for wikis, Florian Rabe’s “module system for mathematical theories”, Carette and O’Connor’s work on theory presentation combinators, and the pros and cons of a scalable “library” of mathematics.
  • In part 2, I’d like to understand what kind of theoretical foundation would be needed for an attack on mathematical pragmatics (a.k.a. “real mathematics”) and check whether homotopy type theory could be a good candidate.
  • In part 3, I will talk about mathematical experiments (everything we love about examples, done fancier!), their relationship with “data”, and what they can do for the working mathematician.


Time-series and persistence

Recently, I’ve been working on a project to apply persistent homology to neural spike train data, with the particular goal of seeing whether this technique can reveal “low frequency” relationships and co-firing patterns that standard dimensionality reduction methods like PCA have been throwing away. For example, some neurons fire at a very high rate, around 75-100 Hz, while in fact most neurons fire at ~10-30 Hz in response to some stimulus. The loud, shouty neurons are drowning out the quiet, contemplative ones! What do the quiet neurons know? More to the point, how do I get it out of them?
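Here is a toy calculation of the “drowning out” effect. It rests on several assumptions that are mine, not data from the project: spike counts in 1-second bins are Poisson (so variance equals mean rate), there is one loud independent neuron at 90 Hz, and there are two quiet 20 Hz neurons that co-fire through a shared 15 Hz drive. The leading principal component is found by power iteration on the resulting covariance matrix, as a stand-in for a full PCA routine.

```python
import math

# Assumed covariance of spike counts in 1 s bins (Poisson: variance = rate).
# Neuron 0: loud and independent, 90 Hz.
# Neurons 1 and 2: quiet, 20 Hz each, sharing a 15 Hz common drive,
# which gives them a covariance of 15.
COV = [
    [90.0,  0.0,  0.0],
    [ 0.0, 20.0, 15.0],
    [ 0.0, 15.0, 20.0],
]

def matvec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def top_component(cov, iterations=100):
    """Leading eigenvalue and eigenvector of cov, by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))
    return eigenvalue, v

def explained_fraction(cov):
    """Share of total variance captured by the first principal component."""
    eigenvalue, _ = top_component(cov)
    total = sum(cov[i][i] for i in range(len(cov)))
    return eigenvalue / total
```

Under these assumptions the first component points almost entirely at the loud neuron and accounts for 90/130, about 69%, of the variance. The co-firing pattern of the two quiet neurons sits in the second component (variance 35), which is exactly the part a one-component reduction throws away.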