2010-08-14 13:56:34

Video of my presentation from OSCON 2010

For those who are interested, we've posted the video of my presentation at OSCON on YouTube.

I had a few problems when displaying my slide deck at the conference. When I'm speaking at an event, I usually like to use whatever equipment is provided. To be assured of compatibility between my MacBook Pro and the projector, I would need to bring what seems like 23 different video adapters. It's easier to just bring my slide deck on a thumb drive.

The email from the conference organizers told us there would be "Dell laptops" in the room. I remember thinking how boneheaded it was of them to be running Windows at the Open Source Convention, but I complied and brought my slides as a PowerPoint file.

And then I got there and discovered that I was the one being a bonehead for assuming that "Dell laptop" == "Windows + Office". Actually, those Dell laptops were running Linux with OpenOffice.org. Anyway, OO.org imported my .pptx file, but it botched the formatting in some rather unexpected and entertaining ways.

Moving Forward

Since OSCON ended three weeks ago, folks on our team have been taking their summer vacations, but we've still made some good progress:

After hearing lots of (well-deserved) complaints from people trying to build 64-bit Veracity, we expanded our continuous integration build farm to do both 32-bit and 64-bit builds, debug and release, on all our platforms.
We had just missed our goal of dogfooding Veracity's bug-tracking features before OSCON, but after another round of improvements to the Web UI stuff, now we're using Veracity not just for source control, but also for project tracking.
We implemented Mercurial-style version numbers. They're specific to one instance of a repo, but still kind of handy.
We started work on letting Veracity run through mainstream web servers (instead of only using its embedded web server).
We did lots of bug fixes, including some deep polishing and testing work on patterns for include/exclude settings.
I've been working in a private branch, focused mostly on improving performance:

Every changeset record has a blob list which is used for making things like push/pull and incremental indexing efficient. For changesets which are a DAG merge (more than one parent), we need to normalize that blob list to ensure that the exact same list is constructed on each side of the merge. Our previous normalization code was additive. It walked the DAG back to the lowest common ancestor and added any blob which wasn't present on both sides. Gradually, this caused those blob lists to keep getting bigger and bigger, which turned out to be a nasty performance problem that gets worse as the repo grows. So, I switched the normalization code to remove any blob which was present in the blob list of any ancestor. This is a lot harder to calculate, but it results in a much tighter list.
The changeset record for a database DAG includes a delta. When that changeset is a merge, the delta is calculated against the lowest common ancestor of the two parents. However, when it comes time to store that delta for later use by the indexing code, it would be better to calculate an equivalent delta against one of the two parents.
In a Veracity database, every record has two fields: recid and rectype. However, some of our databases just don't need both of these fields. For example, recid is really only useful if you plan to modify records, but the audit DBs are filled with records that never get modified. Similarly, if a DB only has one record type, we don't need every single record to have a field reminding us what the name of that type is. So, I made a bunch of changes to allow a Veracity DB to exclude one or both of these fields. Eliminating the need to store, retrieve, index and obey these superfluous fields resulted in a nice perf increase.
I went through and made dozens of little optimizations in the indexer. When running a SQLite statement in a loop, always prepare it once and reuse it. Make sure every blob getting indexed only gets loaded once. Tune the hash table which represents JSON objects.
I found and fixed a few GC rooting bugs in our SpiderMonkey code. BTW, I can't wait until we can upgrade to the new and improved version of the JS engine. I greatly dislike the fact that SpiderMonkey doesn't have a wider int.
Unfortunately, some of my changes break compatibility, so I've been writing a script to migrate all our data. This week I'll merge with the trunk and we'll do what we call a "repository reboot".
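For anyone unfamiliar with the Mercurial-style version numbers mentioned above, here is a minimal sketch of the idea, assuming the Mercurial model where each changeset gets the next integer (starting at 0) in the order it arrived in the local repo. The LocalRevMap class and the changeset IDs are hypothetical illustrations, not Veracity code. It shows why the numbers are "specific to one instance of a repo": two clones that received the same changesets in a different order number them differently.

```python
class LocalRevMap:
    """Hypothetical sketch: Mercurial-style local revision numbers."""

    def __init__(self):
        self._by_id = {}  # changeset id -> local revision number
        self._ids = []    # local revision number -> changeset id

    def add(self, changeset_id):
        # Already known: keep its number. Otherwise assign the next integer.
        if changeset_id in self._by_id:
            return self._by_id[changeset_id]
        rev = len(self._ids)
        self._ids.append(changeset_id)
        self._by_id[changeset_id] = rev
        return rev

    def rev(self, changeset_id):
        return self._by_id[changeset_id]

    def changeset(self, rev):
        return self._ids[rev]


a = LocalRevMap()
for cs in ["c0", "c1", "c2"]:
    a.add(cs)

b = LocalRevMap()
for cs in ["c0", "c2", "c1"]:  # same changesets, pulled in a different order
    b.add(cs)
```

Here "c2" is revision 2 in one clone and revision 1 in the other, which is why these numbers are handy for typing at a command line but useless for talking about a changeset across repos.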
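The new blob-list normalization for merge changesets described above can be sketched roughly like this. It assumes the DAG is a dict mapping each changeset to its parents, that each changeset's blob list is a set, and that `candidates` is the full set of blobs the merge would otherwise list. All of those names and shapes are hypothetical. The point is the rule itself: drop any blob already listed by any ancestor, and sort the result so both sides of the merge build the identical list.

```python
from collections import deque


def ancestors(parents, node):
    """Every ancestor of node (excluding node) in a DAG given as {id: [parent ids]}."""
    seen = set()
    queue = deque(parents.get(node, []))
    while queue:
        cur = queue.popleft()
        if cur not in seen:
            seen.add(cur)
            queue.extend(parents.get(cur, []))
    return seen


def normalize_merge_blobs(parents, blob_lists, merge, candidates):
    """Remove any candidate blob that appears in the blob list of some ancestor."""
    already_listed = set()
    for anc in ancestors(parents, merge):
        already_listed |= blob_lists.get(anc, set())
    # Sorting makes the list identical no matter which side computes it.
    return sorted(set(candidates) - already_listed)


# A tiny DAG: A is the root, B and C branch from it, M merges them.
parents = {"A": [], "B": ["A"], "C": ["A"], "M": ["B", "C"]}
blob_lists = {"A": {"x", "y"}, "B": {"p"}, "C": {"q"}}
merged = normalize_merge_blobs(parents, blob_lists, "M", {"x", "y", "p", "q", "r"})
```

Only "r" survives, because every other blob was already listed somewhere upstream. Walking all ancestors is what makes this more expensive than the old walk back to the lowest common ancestor, but it keeps merge blob lists from growing with the repo.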
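For the merge-delta change above, here is one way to rebase a delta from the lowest common ancestor onto a parent, under a simplifying assumption: a delta is a pair of sets (added record IDs, removed record IDs), where added IDs were not in the base and removed IDs were. That model, and the function names, are illustrative, not how Veracity actually stores deltas. Given delta LCA→merge = (a1, r1) and delta LCA→parent = (a2, r2), the delta parent→merge is added = (a1 − a2) ∪ (r2 − r1) and removed = (r1 − r2) ∪ (a2 − a1). Notice that the LCA's own state never appears in the formula.

```python
def apply_delta(state, delta):
    """Apply an (added, removed) delta to a set of record IDs."""
    added, removed = delta
    return (state - removed) | added


def rebase_delta(delta_lca_to_merge, delta_lca_to_parent):
    """Turn an LCA-relative delta into an equivalent parent-relative delta.

    Assumes added IDs were absent from the LCA and removed IDs were present.
    """
    a1, r1 = delta_lca_to_merge
    a2, r2 = delta_lca_to_parent
    added = (a1 - a2) | (r2 - r1)    # in merge but not in parent
    removed = (r1 - r2) | (a2 - a1)  # in parent but not in merge
    return added, removed


lca = {1, 2, 3}
d_parent = ({4}, {1})    # parent = {2, 3, 4}
d_merge = ({4, 5}, {2})  # merge  = {1, 3, 4, 5}
parent = apply_delta(lca, d_parent)
merge = apply_delta(lca, d_merge)
d_from_parent = rebase_delta(d_merge, d_parent)
```

Applying `d_from_parent` to the parent's state gives exactly the merge's state, so code that already has the parent indexed could, in principle, work from that delta alone.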
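The recid/rectype change above amounts to a per-database switch that controls which bookkeeping fields get written. The sketch below assumes records are serialized as JSON and uses a hypothetical DbTemplate with two boolean flags; the flag names are invented for illustration and are not Veracity's actual settings.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DbTemplate:
    # Hypothetical per-database flags, not Veracity's real setting names.
    store_recid: bool = True    # only needed if records can be modified later
    store_rectype: bool = True  # redundant if the DB has a single record type


def serialize(template, recid, rectype, fields):
    """Build a record, adding recid/rectype only when the template asks for them."""
    rec = dict(fields)
    if template.store_recid:
        rec["recid"] = recid
    if template.store_rectype:
        rec["rectype"] = rectype
    return json.dumps(rec, sort_keys=True, separators=(",", ":"))


full = DbTemplate()
audit = DbTemplate(store_recid=False, store_rectype=False)  # write-once, one type
entry = {"who": "eric", "when": 1}
full_json = serialize(full, "r-0001", "audit_entry", entry)
audit_json = serialize(audit, "r-0001", "audit_entry", entry)
```

Every record in an audit-style DB is smaller, and there are two fewer columns to index and keep consistent, which is where the savings described above would come from.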
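Two of the indexer tips above (reuse one prepared statement inside a loop, and load each blob only once) can be illustrated with Python's built-in sqlite3 module. In the SQLite C API the same pattern is one sqlite3_prepare_v2() call, followed by sqlite3_bind_*(), sqlite3_step() and sqlite3_reset() for each row. The idx table, the (recid, blob_hash) record shape and the load_blob callable are hypothetical stand-ins, not Veracity's indexer.

```python
import sqlite3


def index_records(conn, records, load_blob):
    """Insert (recid, blob content) rows into a hypothetical idx table.

    records: iterable of (recid, blob_hash) pairs.
    load_blob: hypothetical callable that fetches a blob's content by hash.
    Returns the number of distinct blobs that were loaded.
    """
    loaded = {}  # blob_hash -> content, so each blob is fetched at most once

    def rows():
        for recid, blob_hash in records:
            if blob_hash not in loaded:
                loaded[blob_hash] = load_blob(blob_hash)
            yield (recid, loaded[blob_hash])

    # executemany compiles the INSERT once and re-binds it for every row,
    # instead of re-parsing the SQL on each iteration.
    conn.executemany("INSERT INTO idx (recid, body) VALUES (?, ?)", rows())
    conn.commit()
    return len(loaded)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idx (recid TEXT, body TEXT)")
fetched = []


def fake_load_blob(blob_hash):
    fetched.append(blob_hash)
    return "content of " + blob_hash


count = index_records(conn, [("r1", "h1"), ("r2", "h1"), ("r3", "h2")], fake_load_blob)
```

Three rows get indexed but only two blob loads happen, because "h1" is shared. A real indexer would probably bound that cache rather than hold every blob in memory.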
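The post doesn't spell out what "doesn't have a wider int" costs, so here is one likely consequence, under the assumption that the complaint is about 64-bit integers in JavaScript. In JS, numbers outside the small-integer range end up as IEEE 754 doubles, which hold integers exactly only up to 2^53. Python's float uses the same double format, so it can show the effect.

```python
# Largest integer a double can hold with no gaps below it
# (the value JavaScript later exposed as Number.MAX_SAFE_INTEGER).
MAX_SAFE = 2**53 - 1


def survives_double(n):
    """True if integer n round-trips through an IEEE 754 double unchanged."""
    return int(float(n)) == n


ok = survives_double(MAX_SAFE)
lost_plus_one = survives_double(2**53 + 1)    # rounds to 2**53
lost_int64_max = survives_double(2**63 - 1)  # a full 64-bit value
```

Anything beyond 2^53, such as a full 64-bit ID or a large byte count, can silently change when it passes through the JS layer.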
This firehose of detail is mostly just the ramblings of yet another blogger who is under the delusion that anybody cares about the mundane elements of his day. Which reminds me, Thursday morning for breakfast I had iced coffee with an omelet made of red peppers, Portobello mushrooms, and provolone. Anyway, on the off chance that anything here is worth discussing, meet me on the Veracity mailing list.

After things settle down just a bit more, we'll be ready to start publishing nightly tarballs.