As btcd nears completion we decided to have at least one round of deslugging, because we were much slower than bitcoind during chain download. Let me clarify that term for those who don't know what it means. Deslugging is the art of measuring (or profiling) the run times of individual functions and determining which ones are slugs and which could be optimized. Armed with that data, one attacks said functions to see if there are any less than optimal algorithms. Surprisingly enough, there often are. An industry truism is "measuring is knowing"; attacking functions based on gut feeling often does not yield satisfying results. As always with these exercises, some things work and some don't. Throughout this blog I'd like to walk our readers through some optimizations that succeeded, and one attempt that objectively failed.
First we created a profile of every single function during the initial block download. We found some very surprising results: there were ten or so functions that we had not expected to show up at all, and, the kicker for this blog, database compaction and transaction lookups used a whopping 45% of the total runtime.
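For readers who want to try this at home, here is a minimal sketch of how such a profile can be captured with Go's standard runtime/pprof package. The downloadBlockChain function is a hypothetical stand-in, not btcd's actual code:

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// downloadBlockChain is a hypothetical stand-in for the real work
// being profiled, e.g. the initial block download.
func downloadBlockChain() { /* ... */ }

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Sample the CPU for the duration of the operation under test.
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	downloadBlockChain()
}
```

The resulting cpu.prof file can then be fed to go tool pprof, which ranks functions by how much of the runtime they consume.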
Since I/O use was so high we decided to attack it. The idea was to use a flat-file database for transactions, since appends are essentially free, at the cost of longer lookups and more complex code. The outcome wasn't certain and we had some pretty vociferous debates. Opinions ranged from "a 50-60% improvement" to "it's probably a wash". Despite the uncertain outcome we decided to go ahead and run the experiment; the potential reward outweighed the development cost.
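To make the trade-off concrete, here is a minimal sketch of the flat-file idea, assuming a hypothetical in-memory index from transaction hash to file offset. The actual experimental code was considerably more involved; journaling, recovery, and index persistence are all omitted here:

```go
package txstore

import (
	"encoding/binary"
	"os"
)

// FlatStore appends serialized transactions to a single file and keeps
// an in-memory map from transaction hash to file offset.
type FlatStore struct {
	file   *os.File
	offset int64
	index  map[[32]byte]int64 // tx hash -> offset of length-prefixed record
}

// NewFlatStore opens (or creates) the backing file. Rebuilding the
// index and offset from an existing file is omitted in this sketch.
func NewFlatStore(path string) (*FlatStore, error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0644)
	if err != nil {
		return nil, err
	}
	return &FlatStore{file: f, index: make(map[[32]byte]int64)}, nil
}

// Put appends a length-prefixed transaction. The write only ever
// touches the end of the file, which is why appends are cheap.
func (s *FlatStore) Put(hash [32]byte, tx []byte) error {
	var lenBuf [4]byte
	binary.LittleEndian.PutUint32(lenBuf[:], uint32(len(tx)))
	if _, err := s.file.Write(lenBuf[:]); err != nil {
		return err
	}
	if _, err := s.file.Write(tx); err != nil {
		return err
	}
	s.index[hash] = s.offset
	s.offset += int64(4 + len(tx))
	return nil
}

// Get seeks to the recorded offset and reads the record back. Lookups
// pay for the seek, which is the trade-off against cheap appends.
func (s *FlatStore) Get(hash [32]byte) ([]byte, error) {
	off, ok := s.index[hash]
	if !ok {
		return nil, os.ErrNotExist
	}
	var lenBuf [4]byte
	if _, err := s.file.ReadAt(lenBuf[:], off); err != nil {
		return nil, err
	}
	tx := make([]byte, binary.LittleEndian.Uint32(lenBuf[:]))
	if _, err := s.file.ReadAt(tx, off+4); err != nil {
		return nil, err
	}
	return tx, nil
}
```

Appends only ever touch the end of the file, so writes stay sequential; every lookup, by contrast, costs at least one ReadAt, which is exactly where spinning disks hurt.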
The proof of concept was implemented over a couple of days and testing began. Initial tests were not very encouraging: we only saw a raw speedup of 10-15%. As expected, lookups were not significantly faster either. Unfortunately, in the available time we did not get answers to two questions: what is the true performance impact of spinning disk versus memory/SSD, and why is bitcoind faster than btcd on slower hardware? With these results in hand we had to make a decision: move forward with this code, or shelve it until a later time? The decision was made to abandon the effort for now and concentrate on other things.
So with so much raw potential, why did we abandon this effort? The answer is multi-pronged. Let me enumerate the major reasons:
- Writing the missing bits to harden this code is complex and time-consuming. Making it production ready would push back the feature-complete date.
- The measured speedup fell well short of expectations.
- Lookups are more expensive, though they can result in less disk I/O.
- Dealing with corrupt journals/flat files/databases is not only complex, it has the potential for a very negative user experience. If corruption of any sort is detected, the database components must be validated, which, given the size of the database, is inherently a very long operation.
- There were, and still are, more optimizations possible before we need to reassess the performance profile of btcd.
We went ahead and deslugged a bunch of functions and got some pretty substantial gains. The fixes ranged from functions where something as simple as generating unneeded garbage caused orders of magnitude of unneeded runtime, to more elaborate changes that required rethinking the algorithm. In total we shaved about 3.5 hours off of 8 during the initial block download, up to the checkpoint, which brought us roughly on par with bitcoind. After the checkpoint btcd is much slower due to not having optimized crypto functionality; we will get to that once we reach feature parity. Once the crypto optimizations are complete we will run a new round of profiling to see what needs to be deslugged next.
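As an illustration of the garbage-generation class of fix, consider the contrast below. This is a generic sketch rather than actual btcd code, but it captures the pattern: the first version reallocates repeatedly in a hot path, while the second allocates exactly once:

```go
package main

// serializeBefore grows the output one append at a time. Each time the
// backing array fills up, Go allocates a larger one and copies the
// contents over, leaving the old array behind as garbage for the
// collector to clean up.
func serializeBefore(txs [][]byte) []byte {
	var out []byte
	for _, tx := range txs {
		out = append(out, tx...)
	}
	return out
}

// serializeAfter computes the total size first and allocates exactly
// once, generating no intermediate garbage.
func serializeAfter(txs [][]byte) []byte {
	total := 0
	for _, tx := range txs {
		total += len(tx)
	}
	out := make([]byte, 0, total)
	for _, tx := range txs {
		out = append(out, tx...)
	}
	return out
}
```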
What has not been mentioned throughout this blog is how incredibly useful and simple the Go profiling tools are. It goes without saying that these tools are what made deslugging possible in the first place. Most languages have profiling tools, but they are usually an afterthought and require gymnastics to get working. As with just about anything in Go, profiling "just works" and requires almost no effort on the part of the user. Go proves again to be substantially better than other languages at shortening development cycles, because useful tools are integrated into the development environment.
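To give a sense of how little is required: importing net/http/pprof and starting an HTTP listener is enough to expose live profiles from a running program. This is the standard-library mechanism; how btcd wires it up may differ:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // the blank import registers the /debug/pprof/ handlers
)

func main() {
	// Live CPU, heap, and goroutine profiles become available at
	// http://localhost:6060/debug/pprof/ while the program runs.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {} // stand-in for the program's real work
}
```

From there, go tool pprof http://localhost:6060/debug/pprof/profile captures and analyzes a CPU profile of the live process.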