Our Journey To Make ZenHub Fast

At ZenHub, we regularly conduct customer satisfaction surveys of our cloud users to get better insight into how they use the app. The feedback from these surveys shows us what pain points our users have and how we can ultimately make ZenHub better.

We ask a range of questions around user experience, quality of the product, and overall satisfaction of using ZenHub on a regular basis. The results of our 2019 Q4 survey reiterated a common trend we’ve been hearing from our users — “ZenHub is slow”.

More than 40% of our users reported some level of dissatisfaction with the speed of the product. In fact, performance has come up as a recurring theme from our users since we first started collecting this data back in 2018. We want to assure you that we haven’t been ignoring this feedback. We’ve been hard at work on a brand new version of ZenHub to make it faster than ever. Today we’d like to pull the curtain back and do a technical deep-dive on how we’re going to address performance and make ZenHub the fastest project management solution on the planet.

How satisfied are you with the performance of our software:

ZenHub’s CSAT 2019 survey results for performance satisfaction

ZenHub + GitHub = ❤️

Before we talk about our solution, let’s first understand the unique problems that ZenHub faces as a product. What makes ZenHub wonderful is that it integrates deeply into GitHub, allowing developers and product managers to stay in sync without ever having to leave GitHub, the place where all the work happens.

This is truly a deep integration and a close partnership.

Not only does ZenHub consume a tremendous amount of data through GitHub’s API, but it also embeds itself seamlessly into GitHub’s web application UI through an optional browser extension.

ZenHub’s browser extensions allow you to set estimates on GitHub issues.

GitHub ultimately controls the source of truth. Their database manages the issues, users and status activity data that we desperately need access to. This means that when ZenHub asks GitHub for the latest status of your issues, we are bound not only by the strict rate limiting that GitHub’s API enforces on us, but also by the permission rules defined for your user. We can’t simply assume that you have access to a particular piece of data unless GitHub can first confirm and validate that access (we take security very seriously too!). ZenHub has over 100,000 active users, so as you can imagine, we ask GitHub a lot of questions, a lot of the time: roughly 2,500 requests per minute (rpm).

ZenHub originally solved this challenge by making many of those requests directly from the ZenHub client application to GitHub. This allowed us to spread the requests across multiple unique devices, thereby staying well within the limitations of the API.

Version 1 of ZenHub would request data from GitHub directly from our client applications.
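
To get a feel for why spreading requests across many clients matters, here is a back-of-the-envelope sketch. It assumes a limit of 5,000 requests per hour per authenticated user (the figure GitHub documents for its REST API, which may differ for your account type) and treats the 2,500 rpm figure above as total traffic. The function name is made up for illustration.

```python
import math

# Assumption: 5,000 requests per hour per authenticated user.
# Check GitHub's current rate-limit documentation before relying on this.
PER_IDENTITY_HOURLY_LIMIT = 5_000


def min_identities_needed(requests_per_minute: int,
                          hourly_limit: int = PER_IDENTITY_HOURLY_LIMIT) -> int:
    """Smallest number of separately rate-limited identities that could
    carry the load if it were spread perfectly evenly across them."""
    requests_per_hour = requests_per_minute * 60
    return math.ceil(requests_per_hour / hourly_limit)


total_rpm = 2_500  # figure quoted in this post
print(min_identities_needed(total_rpm))  # 2,500 * 60 / 5,000 = 30
```

In other words, under these assumptions a single identity could not come close to carrying the load on its own, while a large population of users each making their own requests stays comfortably under the per-user limit.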

However, this approach meant that our client application had to do a lot of work. It had to track, store and manage access to all the issue data in the browser. This used a lot of internal system memory for our client app users and it meant we couldn’t leverage powerful database features for data-heavy operations like search and filtering. Even today, the ZenHub client app downloads all of your board’s issues at the start and does filtering and searching in the browser. This isn’t good for performance.

Over the past few years, we’ve been slowly rolling out our own ZenHub API service that allows us to move a lot of that heavy lifting to our cloud servers and make use of a centralized database cache. This allows our clients to be more effective about data access, but how do we get around the rate-limiting problem? In short — it’s tricky, but we manage it by spinning up a large number of worker nodes to continuously fetch data from GitHub behind the scenes.

In subsequent iterations, we added an internal cache of GitHub data that would allow us to be more efficient about our requests.
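
One common way for a background worker to keep fetching without exceeding an API's rate limit is a token bucket. The sketch below is a generic illustration of that technique, not ZenHub's actual worker code: `TokenBucket`, `drain`, and the `fetch` stand-in are all hypothetical names, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time
from collections import deque


class TokenBucket:
    """Allows up to `capacity` calls in a burst, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate_per_sec = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Refill according to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def drain(queue, bucket, fetch, cache):
    """Refresh queued repos until the queue is empty or the bucket runs dry.
    Returns how many repos were refreshed on this pass."""
    refreshed = 0
    while queue and bucket.try_acquire():
        repo = queue.popleft()
        cache[repo] = fetch(repo)
        refreshed += 1
    return refreshed


# Demo with a fake clock: 1 request/second, burst of 2.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=1.0, capacity=2, clock=lambda: fake_now[0])
queue = deque(["org/a", "org/b", "org/c"])
cache = {}
fetch = lambda repo: {"repo": repo, "issues": []}  # stand-in for a real API call

first_pass = drain(queue, bucket, fetch, cache)   # uses the 2 burst tokens
fake_now[0] = 1.0                                  # one second later
second_pass = drain(queue, bucket, fetch, cache)  # one token has refilled
print(first_pass, second_pass)
```

A real deployment would run many such workers and would also honour the rate-limit headers the API returns, which this sketch leaves out.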

This solution gave us a robust framework on top of which we could build amazing features such as Workspaces and a lot of our reports. Unfortunately, the first implementation of our database cache still relied heavily on making on-demand requests to GitHub, which meant it wasn’t an efficient system for searching and filtering a large number of issues. We also underestimated how successful ZenHub would become, with thousands of users making simultaneous board and issue edits in real time. Last year we started working on a new backend engine to address this limitation and pave the way for a brighter future.

Caching & Databases

For ZenHub to scale with our ever-growing list of customers and the ever-growing demand for larger repositories, larger teams and larger numbers of issues, we knew we needed an architecture built with scalability in mind that would still adhere to the strict limitations of our partner APIs.

Our previous architecture suffered from a fundamental flaw. Although we were able to streamline a lot of requests to GitHub, those requests would still often require a full roundtrip to the GitHub API to check for user permissions and other updates before we could show the users any meaningful data.

Some of our requests can return cached data immediately, but many still have to make a round-trip request to the GitHub API, which makes them very slow.

With our new backend, internally code named Raptor, we’ve found a way to address this limitation. Raptor monitors GitHub independently of any requests coming from the ZenHub client, and it maintains a complete cache of all necessary data in our database.

This allows us to have much more separation between GitHub and ZenHub and rely almost entirely on ZenHub serving data to our users. For on-demand requests, we only reach out to GitHub when absolutely necessary. Otherwise, we monitor GitHub mostly in the background without directly impacting requests made by our users.

Raptor works tirelessly in the background keeping our data cache in-sync so that normal client requests are much faster.
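
The difference between the two approaches can be sketched in a few lines. This is a deliberately simplified toy model, not Raptor itself: every class name here is hypothetical, the upstream is a fake that only counts round trips, and permission checks are left out entirely.

```python
class FakeGitHub:
    """Hypothetical stand-in for the upstream API; counts round trips."""

    def __init__(self, issues):
        self.issues = dict(issues)
        self.calls = 0

    def get_issue(self, issue_id):
        self.calls += 1
        return self.issues[issue_id]


class OnDemandBackend:
    """Every client read waits on an upstream round trip."""

    def __init__(self, upstream):
        self.upstream = upstream

    def read(self, issue_id):
        return self.upstream.get_issue(issue_id)


class BackgroundSyncedBackend:
    """A background job fills the cache; client reads never go upstream."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}

    def sync(self, issue_ids):
        # Would run in a worker, independently of client requests.
        for issue_id in issue_ids:
            self.cache[issue_id] = self.upstream.get_issue(issue_id)

    def read(self, issue_id):
        return self.cache[issue_id]


issues = {1: "Fix login", 2: "Speed up board"}

upstream_a = FakeGitHub(issues)
on_demand = OnDemandBackend(upstream_a)
for _ in range(50):
    on_demand.read(1)
    on_demand.read(2)

upstream_b = FakeGitHub(issues)
synced = BackgroundSyncedBackend(upstream_b)
synced.sync([1, 2])
for _ in range(50):
    synced.read(1)
    synced.read(2)

print(upstream_a.calls, upstream_b.calls)  # 100 2
```

The trade-off is freshness: the synced backend is only as current as its last sync, which is why the background monitoring has to run continuously.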

More than that, Raptor is specifically optimized for storing large amounts of indexed data, which allows us to completely remove the need for the client applications to maintain a local copy of all issues. Instead of doing all that heavy lifting in the browser, we can let our database handle expensive queries for filtering and searching. Databases are really good at that sort of thing!
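
As an illustration of pushing filtering and pagination into the database, here is a small sketch. It uses SQLite from Python's standard library purely as a stand-in so that it runs anywhere. The table layout, column names, and `fetch_page` helper are assumptions made for the example, not Raptor's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE issues (
        id       INTEGER PRIMARY KEY,
        repo     TEXT NOT NULL,
        title    TEXT NOT NULL,
        pipeline TEXT NOT NULL
    )
    """
)
# Index that matches the filter + ordering used by fetch_page below.
conn.execute("CREATE INDEX idx_issues_pipeline ON issues (pipeline, id)")

# Odd ids go to "Backlog", even ids to "In Progress".
rows = [
    (i, "org/app", f"Issue {i}", "Backlog" if i % 2 else "In Progress")
    for i in range(1, 101)
]
conn.executemany("INSERT INTO issues VALUES (?, ?, ?, ?)", rows)


def fetch_page(conn, pipeline, after_id=0, page_size=10):
    """Return one page of issues in a pipeline, using keyset pagination:
    the client passes the last id it saw instead of downloading everything."""
    cur = conn.execute(
        "SELECT id, title FROM issues "
        "WHERE pipeline = ? AND id > ? ORDER BY id LIMIT ?",
        (pipeline, after_id, page_size),
    )
    return cur.fetchall()


first = fetch_page(conn, "Backlog")
second = fetch_page(conn, "Backlog", after_id=first[-1][0])
print(first[0], second[0])  # (1, 'Issue 1') (21, 'Issue 21')
```

The client only ever holds one page of results, and the index lets the database answer each request without scanning every issue.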

Let’s Talk Nerdy

If you’re a software engineer you might be curious about what tech is powering Raptor and how we made those decisions. The vast majority of ZenHub’s core is currently written in Node.js with MongoDB as our primary data store. With Raptor we’re introducing Ruby on Rails to our tech stack and moving all of our data to PostgreSQL.

The decision to move everything to PostgreSQL was a no-brainer for us. Issue and task management data is highly relational and hierarchical. Its schema also changes very rarely and the data structures are very predictable.

Although MongoDB was a great initial choice for us due to its flexibility and ease of use, as ZenHub’s features became richer and more complex we needed a database that could help us be stricter with our schemas. PostgreSQL is battle-tested, highly reliable and very performant. We’ve been very happy with this choice.
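
To make "relational, hierarchical, and strict" a bit more concrete, here is a hedged sketch of what such a schema can look like. Again it uses SQLite as a runnable stand-in for a SQL database, and the `epics` and `issues` tables are invented for the example rather than taken from ZenHub's real data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE epics (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE issues (
        id       INTEGER PRIMARY KEY,
        epic_id  INTEGER NOT NULL REFERENCES epics(id),
        title    TEXT NOT NULL,
        estimate INTEGER NOT NULL CHECK (estimate >= 0)
    );
    """
)

conn.execute("INSERT INTO epics VALUES (1, 'Faster boards')")
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?, ?)",
    [(10, 1, "Paginate issues", 5), (11, 1, "Index pipelines", 3)],
)

# The schema rejects an issue that points at a non-existent epic.
try:
    conn.execute("INSERT INTO issues VALUES (12, 99, 'Orphan issue', 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Hierarchical data becomes a join: total estimate per epic.
totals = conn.execute(
    "SELECT e.title, SUM(i.estimate) "
    "FROM epics e JOIN issues i ON i.epic_id = e.id "
    "GROUP BY e.id, e.title"
).fetchall()
print(orphan_rejected, totals)  # True [('Faster boards', 8)]
```

The point is that the database, rather than application code, guarantees that every issue belongs to a real epic and that estimates are never negative.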

Our decision to migrate to Rails needs a bit more context. This decision was driven primarily by two factors: 1) our engineers had prior experience with Rails and had the confidence to build production-ready software quickly and efficiently, and 2) we wanted to use a technology that we could trust and rely on when it comes to stability. Rails has a long-standing reputation when it comes to maturity and this was very important to us as we didn’t want to reinvent the wheel. The changes we have to make to our backend are very significant. We used this opportunity to review our tech stack and evaluate whether or not we were using the right tools for the job. Rails works really well with SQL databases and structured data, and Ruby’s focus on being developer-friendly made it super easy for us to stand up an initial MVP very quickly. Rails also has a large ecosystem of gems that have solved generic problems so we could focus on writing code for core business logic.

Additionally, both the Ruby and Rails communities have always followed good testing practices, so the popular testing tools (RSpec, FactoryBot, Capybara) make writing tests at any level easy. Raptor has had 99% test coverage since the project started, which gives us the confidence to automatically deploy any push to the master branch straight to production. Finally, GitHub itself is primarily written in Rails, and it’s no secret that we’re huge fans of GitHub and their team.

Give It To Me!

You might be thinking, “This all sounds great! When can I get my hands on it?”

We have great news! Raptor is actually already in production today for our Cloud users and will be included in the ZenHub Enterprise version 3 release. The Roadmaps feature which we launched on Cloud in October is powered mostly by our Raptor backend. Roadmaps are currently used by thousands of our users on a daily basis, which gives us a lot of confidence in the scalability and stability of the new system.

ZenHub Roadmaps

Migrating all of ZenHub’s features to Raptor in one shot was out of the question for the engineering team. It would have taken a very long time to migrate all the existing features, and a single big cut-over would have been very risky. Instead, we looked at all of ZenHub’s features and the data they rely on, and compared that with the new features we wanted to deliver.

Roadmaps was at the top of our list of priorities, and it relied on both epic and estimate data, so the migration effort started with moving epic and estimate data from the existing backend to Raptor. Once that was done, we built Roadmaps on top of it. Today, we’re working on migrating Board-related data (pipelines, priorities, PR links, etc.) so we can more easily build new Board features and start trying some more aggressive performance optimizations.

Over the last few months, we’ve been moving many of ZenHub’s features over to the new technology in the background. We mentioned some of them in our March Confetti Moments, which describes the various performance improvements we’ve made in ZenHub during that time.

One example of this affects users who create workspaces with many connected repositories. Such workspaces were typically very slow to fetch the list of labels (for bulk editing actions) as we’d have to reach out to GitHub for each repository to fetch the list of latest labels. By switching over to using Raptor’s new cache, we’ve been able to greatly improve the initial load time of that list. As a result, we’ve seen the load time for those actions fall from an average of 600ms to less than 200ms, and it’s still dropping as more of our clients upgrade to the latest version of the app.

Performance of the “labels” dropdown in the multi-action toolbar on the ZenHub board (lower is faster).

What Next?

Next, in early April, we’re going to start deploying a migration of our core Board features over to Raptor. This is an important milestone for our team and it unlocks the possibility of truly progressive board loading, search, pagination of issue data, and a lot of other amazing performance changes that we’ll be rolling out in Q2 of 2020.

ZenHub’s internal performance metrics for board load time. Currently most boards load in about 4000ms (4 seconds).

Internally, we keep a close eye on the performance of our features through metrics that we collect from our Cloud users. We track common operations such as how long it takes for the board to load, how long it takes to launch issue cards, and how long it takes to load a list of assignable users. Today, the average user has to wait about 4 seconds before they can see their board on the screen. We’ve set an internal goal for ourselves to improve this by 2x and get boards loading in under 2 seconds.

We’ll continue to provide our users with updates on this effort as we make progress on this goal. You can always find information about the work that we’re planning on our ProductBoard and you can get details on the features we’ve released on our changelog.