This past year has been an exciting one for CxAlloy TQ, with tremendous increases in users, projects, and activity.
A great year like this one brings a commensurate increase in demand on our software infrastructure, and with it the possibility of performance problems. We hit those a couple of times this year, and each time we worked hard not only to restore performance but to improve it. That story continues with our release this week, which dramatically improves the performance of file uploads.
As part of this I also want to introduce to you our new status page, status.cxalloy.com, that will not only give you information for any planned or unplanned downtime or performance issues, but also be a public accounting of our commitment to great performance by showing you real-time and historical information on the performance of CxAlloy TQ.
The Challenge
The usage we are seeing today is significantly more than what we saw a year ago, or even a few months ago. This chart from a few months ago gives a good idea of the increases we’ve experienced.
June 2017 saw 80% more activity than June of 2016 and 50% more activity than January 2017, just six months earlier.
We are of course happy to see CxAlloy TQ grow, but it can create performance bottlenecks. We firmly believe that your experience shouldn’t be affected by the success of our platform.
To achieve our goal of great performance at all times, we have to look at many areas. Our biggest bottlenecks are:
- Database access
- File uploads and downloads
- Mobile syncing
Consequently, we have made improvements in each of these areas.
Database Access
Databases are slippery beasts, and small choices can have huge impacts (both positive and negative). The first thing we did this year was work through all of our slow queries and implement optimizations. We are careful to optimize in ways that have only upside, as some optimizations can make one area better while making others far worse. We made huge improvements here, particularly with history, checklists, and tests.
In addition, in late June we moved to a much more powerful database server (82 GB RAM!) with a newer version of MySQL, giving us arguably more headroom than we need. This allows us to all but eliminate hardware as a performance bottleneck.
File Uploads and Downloads
We’ve seen an incredible increase in file uploads since the turn of the year.
June 2017 file uploads were 4x more than February 2017.
Before our release this week, both file uploads and downloads passed through our servers. The more files, the more our servers have to do, and that additional work can make uploads slow and even impact the entire site.
One obvious solution to this increased volume is to deploy more servers. However, we thought there was an even smarter solution than that.
The key is to realize that ultimately these files are not stored on our own servers, but in Amazon’s cloud-based S3 service. We realized that if we could cut our servers out of the loop, so that you upload the files directly to their ultimate destination, and download them directly from there, then our servers wouldn’t have to handle that load at all. Uploads and downloads would be faster and the application would be faster.
The release this week does just that for uploads. Our internal testing shows uploads are often over three times faster due to this change.
By offloading that work to Amazon we benefit from their best-in-class capacity, speed, and reliability. And since we were already relying on these services before, this change is a clear win.
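For the technically curious, here is a minimal sketch of the direct-upload pattern described above. It is an illustration under assumptions, not a description of our production code: it assumes the server hands the browser a short-lived presigned URL, and the `UploadTicket` type, the `create_upload_ticket` function, and the bucket and key names are all hypothetical. The `s3_client` argument is expected to behave like a boto3 S3 client, whose `generate_presigned_url` method produces such URLs.

```python
from dataclasses import dataclass


@dataclass
class UploadTicket:
    """What the server returns to the browser (hypothetical shape)."""
    url: str         # presigned URL the browser sends the file to with HTTP PUT
    key: str         # object key the file will be stored under
    expires_in: int  # lifetime of the URL in seconds


def create_upload_ticket(s3_client, bucket, key, expires_in=300):
    """Return a short-lived URL that lets a client upload straight to S3.

    The application server only signs the request; the file bytes never
    pass through it. `s3_client` is assumed to behave like
    boto3.client("s3"); the bucket and key are illustrative.
    """
    url = s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
    return UploadTicket(url=url, key=key, expires_in=expires_in)
```

With a real client this might be called as `create_upload_ticket(boto3.client("s3"), "example-bucket", "projects/123/photo.jpg")`, after which the browser uploads the file to `ticket.url` directly. The short expiry keeps a leaked URL from being useful for long.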
Mobile Syncing
Last year we dramatically increased the speed at which a project downloads to a mobile device by offloading much of the work to our servers. This change has created a huge benefit for our users over the old process. It works so well, in fact, that it highlighted another bottleneck: syncing.
Syncing is a complex process compared to the relatively straightforward project download. It takes significantly longer as well.
There isn’t a good way to offload this work to our servers like we did with project download. It has to be done on the device, which is sometimes subject to older hardware and poor network connectivity. Not ones to shy away from a challenge, however, we’ve been drafting plans to ease this sync burden.
Unlike the systemic change with project download, these changes are small and incremental improvements. We’ve already implemented optimizations on the server to increase the speed at which the server processes syncs. Future changes include:
- Doing many small syncs that happen periodically in the background, even when you aren’t using the app.
- For large syncs we will leverage our project download system to replace the stale data on the device with fresh, quickly-bundled data from the server.
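The two planned changes above amount to a simple decision about how to bring a device up to date. The sketch below illustrates that decision only; the function name, the strategy labels, and the threshold of 5,000 pending changes are all hypothetical, and the real cutoff would have to come from measurement.

```python
def choose_sync_strategy(pending_changes, full_refresh_threshold=5000):
    """Pick how a device should catch up with the server.

    pending_changes: number of changes the device has not yet received.
    full_refresh_threshold: hypothetical cutoff above which replacing the
        stale data with a freshly bundled project download is assumed to
        be cheaper than applying the changes one by one.
    """
    if pending_changes < 0:
        raise ValueError("pending_changes cannot be negative")
    if pending_changes == 0:
        return "none"          # device is already up to date
    if pending_changes >= full_refresh_threshold:
        return "full_refresh"  # large sync: swap in a fresh bundle
    return "incremental"       # small sync: apply in the background
```

Frequent background syncs keep `pending_changes` small, so in this model most devices would stay on the cheap incremental path.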
Making Ourselves Accountable To You
One way we can ensure that we don’t step off the gas on performance improvements is to give you an unfiltered view into how well we’re doing. We decided to do just that: our new status page provides both real-time and historical data on application performance.
The Apdex graph shows both current (Day) and historical (Week and Month) performance.
This Application Performance graph uses the industry-standard Apdex metric to chart real-time performance. This simple metric measures the share of requests that are served within a defined time limit, with partial credit for requests that are only somewhat slower. We use a relatively standard threshold of 0.5 seconds as our benchmark.
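To make the metric concrete, here is a small sketch of the standard Apdex calculation. Requests at or under the threshold T count as satisfied, requests between T and 4T count as tolerating and earn half credit, and anything slower counts as frustrated and earns nothing. The function name and the sample response times are made up for illustration, and the sketch follows the published Apdex definition rather than the exact internals of the status page.

```python
def apdex(response_times, threshold=0.5):
    """Compute an Apdex score between 0 and 1.

    response_times: request durations in seconds.
    threshold: the target time T in seconds (0.5 matches the benchmark
        mentioned above).
    """
    if not response_times:
        raise ValueError("need at least one response time")
    # Satisfied: served within T.
    satisfied = sum(1 for t in response_times if t <= threshold)
    # Tolerating: slower than T but within 4T; these earn half credit.
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    # Frustrated requests (slower than 4T) contribute nothing.
    return (satisfied + tolerating / 2) / len(response_times)


# Illustrative sample: three fast requests, one tolerable, one slow.
score = apdex([0.1, 0.3, 0.4, 0.9, 2.5])  # (3 + 1/2) / 5 = 0.7
```

A score of 1.0 means every request met the target, so the closer the graph stays to 1.0, the better.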
This is a level of transparency we’ve never provided before, but I’m excited about it. It will drive us to be more disciplined in both maintaining and improving performance, and push us to do the one thing that really matters: helping you to run your projects as effectively and efficiently as possible.