Blog: Technology and Operations - 2023 in Review

Technology Update 2023
Hi all,

It’s hard to believe that we haven’t even been at this for a full year yet.

As we approach the end of 2023 it seems like a good time to take a look at what we’ve done on the Technology and Operations side of the co-op in order to keep the lights on and provide stable technical resources for our members.

It has been an eventful year, and I am proud of what we have accomplished!

I wanted to look back and take an opportunity to share some of the key milestones and challenges we’ve encountered this year.

Starting the Journey:

@boris kickstarted our Mastodon deployment with a one-click install from Digital Ocean back on December 11th, 2022.

Aside from some initial configuration and the setup of some founding accounts, not much happened until we started opening the doors to new members in April of 2023.

Boris and @timbray put out a call for volunteers to help run this thing and a few of us answered.

It’s a tricky thing to figure out how to be useful in a volunteer organization, and I am grateful to Boris for the guidance and encouragement he offered me and our other TechOps volunteers from the very outset.

Mastering Mastodon:

Our first task was to learn the ins and outs of Mastodon.

The one-click install provided by Digital Ocean was a useful starting point for our little server, but anyone who runs applications on the Internet will tell you that as soon as the system is live, you’re engaged in an endless battle against entropy.

Things break. Things need to be (constantly) updated.

Mastodon is built in Ruby on Rails and is a complex bit of software.

Knowing how to tell what a system needs to keep running smoothly is job one when taking on any new operations task, so we set about figuring things out.

Improving Visibility:

Shortly after I joined the team, we had our first outage.

I’m not sure what caused it. It was resolved with a quick reboot and we weren’t down for very long, but the incident revealed that we were missing some crucial information about the operation of the server.

One of the first things we set out to do was to cobble together some monitoring and alerting tools so that we’d be able to tell if something was going on that might compromise the availability or performance of the server.

This suite of tools has grown throughout the year, providing insight into Mastodon’s internals, and giving us better ways to understand the health of the overall system.
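As an illustration only (not a description of the co-op's actual tooling), a first availability check can be as small as the Python sketch below. It fetches a URL once, measures how long the response took, and classifies the result so that something else can raise an alert. Mastodon exposes a /health route that suits this kind of probe; the function names and the 2-second "slow" threshold are assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def classify(status_code, latency_seconds, slow_after=2.0):
    """Turn one probe result into 'ok', 'slow', or 'down'."""
    # No response at all, or any non-2xx status, counts as down.
    if status_code is None or not 200 <= status_code < 300:
        return "down"
    if latency_seconds > slow_after:
        return "slow"
    return "ok"


def probe(url, timeout=5.0):
    """Fetch the URL once and return (status_code, latency_seconds).

    status_code is None when no HTTP response was received.
    """
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        status = None  # DNS failure, refused connection, timeout, ...
    return status, time.monotonic() - started
```

Calling classify(*probe("https://mastodon.example/health")) with a real hostname in place of the placeholder gives a single word that a cron job or alerting script could act on. Keeping classify separate from the network call makes the alerting rule easy to test on its own.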

Working Together:

In April, we were joined by Roberto, whose early involvement has been a major contributor to our ongoing success.

Roberto is a Terraform genius, and kindly integrated our GitHub, Digital Ocean and AWS environments using Terraform. We’ve been building on this work ever since, and I wanted to give him a shoutout for this significant contribution to our operations.

The core tech-ops team has remained small throughout the year, but we’ve been recently joined by Gov and Ian, and we’re continuing to learn how to build on each other’s strengths.

We come from various IT backgrounds, and no one has a ton of Ruby on Rails experience. Unless you’ve run one of these systems before, you probably don’t know anything about it.

We’re all learning about what we’re running as we’re running it, which is exciting and challenging.

I’ve greatly enjoyed working with these guys, and I am excited to grow our team and our technology in the year ahead.

If you know your way around a command prompt at all and you’re interested in helping out, you are qualified and welcome!

Please reach out to Boris or me and we’ll figure out how to get you involved.

Making Space:

Mastodon is hungry for storage!

We opened the doors to co-op members in April, and by April 15th the one-click install Droplet was bursting at the seams.

Every bit of content from everyone followed by everyone on the server gets added to our media cache.

By mid-April we were fighting against the limits of our hardware on a daily basis.

Fortunately, Roberto had been running his own Mastodon instance for a while and knew how to get us set up with AWS S3 storage and their CloudFront CDN.

On April 15th he completed this major maintenance and we gained the room we need to allow the server to grow (thanks Roberto!).
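For readers who have not done this before: Mastodon is pointed at S3 through environment settings such as S3_ENABLED, S3_BUCKET, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and S3_ALIAS_HOST (the last one names the CDN host that serves the media). The Python sketch below is a hypothetical pre-flight check, not part of the migration described above. It parses env-style text and reports which of those keys are missing. The list of keys it treats as required is an assumption for the example, and the sample values are placeholders.

```python
# Keys this sketch treats as required (an assumption for illustration).
REQUIRED_S3_KEYS = (
    "S3_ENABLED",
    "S3_BUCKET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_ALIAS_HOST",
)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_s3_settings(text):
    """Return the required keys that are absent or empty, in a fixed order."""
    settings = parse_env(text)
    return [key for key in REQUIRED_S3_KEYS if not settings.get(key)]


# Placeholder values only; no real bucket or host is implied.
SAMPLE = """
# media storage
S3_ENABLED=true
S3_BUCKET=example-media-bucket
S3_ALIAS_HOST=media.example.org
"""
```

Running missing_s3_settings(SAMPLE) reports the two credential keys, since the sample deliberately leaves them out.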

Moving the Database to a New Data Place:

According to the docs, Mastodon can (relatively) easily recover from most types of failure, but:

“Mastodon stores all the most important data in the PostgreSQL database. The loss of the PostgreSQL database will result in the complete failure of the server, including all the accounts, their posts and followers.”

So this seems like something to avoid!

In order to make sure that this most crucial bit of our Mastodon instance survives - whatever else might happen to the server - we set about moving the database off of the one-click install and into a Managed PostgreSQL instance with Digital Ocean.

Figuring out how to make this work took quite a bit of prep and was complicated by some … less than ideal behaviour from the Digital Ocean managed PostgreSQL service.

In the early morning on August 27th the server was offline for 23 minutes while the data was dumped and restored in its new home.

I am proud to say that this maintenance was the only “major” downtime we’ve had all year!

With this maintenance activity completed, our data is replicated in real-time and we are confident that we’ll survive any major server issues with minimal loss of data. We’re also well positioned to scale up the server for years to come.
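A dump-and-restore move of this kind generally comes down to two PostgreSQL commands, run while the application is stopped so that no writes are lost (which is where the downtime comes from). The Python sketch below only builds and prints those commands; it is an illustration under stated assumptions and not the exact procedure used for this migration. The connection URLs, the dump file name and the job count are hypothetical.

```python
import shlex


def dump_command(source_url, dump_path):
    """Build a pg_dump call that writes a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",  # archive format that pg_restore can read
        f"--file={dump_path}",
        f"--dbname={source_url}",
    ]


def restore_command(target_url, dump_path, jobs=4):
    """Build a pg_restore call that loads the archive into the target."""
    return [
        "pg_restore",
        "--no-owner",  # do not try to recreate the original object owners
        f"--jobs={jobs}",  # restore with several parallel workers
        f"--dbname={target_url}",
        dump_path,
    ]


def migration_plan(source_url, target_url, dump_path="mastodon.dump"):
    """Return the two shell command lines, in the order they would run."""
    return [
        shlex.join(dump_command(source_url, dump_path)),
        shlex.join(restore_command(target_url, dump_path)),
    ]
```

Printing the result of migration_plan with a local source URL and a managed-database target URL gives the two command lines to review before a maintenance window. Building the commands as lists rather than strings keeps credentials and paths safely quoted if they are later passed to subprocess.run.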

Caring for Your Mastodon:

Mastodon is in active development, and between April 2023 and the end of the year, there were 13 software releases!

Most of these have been minor bug-fix releases, but they have included one fix for a severe security issue (which we deployed within an hour of its release) and one major upgrade.

As a team we have deployed 10 of these releases to our instance, with Gov and Ian taking the lead in promoting a couple of them.

The most notable change to Mastodon this year was version 4.2.0, released in October, which introduced full-text search.

This major release required the setup of a new Elasticsearch cluster - along with the monitoring and alerting required of every new service to make sure that everything continues to work as expected.

Conclusion:

Looking back, 2023 has been a year of learning, growth, and building community.

From our initial, small, all-in-one deployment we’ve grown to a server that supports more than 100 active members and can support hundreds more.

Our small band of volunteers has kept things up and running smoothly, and I am proud of the work we have accomplished together.

Each challenge we faced was an opportunity to improve, and every milestone a testament to our team’s dedication.

As we gear up for 2024, we’re excited to continue this journey and look forward to building a reliable, safe and engaging space for our members.

Thanks @mick for stepping up and taking on lots of responsibility, including joining the board this year.

It’s been a very rewarding year of growing members at the co-op, and I’m looking forward to having more people collaborate with us all.

We’ve got a pretty good foundation tech wise, and I’d love to see more people join in documenting, experimenting, and onboarding.

I’m going to try and highlight and solicit ideas, and support small groups in taking them on in a way that engages existing members … and spurs new ones to join!

Great write up @mick! Happy new year all!

We should publish this on the cosocial.info blog?

Thanks for the write up. Really interesting!
