
GitLab: from 15.4 to 18.11 over two years, zero data loss

A studio's self-hosted GitLab, stuck on 15.4: two major versions and two PostgreSQL migrations in 5 hours, patches on release day, zero data loss in two years.

10h delivered

The scary part of self-hosted GitLab isn’t the upgrades. It’s what sits inside: a company’s entire codebase, every client project, every line of history. Upgrading feels risky, so instances sit on old versions for years and pile up vulnerabilities. The studio’s GitLab was stuck on 15.4: by the summer of 2024 that was already two major versions behind. We ran it through the chain of mandatory intermediate releases and two PostgreSQL migrations in 5 hours, then kept it on fresh patches for two years, with critical ones applied on release day. In that time there was one night-time failure. It’s below, uncut.

Snapshot


End-client sector	digital agency, website development
End client	Media Studio (code-name), Russia
Engagement	DevOps retainer covering a fleet of ~14 servers
Project type	maintaining self-hosted GitLab: one big migration + routine and emergency updates
Work done	chain 15.4.2 → 17.2.1 with PostgreSQL 12 → 14; then 9+ updates, 3 of them critical security patches
Project date	10 Jun 2024 – 29 May 2026 (718 days)
Effort	~10h over two years (the big migration: 5h)
Team	3 specialists (engineer · sysadmin-engineer · project manager)
Tech stack	GitLab EE · PostgreSQL · CentOS 7 · AlmaLinux · Ubuntu 24 · Zabbix
Delivered	GitLab 18.11.4, PostgreSQL 14.11, zero lost repositories across the whole period

The problem

The studio runs dozens of client projects, and all the code lives in its own GitLab on a dedicated VDS. When we took over support, the instance was on version 15.4.2. The client’s wording in the tracker: “gitlab is installed and hasn’t been updated in a long time. Need to update it to the latest version to reduce the risk of a breach.”

A request that’s simple to state and awkward to carry out: update to the latest version without stopping work. The developers push every day, so maintenance windows are possible only after 18:00 Moscow time or at weekends, and only with the team warned in advance.

What’s at stake needs no translation for any CTO: GitLab isn’t a service that “goes down and comes back up.” It’s the agency’s whole codebase. GitLab can’t jump straight from 15.4 to the current release: you have to walk a chain of mandatory intermediate releases. Along the way PostgreSQL migrates twice, 12 → 13 → 14. Every step means restarts, schema migrations, and a chance of ending up with an instance that no longer starts. There’s no test copy: nowhere to rehearse, no room for error.

How we did it

1. A route, not a leap. Reconnaissance first: gitlab-rake gitlab:check, a cross-check against the official upgrade path, questions to the client about runners and integrations (the project kept no CI runners of its own, one risk fewer). The plan was recorded in the tracker before work started: intermediate stops 15.4.6 → 15.11.13 → 16.x → 17.2.1, PostgreSQL migrations at known steps. The client confirmed the scope in writing: “we update to the latest available version, including postgres.”
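The plan above boils down to an ordered list of stops between the installed and the target version. Below is a minimal sketch of that bookkeeping, assuming you copy the required stops from GitLab’s official upgrade path for your exact versions. parse_version and plan_route are illustrative names, not part of any GitLab tooling, and the 16.x stops are left out on purpose because the case doesn’t list them.

```python
from __future__ import annotations

# A sketch, not GitLab tooling: the stop list is an input copied from the
# official upgrade path; only the stops named in this case are used below.


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '15.11.13' into (15, 11, 13) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def plan_route(current: str, target: str, required_stops: list[str]) -> list[str]:
    """Return the stops to walk, in order, from `current` up to `target`.

    Stops at or below the current version are skipped, stops above the
    target are ignored, and the target itself always ends the route.
    """
    cur, tgt = parse_version(current), parse_version(target)
    stops = sorted(
        {s for s in required_stops if cur < parse_version(s) <= tgt},
        key=parse_version,
    )
    if not stops or parse_version(stops[-1]) != tgt:
        stops.append(target)
    return stops


if __name__ == "__main__":
    # The 16.x stops are deliberately missing: take them from the official path.
    print(plan_route("15.4.2", "17.2.1", ["15.4.6", "15.11.13"]))
```

The integer-tuple comparison is the point of the helper: compared as plain strings, "15.11.13" sorts before "15.4.6".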

2. Two safety nets, not one. Before the start we took a 12 GB backup with GitLab’s built-in tooling, plus a VM snapshot on the host side. One detail surfaced early: the host panel had a snapshot checkbox but no rollback button. We learned that on dry land, not in the middle of a failure. We deliberately gave the first, riskiest transition a bigger window: “I’ll over-insure. In case it doesn’t come up.” We also set up a package mirror so updates would install without surprises from unavailable repositories.
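A gate for the two safety nets could look like the sketch below. It assumes the Omnibus default backup directory and a separate folder holding copies of gitlab.rb and gitlab-secrets.json, which GitLab’s application backup does not include; the snapshot flag stands in for a person confirming the snapshot in the host panel. safety_nets_ready is a hypothetical helper, not the procedure used on this project.

```python
from __future__ import annotations

from pathlib import Path

# Assumed locations: the Omnibus default backup directory, and a folder where
# copies of /etc/gitlab/gitlab.rb and /etc/gitlab/gitlab-secrets.json were put.
DEFAULT_BACKUP_DIR = Path("/var/opt/gitlab/backups")
CONFIG_FILES = ("gitlab.rb", "gitlab-secrets.json")


def safety_nets_ready(
    backup_dir: Path, config_copy_dir: Path, snapshot_confirmed: bool
) -> list[str]:
    """Return what is still missing before a window; an empty list means go."""
    missing = []
    # Net 1: an archive produced by GitLab's built-in backup tooling.
    if not any(backup_dir.glob("*_gitlab_backup.tar")):
        missing.append("gitlab backup archive")
    # The application backup leaves configuration and secrets out,
    # so their copies are checked separately.
    for name in CONFIG_FILES:
        if not (config_copy_dir / name).is_file():
            missing.append(f"copy of {name}")
    # Net 2: the VM snapshot on the host side, confirmed by a person.
    if not snapshot_confirmed:
        missing.append("VM snapshot")
    return missing
```

Returning the list of gaps rather than a bare yes/no makes it easy to paste straight into the window’s chat thread.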

3. Data checks through the client’s eyes at every step. After each rung we asked the team to verify what mattered to them: logging in, creating a project, git pull and push. The engineer warned upfront that the most fragile point would be the jump from the 15.x branch to 16.x. The client’s answer after the first rungs: “no complaints about working with gitlab from anyone so far.” Each restart meant 1–3 minutes of downtime, and monitoring logged every one.
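Walking the route rung by rung, with the team’s checks as the gate between rungs, can be sketched as follows. The upgrade callable and the check functions are hypothetical stand-ins (in this case the checks were people logging in, creating a project, pulling and pushing); the sketch only shows the control flow: never climb onto the next rung while the current one fails a check.

```python
from __future__ import annotations

from typing import Callable, Optional


def walk_route(
    stops: list[str],
    upgrade: Callable[[str], None],
    checks: dict[str, Callable[[], bool]],
) -> tuple[Optional[str], list[str]]:
    """Upgrade stop by stop and halt at the first stop whose checks fail.

    Returns (last version that passed every check, names of failed checks).
    """
    last_good: Optional[str] = None
    for version in stops:
        upgrade(version)  # hypothetical: package install plus reconfigure
        failed = [name for name, check in checks.items() if not check()]
        if failed:
            # Stop here and investigate; the next rung waits.
            return last_good, failed
        last_good = version
    return last_good, []
```

Keeping the upgrade step and the checks as injected callables means the same loop works whether a check is a script or a person answering “yes” in chat.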

4. Security patches on release day, on our initiative. After that the work turned routine, and in routine what matters is speed. 17.3.3: task opened and closed the same day, 0.5h. 17.4.2: “critical. there’s literally a message in git telling you to update urgently.” We agreed a window after 18:00 and were done by evening, 0.25h. 17.7.7: “because of vulnerabilities,” 0.5h. None of the three came at the client’s request: we watch the release announcements ourselves and come with a ready proposal. The planned versions (17.5.1, 17.6.1, 17.7.6) rode in with the monthly fleet update cycles.
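The habit of watching announcements reduces to a comparison like the one below. A minimal sketch, assuming a hand-maintained list of (version, is_security) pairs taken from GitLab’s release announcements; no feed or API is implied, and pending_security_patches is an illustrative name.

```python
from __future__ import annotations


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '17.4.2' into (17, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def pending_security_patches(
    installed: str, releases: list[tuple[str, bool]]
) -> list[str]:
    """Return announced security releases newer than `installed`, oldest first.

    `releases` is an assumed, hand-maintained list of (version, is_security).
    """
    cur = parse_version(installed)
    pending = [v for v, is_security in releases if is_security and parse_version(v) > cur]
    return sorted(pending, key=parse_version)
```

A non-empty result is the trigger for opening a same-day task; which release to actually install still follows the upgrade path.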

5. The ceiling: in writing, in advance, with options. In March 2025 the engineer noted it right in the task: “neither 17.8 nor 17.9 exist for centos 7. when the 17.7.x line runs out, we’ll need to look at converting to AlmaLinux 8-9 or standing up a new server.” No drama, just a fact, a fork, and time to think. In November we added a data point from practice: at another of our clients with the same configuration, converting to Alma together with the GitLab upgrade took 4 hours across 4 mandatory intermediate stops.
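A distro ceiling is easy to encode once it’s known. This sketch holds only the one ceiling stated in this case (CentOS 7 stops at the 17.7.x line); any other OS entry would be an assumption to verify against GitLab’s package documentation. OS_CEILING and blocked_by_os are hypothetical names.

```python
from __future__ import annotations

# The only ceiling stated in this case: on CentOS 7 the 17.7.x line is the last.
# Anything added here must be checked against GitLab's package documentation.
OS_CEILING: dict[str, tuple[int, int]] = {"centos7": (17, 7)}


def parse_minor(v: str) -> tuple[int, int]:
    """Reduce '17.7.7' to its (major, minor) line, (17, 7)."""
    major, minor, *_ = (int(p) for p in v.split("."))
    return major, minor


def blocked_by_os(target: str, os_name: str) -> bool:
    """True if `target` is newer than the last line packaged for `os_name`.

    An OS missing from the table means "no known ceiling", not "supported".
    """
    ceiling = OS_CEILING.get(os_name)
    return ceiling is not None and parse_minor(target) > ceiling
```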

The night the server didn’t come back from the reboot

Back in July 2024, while agreeing the update protocol, the engineer wrote to the client: “with any upgrade there’s always a chance the server doesn’t come back from the reboot.” So every window needs someone on duty with console access. A year and a half later the line came true word for word.

The night of 29 January 2026, a planned monthly update window. The engineer decides to clear the GitLab debt at the same time: convert CentOS 7 → AlmaLinux 8, then upgrade to the current version on top. Timeline from the chat (Moscow time):

03:29. Monitoring: the main GitLab isn’t responding, “No route to host.”
03:50. The engineer in chat: “the server didn’t come back from the reboot… need console access or at least a screen.” And right after: “don’t pull any reboots/resets.”
03:55. The client’s direction lead is already at the console: the disk is full to the brim. The engineer: “but I checked, there were 20 gigs free,” and the df output in his shell history backs that up. The conversion itself ate the space.
03:57. The call: Ctrl-Alt-Del through the console, no hard reset.
03:58. The server came up, GitLab responds. From the first alert to recovery: 29 minutes.
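Monitoring’s down and up events are enough to reconstruct outage lengths like the 29 minutes above. A minimal sketch, assuming events arrive in order as HH:MM clock times; the event format is hypothetical, not any particular Zabbix export. A clock time earlier than the previous one is read as the next day, so a window that crosses midnight still comes out positive.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def outages(events: list[tuple[str, str]]) -> list[int]:
    """Turn ordered (HH:MM, "down" | "up") events into outage lengths in minutes."""
    result: list[int] = []
    down_at: datetime | None = None
    prev: datetime | None = None
    day = 0
    for clock, state in events:
        t = datetime.strptime(clock, "%H:%M") + timedelta(days=day)
        # A clock time earlier than the previous event means midnight passed.
        if prev is not None and t < prev:
            day += 1
            t += timedelta(days=1)
        prev = t
        if state == "down" and down_at is None:
            down_at = t  # the first alert opens the outage
        elif state == "up" and down_at is not None:
            result.append(int((t - down_at).total_seconds() // 60))
            down_at = None  # recovery closes it
    return result
```

Fed the night’s two events, 03:29 down and 03:58 up, it returns [29]. An outage with no recovery event yet is simply not reported.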

The data was intact: the conversion hadn’t gone through, and GitLab stayed on its previous version. The engineer finished the monthly update of the remaining servers that same night, by 04:22.
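The 20 GB that looked sufficient before the reboot is the kind of number a pre-flight check should compare against an explicit requirement rather than a feeling. A minimal sketch using the standard library; required_gib is a hypothetical per-operation estimate you would set with margin, because a check made before the start cannot see what the operation itself will consume.

```python
import shutil

GIB = 1024 ** 3


def disk_headroom(path: str, required_gib: float) -> tuple[bool, float]:
    """Return (enough_space, free_gib) for the filesystem that holds `path`.

    `required_gib` is an assumed estimate for the planned operation,
    including what the operation itself writes while it runs.
    """
    free_gib = shutil.disk_usage(path).free / GIB
    return free_gib >= required_gib, round(free_gib, 1)
```

Run it for every mount the operation touches, and put the required figure in the window’s sign-off message so both sides see the same number.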

What came next matters more than the failure itself: the review. The client fairly pointed out, “we hadn’t agreed on a conversion yet.” It turned out the sign-off had broken down over a single chat quote: the engineer had asked about the GitLab upgrade, and the client answered “noted, thanks” to a line he read as a reminder about the monthly cycle. No one lied: the two sides read one conversation in two different ways. The engineer closed the topic without excuses: “got it, I’ll be clearer.” The client did the same from his side: “better to spell out what we’re talking about.” The cost of the lesson: 29 minutes of night-time downtime and 1.5h of work. Since then, risky work is named explicitly in chat, never inferred from context.

The finale: the move done by the client, the support ours

After that night no one tried the in-place conversion again: the new-server option won. In April 2026 the client’s team moved GitLab to a fresh server on Ubuntu 24 by themselves and upgraded it. That’s their work, and we don’t claim it as ours. What’s telling is the other part: across two years of running it together, the client’s team had watched the process enough to handle the move without us. We carried on the support on the new host and in May 2026 upgraded GitLab to 18.11.4 in 0.75h, in the shared cycle with the rest of the fleet.

Results

Metric	Value
GitLab version	15.4.2-ee → 18.11.4
PostgreSQL	12.10 → 14.11 (two migrations along the way)
Big migration 15.4 → 17.2	5h of work, 1–3 min of downtime per restart
Critical security patches	3 in the period, each on release day, 0.25–0.5h
Incidents in two years	1 (OS conversion), recovery in 29 min, data intact
Repository / data loss	0 — no developer complaints logged at any step
Effort on GitLab across the whole period	~10h

In short: a system no one dared touch for two years went through two major versions, two database migrations, three emergency patches, and one openly reviewed failure, and every working day the studio’s developers pushed code as usual.

Process

Phase	Period	Outcome
Reconnaissance and migration plan	June 2024	upgrade path recorded in the tracker, scope agreed in writing
Big migration 15.4.2 → 17.2.1	July 2024	walked in rungs over 5h, PostgreSQL 12 → 14, client checks at every step
Routine + emergency updates	September 2024 – December 2025	17.3.3 → 17.7.7; critical patches on release day; CentOS 7 ceiling recorded in writing
OS conversion incident	January 2026	recovery in 29 min, communication review, in-place conversion dropped
Move to Ubuntu 24	April 2026	done by the client; we continue support on the new host
Current state	May 2026	GitLab 18.11.4, in the shared monthly update cycle

The phases overlap with the rest of the retainer: GitLab is one direction inside the overall support, not a separate contract.

Team

Engineer (studio): the big migration 15.4 → 17.2, the PostgreSQL migrations
Sysadmin-engineer (studio): routine and emergency updates, the incident and recovery
Anton Hersun, Xaver Pro — project manager

If your self-hosted GitLab is also stuck on a version you’re afraid to touch, write and tell us which one it’s on now and which OS. We’ll cross-check the upgrade path, find the choke points like PostgreSQL migrations and the distro ceiling, and come back with a fixed estimate in hours. The upgrade-plan review is free.

Unfreeze your GitLab →