Last month, I found myself tasked with setting up a production-grade MongoDB cluster for our growing application. After weeks of testing, tweaking, and occasionally pulling my hair out, I’ve got a rock-solid setup that I want to share. This isn’t just another tutorial - it’s my battle-tested approach that’s currently running in production.
Starting With the Basics: Understanding MongoDB Clusters
Before I dive into the setup, let me share what I learned about MongoDB clustering options. When I started this project, I had to make a crucial decision between replica sets and sharded clusters. Here’s what my research and experience taught me:
Replica Sets: The High-Availability Solution
Think of a replica set as your database’s insurance policy. It’s like having multiple copies of your data, each ready to step in if something goes wrong. Here’s what I love about replica sets:
- It’s essentially MongoDB’s way of saying “I’ve got your back” - if your primary server fails, another one takes over automatically
- Your application keeps running even if a server decides to take a vacation
- You can spread your read operations across multiple servers (this saved us during high-traffic periods)
- It’s significantly easier to manage compared to sharded clusters
Sharded Clusters: The Scale-Out Beast
Now, sharded clusters are a different animal altogether. Imagine splitting your data across multiple servers, each handling its own piece of the puzzle. Here’s when you might want to go this route:
- Your data is growing faster than your biggest server can handle
- You need to write data faster than a single server can manage
- You want to spread your data across different geographical locations
- You’re dealing with truly massive datasets (we’re talking terabytes)
I chose a replica set for our setup because our data size was manageable, but we needed rock-solid reliability. Let me walk you through exactly how I set it up.
My Production Setup Journey
The Hardware Foundation
I went with three Hetzner dedicated servers. Why Hetzner? Good price-to-performance ratio and reliable network - pretty crucial when you’re building a distributed system. Each server is identical, which makes management way simpler.
Setting Up the File Structure
First thing I did was create a clean workspace on each server:
mkdir -p ~/mongodb
cd ~/mongodb
Creating a Production-Grade Docker Setup
Here’s the docker-compose.yml I landed on after several iterations. I’ll explain the important bits:
services:
  mongo:
    image: mongo:7.0
    command: ["mongod", "--config", "/etc/mongod.conf", "--replSet", "rs0", "--bind_ip_all"]
    ports:
      - 27017:27017
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: your_secure_password
      MONGO_INITDB_DATABASE: admin
    volumes:
      - mongo_data:/data/db
      - mongo_config:/data/configdb
      - ./mongod.conf:/etc/mongod.conf:ro
      - ./mongodb-keyfile:/data/configdb/mongodb-keyfile:ro
      - mongo_logs:/var/log/mongodb
    user: "999:999"
    ulimits:
      nofile:
        soft: 1048576
        hard: 1048576
      nproc:
        soft: 1048576
        hard: 1048576
      memlock:
        soft: -1
        hard: -1
    deploy:
      resources:
        limits:
          cpus: '75'
          memory: 110G
        reservations:
          cpus: '75'
          memory: 110G
    sysctls:
      net.core.somaxconn: 65535
      net.ipv4.tcp_max_syn_backlog: 65535
      net.ipv4.tcp_fin_timeout: 30
      net.ipv4.tcp_keepalive_time: 300
      net.ipv4.tcp_keepalive_intvl: 30
      net.ipv4.tcp_keepalive_probes: 5
    networks:
      - mongodb_network

networks:
  mongodb_network:
    driver: bridge
    driver_opts:
      com.docker.network.driver.mtu: 9000

volumes:
  mongo_data:
  mongo_config:
  mongo_logs:
Those resource limits? They’re not random numbers. I spent time monitoring our application’s behavior and adjusted them based on real usage patterns.
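With the file in place on a server, starting a node is a one-liner. A minimal sketch, assuming the Docker Compose v2 plugin (docker compose) rather than the legacy docker-compose binary:

cd ~/mongodb
docker compose up -d

# the config below sends mongod's log to a file, so tail it inside the container
docker compose exec mongo tail -f /var/log/mongodb/mongod.log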
MongoDB Configuration That Actually Works
Here’s my mongod.conf file, battle-tested in production:
net:
  port: 27017
  bindIp: 0.0.0.0
  maxIncomingConnections: 300000

security:
  authorization: enabled
  keyFile: /data/configdb/mongodb-keyfile

replication:
  replSetName: rs0

storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 100
      journalCompressor: zstd
    collectionConfig:
      blockCompressor: zstd

operationProfiling:
  mode: off

setParameter:
  maxTransactionLockRequestTimeoutMillis: 5000
  transactionLifetimeLimitSeconds: 60

systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true

processManagement:
  fork: false
  timeZoneInfo: /usr/share/zoneinfo
That 100GB cache size? It’s not arbitrary - it’s about 80% of our available RAM, which is MongoDB’s sweet spot for performance.
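If you want to sanity-check how much of that cache is actually in use once a node is up, serverStatus exposes the WiredTiger cache counters. A quick check with mongosh on the host (credentials as set in the compose file; the stat key names are the ones I believe WiredTiger reports, so verify against your own serverStatus output):

mongosh "mongodb://root:your_secure_password@localhost:27017/admin" --quiet --eval '
  const c = db.serverStatus().wiredTiger.cache;
  print("configured bytes:", c["maximum bytes configured"]);
  print("in cache bytes:  ", c["bytes currently in the cache"]);
'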
Security: Because Sleep is Nice
Security was a major concern. I generated a keyfile for internal authentication:
openssl rand -base64 756 > mongodb-keyfile
chmod 400 mongodb-keyfile
This keyfile acts as a shared secret between our MongoDB instances. Without it, random MongoDB instances can’t join our replica set. It’s like having a secret handshake. 🤝 Generate the keyfile once, then copy it to all the servers.
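The copy itself is mundane, but ownership matters: the container runs as UID/GID 999, so the keyfile must be readable by that user on every host. Something along these lines (host names are placeholders for your own servers):

# on the server where the keyfile was generated
scp ~/mongodb/mongodb-keyfile mongo2.yourdomain.com:~/mongodb/
scp ~/mongodb/mongodb-keyfile mongo3.yourdomain.com:~/mongodb/

# on every server: restrict permissions and hand the file to the container user (999:999)
chmod 400 ~/mongodb/mongodb-keyfile
sudo chown 999:999 ~/mongodb/mongodb-keyfile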
DNS Setup: Making It All Connect
Setting up proper DNS records is crucial for MongoDB cluster operation. I’ll walk through how to configure both A records and SRV records in Cloudflare (the process is similar for other DNS providers).
Setting up A Records
First, we need to create A records for each MongoDB node:
- Log into your DNS provider (Cloudflare in our case)
- Create the following A records:
# Primary Node
Type: A
Name: mongo1
Content: <Your-Server-1-IP>
TTL: Auto
Proxy status: DNS only

# Secondary Node 1
Type: A
Name: mongo2
Content: <Your-Server-2-IP>
TTL: Auto
Proxy status: DNS only

# Secondary Node 2
Type: A
Name: mongo3
Content: <Your-Server-3-IP>
TTL: Auto
Proxy status: DNS only
Important: Set Proxy status to “DNS only” (gray cloud in Cloudflare) rather than proxied. MongoDB nodes should connect directly to each other.
Setting up SRV Records
Next, create SRV records for automatic MongoDB discovery. The SRV records help MongoDB drivers automatically discover all nodes in your cluster:
- Create SRV records for each node:
# Primary Node SRV Record
Type: SRV
Name: _mongodb._tcp.db
Target: mongo1.yourdomain.com
Priority: 0
Weight: 5
Port: 27017
TTL: Auto

# Secondary Node 1 SRV Record
Type: SRV
Name: _mongodb._tcp.db
Target: mongo2.yourdomain.com
Priority: 0
Weight: 5
Port: 27017
TTL: Auto

# Secondary Node 2 SRV Record
Type: SRV
Name: _mongodb._tcp.db
Target: mongo3.yourdomain.com
Priority: 0
Weight: 5
Port: 27017
TTL: Auto
Record Details Explained
- A Records:
  - Point directly to your server IP addresses
  - Enable direct node-to-node communication
  - Should be DNS-only (not proxied) for proper cluster operation
- SRV Records:
  - The _mongodb._tcp prefix is required for MongoDB service discovery
  - The db subdomain can be whatever you choose
  - Priority of 0 means equal priority for all nodes
  - Weight of 5 ensures equal load distribution
  - Port 27017 is MongoDB’s default port
Testing DNS Configuration
After setting up your records, verify them using these commands:
# Test A records
dig mongo1.yourdomain.com
dig mongo2.yourdomain.com
dig mongo3.yourdomain.com
# Test SRV records
dig srv _mongodb._tcp.db.yourdomain.com
The SRV lookup should return all three nodes with their priorities and weights.
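A quicker check, if you only want the essentials, is the +short form; each answer line is priority, weight, port, and target (the sample output below is what I'd expect for the records defined above, not captured output):

dig +short SRV _mongodb._tcp.db.yourdomain.com
# expected: one line per node, in the form "priority weight port target"
# 0 5 27017 mongo1.yourdomain.com.
# 0 5 27017 mongo2.yourdomain.com.
# 0 5 27017 mongo3.yourdomain.com.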
Bringing It All Together
After setting up each server, I connected to the first node with mongosh and initialized the replica set:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1.yourdomain.com:27017", priority: 1 },
    { _id: 1, host: "mongo2.yourdomain.com:27017", priority: 0.5 },
    { _id: 2, host: "mongo3.yourdomain.com:27017", priority: 0.5 }
  ]
})
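The election takes a few seconds; after that I like to confirm every member reached the expected state. A small check run from the host against the node I initiated from (credentials as configured in the compose file):

mongosh "mongodb://root:your_secure_password@mongo1.yourdomain.com:27017/admin" --quiet --eval '
  rs.status().members.forEach(m => print(m.name, m.stateStr, "health:", m.health));
'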
The Connection String That Makes It Work
Here’s how our applications connect to this setup:
mongodb+srv://root:your_password@db.yourdomain.com/your_database?replicaSet=rs0&readPreference=secondaryPreferred&ssl=false&authSource=admin
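Before wiring the string into an application, I like to prove it resolves and authenticates from a client machine. A minimal smoke test with mongosh:

mongosh "mongodb+srv://root:your_password@db.yourdomain.com/your_database?replicaSet=rs0&readPreference=secondaryPreferred&ssl=false&authSource=admin" \
  --quiet --eval 'printjson(db.runCommand({ ping: 1 }))'

The ssl=false option matters here: the mongodb+srv:// scheme turns TLS on by default, and these nodes aren’t serving certificates.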
Real-World Performance Notes
After running this setup for a while, I’ve noticed:
- Write operations consistently complete in under 50ms
- Read operations from secondaries average around 20ms
- Failovers, when they happen, complete in under 10 seconds
- Our application hasn’t experienced any downtime due to database issues
Monitoring and Maintenance
I regularly check the replica set’s health using:
rs.status()
This command gives me everything I need to know about:
- Each member’s state
- Replication lag
- Election status
- Synchronization health
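For a quicker read on replication lag specifically, the shell has a helper that prints how far each secondary is behind the primary; I run it alongside rs.status() from time to time:

mongosh "mongodb://root:your_secure_password@mongo1.yourdomain.com:27017/admin" --quiet --eval '
  rs.printSecondaryReplicationInfo();
'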
Host System Optimization
Before deploying our MongoDB cluster, it’s crucial to optimize each host server for high-performance database operations. I’ve created a comprehensive optimization script that configures various system parameters for optimal MongoDB performance.
Script Overview
https://gist.github.com/polymatx/cc067da36aa9293d42839f4fe9f09673
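The full script lives in the gist above. As a rough illustration of the kind of settings it touches (these two come straight from MongoDB’s production notes, not from the script itself):

# disable transparent huge pages, which hurt MongoDB's memory access patterns
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

# keep the kernel from swapping mongod's working set out under memory pressure
sudo sysctl -w vm.swappiness=1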
What’s Next?
I’m currently exploring:
- Automated backup strategies
- Monitoring solutions
- Performance optimization techniques
- Sharding strategies for future growth
I’ll share my findings in future posts as I implement and test these improvements.
Conclusion
Setting up a MongoDB cluster isn’t a one-size-fits-all process. What I’ve shared here is what worked for our specific needs - a balance of reliability, performance, and manageability. Your mileage may vary, but these principles should give you a solid foundation to build on.
Remember: the best database setup is the one that lets you sleep at night without worrying about data loss or downtime. This setup has given me that peace of mind, and I hope it helps you achieve the same.
Feel free to reach out if you have questions about any part of this setup. I’m always happy to help fellow developers avoid the pitfalls I encountered along the way.