top — Built-in Process Monitor
top is the quickest way to see what is consuming CPU and RAM right now. It's available on every Linux system with no install required — your first tool in any triage session.
Launch top
```bash
top
```
Essential top key bindings (while running)
| Key | Action |
|---|---|
| q | Quit |
| M | Sort by memory usage |
| P | Sort by CPU usage (default) |
| k | Kill a process by PID |
| 1 | Show individual CPU cores |
| u | Filter by username |
| d | Change refresh interval |
| Space | Force immediate refresh |
Run top non-interactively (for scripting)
```bash
# Single snapshot — 1 iteration, output to stdout
top -b -n 1

# Show only the top 20 processes
top -b -n 1 | head -30
```
The first five lines show uptime and load averages, total tasks, CPU breakdown (us=user, sy=kernel, id=idle), and memory/swap usage. The load average numbers (1m / 5m / 15m) are your first sign of sustained pressure — values consistently above your CPU core count mean runnable tasks are queuing for CPU time.
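That load-versus-cores check can be scripted. A minimal sketch, assuming the standard /proc/loadavg field layout and the coreutils nproc command:

```bash
# Compare the 5-minute load average against the number of CPU cores;
# a value persistently above the core count means tasks are queuing.
cores=$(nproc)
load5=$(awk '{print $2}' /proc/loadavg)   # fields: 1m 5m 15m ...
if awk -v l="$load5" -v c="$cores" 'BEGIN { exit !(l > c) }'; then
  echo "PRESSURE: 5m load $load5 exceeds $cores cores"
else
  echo "OK: 5m load $load5 within $cores cores"
fi
```

The comparison is done in awk because shell arithmetic cannot compare floating-point load values directly.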
htop — The Better top
htop is an interactive, color-coded process viewer that shows per-core CPU bars, memory meters, and process trees. It's the tool most sysadmins reach for first when investigating a live system.
Install htop
```bash
sudo apt install htop -y
```
Launch htop
```bash
htop

# Run as a specific user to filter immediately
htop -u www-data
```
htop key bindings
| Key | Action |
|---|---|
| F2 | Setup / configuration |
| F3 | Search processes by name |
| F4 | Filter (show only matching) |
| F5 | Tree view — shows parent/child relationships |
| F6 | Sort by column |
| F9 | Kill selected process |
| F10 / q | Quit |
| Space | Tag a process |
| u | Filter by user |
What to look for
- CPU bars pegged at 100% on one or more cores — identify the PID and process name
- Memory bar nearly full with swap growing — potential OOM situation
- Load average climbing well above core count — system is being overwhelmed
- Zombie processes (Z state) — parent not reaping children, may indicate a bug
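Zombies are also easy to confirm from a plain shell. A small sketch using only standard ps output options:

```bash
# Count processes in Z (zombie) state, then list each one with its
# parent PID; the parent is the process that should be reaping them.
zombies=$(ps -eo state= | grep -c '^Z' || true)
echo "zombie count: $zombies"
ps -eo pid,ppid,state,comm | awk 'NR == 1 || $3 ~ /^Z/'
```

A nonzero count that persists across refreshes points at a parent that never calls wait() on its children.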
Pressing F5 in htop shows processes in a parent-child tree. This immediately reveals which web server worker, PHP-FPM pool, or spawned script is consuming resources — rather than a flat list that requires cross-referencing PIDs.
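If htop is not installed, plain ps can render a similar tree (pstree, from the psmisc package, is another option):

```bash
# ASCII process tree showing parent/child relationships, first 30 lines
ps -e --forest -o pid,ppid,user,comm | head -30
```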
iotop — Disk I/O by Process
When your server is slow but CPU and RAM look fine, the culprit is often disk I/O. iotop shows you exactly which process is hammering storage — essential for diagnosing slow MySQL queries, runaway log writers, or backup jobs competing with live traffic.
Install iotop
```bash
sudo apt install iotop -y
```
Run iotop
```bash
# Interactive mode — requires root
sudo iotop

# Show only processes with active I/O (quieter output)
sudo iotop -o

# Batch mode — useful for logging
sudo iotop -b -n 3
```
Column reference
| Column | Meaning |
|---|---|
| DISK READ | Current read bandwidth for this process |
| DISK WRITE | Current write bandwidth for this process |
| SWAPIN | % time waiting on swap reads — high values = memory pressure |
| IO> | % time this process spent waiting on I/O |
On some minimal Ubuntu installs iotop may report "CONFIG_TASK_IO_ACCOUNTING not set in kernel". This is rare on standard Ubuntu 24.04 LTS but can happen on custom kernels. If so, use iostat (from sysstat) as an alternative for per-device I/O stats.
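You can check the kernel config directly before reaching for an alternative. A sketch assuming the standard Ubuntu config location under /boot:

```bash
# Verify the running kernel has per-task I/O accounting, which iotop
# requires; /boot/config-$(uname -r) is the usual Ubuntu location.
cfg="/boot/config-$(uname -r)"
if grep -qs 'CONFIG_TASK_IO_ACCOUNTING=y' "$cfg"; then
  msg="iotop supported: task I/O accounting is enabled"
else
  msg="iotop may not work: CONFIG_TASK_IO_ACCOUNTING=y not found in $cfg"
fi
echo "$msg"
```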
Alternative: iostat for device-level I/O
```bash
sudo apt install sysstat -y

# Show device I/O stats every 2 seconds
iostat -x 2

# Focus on a specific device
iostat -x 2 /dev/sda
```
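vmstat (part of the procps package, installed by default on Ubuntu) gives another quick system-wide view: the wa column is the percentage of CPU time spent waiting on I/O.

```bash
# Three samples, one second apart; a consistently high "wa" column
# confirms the system is I/O-bound before you drill into processes.
vmstat 1 3
```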
nload — Network Bandwidth Monitor
nload shows real-time incoming and outgoing bandwidth per network interface with an ASCII graph. It's the fastest way to see if your server is saturating its network link or if traffic looks abnormal.
Install nload
```bash
sudo apt install nload -y
```
Run nload
```bash
# Monitor all interfaces (arrow keys to switch)
nload

# Monitor a specific interface
nload eth0
nload enp3s0
```
nload navigation
| Key | Action |
|---|---|
| ← → | Switch between network interfaces |
| F2 | Options screen |
| q | Quit |
Alternative: nethogs — bandwidth by process
nload shows interface-level bandwidth. If you need to know which process is responsible for that traffic, use nethogs:
```bash
sudo apt install nethogs -y
sudo nethogs eth0
```
Quick interface stats without extra tools
```bash
# Snapshot of bytes transferred on all interfaces
cat /proc/net/dev

# ip command interface stats
ip -s link show eth0
```
journalctl — Reading System Logs
On modern Ubuntu systems, journalctl is the primary interface to the systemd journal — a structured log that captures output from every service, the kernel, and the boot process. Learning to query it efficiently is one of the most valuable sysadmin skills.
Most-used journalctl commands
```bash
# Show all logs (oldest first) — pipe to less
journalctl | less

# Show logs for a specific service
journalctl -u nginx
journalctl -u apache2
journalctl -u mysql
journalctl -u php8.3-fpm

# Follow logs in real time (like tail -f)
journalctl -u nginx -f

# Show logs since the last boot only
journalctl -b

# Show logs from a previous boot (-1 = last, -2 = two boots ago)
journalctl -b -1

# Filter by time range
journalctl --since "2026-03-17 14:00:00"
journalctl --since "1 hour ago"
journalctl --since "30 min ago" --until "now"

# Show only errors and above (emerg, alert, crit, err)
journalctl -p err

# Show kernel messages only
journalctl -k

# Show the last 50 lines
journalctl -n 50

# Show logs in reverse (newest first)
journalctl -r
```
Combining filters
```bash
# Errors from nginx in the last 2 hours
journalctl -u nginx -p err --since "2 hours ago"

# All errors since last boot, newest first
journalctl -b -p err -r
```
Disk usage and maintenance
```bash
# Check how much disk the journal is using
journalctl --disk-usage

# Keep only the last 2 weeks of logs
sudo journalctl --vacuum-time=2weeks

# Keep journal under 500 MB
sudo journalctl --vacuum-size=500M
```
By default on Ubuntu 24.04, the journal may be stored in /run/log/journal (volatile, lost on reboot). To make it persistent across reboots: sudo mkdir -p /var/log/journal && sudo systemctl restart systemd-journald. Persistent logs are stored in /var/log/journal.
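An equivalent, more explicit route is a journald drop-in setting Storage=persistent. Storage and SystemMaxUse are documented journald.conf options; the drop-in filename here is my own choice:

```bash
# Force persistent journal storage via a drop-in instead of relying on
# directory auto-detection, and cap its disk use at the same time.
sudo mkdir -p /etc/systemd/journald.conf.d
sudo tee /etc/systemd/journald.conf.d/persistent.conf > /dev/null <<'EOF'
[Journal]
Storage=persistent
SystemMaxUse=500M
EOF
sudo systemctl restart systemd-journald
```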
/var/log — Log File Structure
Not everything logs to the systemd journal. Many services write directly to files in /var/log. Knowing which log belongs to which service is fundamental to fast troubleshooting.
Key log locations
| Log File / Directory | What It Contains |
|---|---|
| /var/log/syslog | General system messages (kernel, daemons, services) |
| /var/log/auth.log | Authentication: SSH logins, sudo, PAM, failed login attempts |
| /var/log/kern.log | Kernel messages only |
| /var/log/dmesg | Boot-time kernel ring buffer (hardware detection, driver messages) |
| /var/log/dpkg.log | Package installs, upgrades, and removals |
| /var/log/apt/history.log | High-level apt command history |
| /var/log/apache2/access.log | Every HTTP request served by Apache |
| /var/log/apache2/error.log | Apache errors, PHP errors via mod_php |
| /var/log/nginx/access.log | Every HTTP request served by Nginx |
| /var/log/nginx/error.log | Nginx errors and upstream failures |
| /var/log/mysql/error.log | MySQL/MariaDB startup, shutdown, and errors |
| /var/log/fail2ban.log | Fail2Ban bans, unbans, and jail activity |
| /var/log/ufw.log | UFW firewall rule matches (if logging enabled) |
| /var/log/php*.log | PHP error log (path set in php.ini) |
Useful commands for working with log files
```bash
# Follow a log file in real time
sudo tail -f /var/log/syslog
sudo tail -f /var/log/apache2/error.log

# Show last 100 lines
sudo tail -n 100 /var/log/auth.log

# Search for a pattern in a log
sudo grep "Failed password" /var/log/auth.log
sudo grep "error" /var/log/nginx/error.log

# Count occurrences of a pattern
sudo grep -c "Failed password" /var/log/auth.log

# Search across all logs for a string
sudo grep -r "out of memory" /var/log/

# Check when a log was last modified
ls -lh /var/log/syslog
```
Log rotation
Ubuntu uses logrotate to compress and rotate log files automatically, preventing /var/log from filling your disk. Rotated logs have extensions like .1, .2.gz.
```bash
# Check logrotate configuration
cat /etc/logrotate.conf
ls /etc/logrotate.d/

# Manually trigger rotation (useful for testing)
sudo logrotate --force /etc/logrotate.conf

# Check disk usage of /var/log
sudo du -sh /var/log/* | sort -h | tail -20
```
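As a sketch of what a policy looks like, here is a hypothetical drop-in for an app writing to /var/log/myapp/ — the app name, path, and retention values are all placeholders to adapt:

```
/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
```

These are standard logrotate directives: delaycompress keeps the most recent rotated file uncompressed for tools still reading it, and copytruncate avoids restarting an app that holds its log file open.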
dmesg — Kernel Messages
dmesg prints the kernel ring buffer — hardware detection at boot, driver messages, disk errors, OOM (out-of-memory) kills, USB events, and network interface state changes. It's essential for diagnosing hardware problems and kernel-level errors that never appear in service logs.
Basic dmesg usage
```bash
# Print full kernel ring buffer
dmesg

# Human-readable timestamps (requires root on some systems)
sudo dmesg -T

# Follow new messages in real time
sudo dmesg -w

# Show only errors and above
sudo dmesg --level=err,crit,alert,emerg

# Show only warnings and above
sudo dmesg --level=warn,err,crit,alert,emerg
```
Filtering dmesg output
```bash
# Look for disk/storage errors
sudo dmesg | grep -i "error\|fail\|fault" | tail -30

# Look for OOM killer activity
sudo dmesg | grep -i "oom\|out of memory\|killed"

# Look for network interface events
sudo dmesg | grep -i "eth\|enp\|link"

# Look for USB device events
sudo dmesg | grep -i "usb"

# Look for hardware errors
sudo dmesg | grep -iE "mce|hardware error|corrected"
```
Common dmesg findings and what they mean
| Message Pattern | Likely Cause |
|---|---|
| oom-killer: ...killed process | System ran out of RAM — a process was killed to free memory |
| EXT4-fs error | Filesystem corruption — run fsck on the affected device |
| ata... failed command | Disk I/O error — check SMART status with smartctl |
| NVRM: GPU... error | NVIDIA driver error — check GPU temperature and driver version |
| eth0: link down / link up | Network cable event or switch port issue |
| segfault at... | Application crash with memory fault — usually a software bug |
| MCE: ... HARDWARE ERROR | Machine Check Exception — potential RAM or CPU hardware fault |
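The patterns in the table can be combined into a single filter. A sketch (the function name scan_kmsg is my own) that works on live dmesg output or a saved log file:

```bash
# Match the high-signal kernel messages from the table in one pass.
scan_kmsg() {
  grep -iE 'oom-killer|out of memory|EXT4-fs error|failed command|link (down|up)|segfault|hardware error'
}

# Live use (sudo needed where dmesg is restricted):
#   sudo dmesg -T | scan_kmsg | tail -20
# Saved logs work too:
#   scan_kmsg < /var/log/kern.log
```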
systemd-analyze — Boot Time Profiling
systemd-analyze profiles your system boot — showing total boot time, which services are slowest, and generating visual timelines. It's the right tool when a server is taking longer than expected to come online after a reboot.
Total boot time
```bash
# Show total time broken down: firmware + loader + kernel + userspace
systemd-analyze

# Example output:
#   Startup finished in 1.832s (kernel) + 8.471s (userspace) = 10.303s
#   graphical.target reached after 8.412s in userspace
```
Find the slowest services
```bash
# List services by activation time, slowest first
systemd-analyze blame

# Show only the top 10 slowest
systemd-analyze blame | head -10

# Example output:
#   8.431s mysql.service
#   4.012s networking.service
#   2.204s cloud-init.service
#   1.891s apt-daily-upgrade.service
#   0.844s fail2ban.service
```
Critical path — what's actually blocking boot
```bash
# Show the critical chain — services on the longest dependency path
systemd-analyze critical-chain

# Critical chain for a specific target
systemd-analyze critical-chain multi-user.target
```
Visual SVG timeline (for desktop/local servers)
```bash
# Generate an SVG boot timeline (open in a browser)
systemd-analyze plot > /tmp/boot-timeline.svg

# Then open it on your local machine:
# scp user@server:/tmp/boot-timeline.svg ~/Desktop/
```
Verify unit configuration
```bash
# Check a unit file for errors
systemd-analyze verify /etc/systemd/system/myapp.service

# Check the security exposure level of a service
systemd-analyze security nginx
systemd-analyze security mysql
```
The security subcommand scores each service unit against systemd's sandboxing capabilities. A high exposure score means the service has broad access to the system. This is a useful hardening audit tool — look for services running as root with PrivateTmp=no or NoNewPrivileges=no.
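Beyond the aggregate score, you can query the individual directives directly with systemctl show. In this sketch, nginx.service is just an example unit, and the fallback line covers machines without a running systemd:

```bash
# Print the sandboxing-related properties the security score draws on;
# substitute the unit you are auditing.
unit="nginx.service"
for prop in User NoNewPrivileges PrivateTmp ProtectSystem ProtectHome; do
  systemctl show "$unit" -p "$prop" 2>/dev/null || echo "$prop=<systemd unavailable>"
done
```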
Basic Performance Triage
When something is slow or broken, guessing wastes time. A structured triage process gets you to the cause faster. Work through these layers in order — most problems reveal themselves within the first three steps.
Step 1 — Is the server actually under load?
```bash
# Load average and uptime
uptime
#   14:22:01 up 12 days, 3:41, 2 users, load average: 0.42, 0.58, 0.62

# Load averages above your CPU core count = queued processes
# Check core count:
nproc
#   4

# If load average >> nproc, you have sustained pressure
# Launch htop to find the culprit:
htop
```
Step 2 — Memory pressure
```bash
# Quick memory overview
free -h
#          total   used   free   shared  buff/cache  available
# Mem:      7.7G   5.1G   312M     182M        2.3G       2.1G
# Swap:     2.0G   820M   1.2G

# If "available" is very low and swap is growing, you have memory pressure
# Check for OOM events:
sudo dmesg | grep -i "oom\|killed"
sudo journalctl -p err -b | grep -i "oom\|memory"
```
Step 3 — Is it a disk I/O problem?
```bash
# Check disk usage — full disks cause silent failures
df -h

# Check inode usage (can fill even when disk space is available)
df -i

# Find what's consuming disk in /var/log
sudo du -sh /var/log/* | sort -h | tail -10

# Check active I/O
sudo iotop -o
```
Step 4 — Check service logs for errors
```bash
# All errors since last boot
journalctl -b -p err -r | head -40

# Check the specific service that seems broken
systemctl status nginx
journalctl -u nginx -n 50

# Check authentication failures (brute force / intrusion)
sudo grep "Failed password\|Invalid user" /var/log/auth.log | tail -20
```
Step 5 — Network saturation
```bash
# Live bandwidth by interface
nload eth0

# Check open connections
ss -s

# Show established connections count by remote IP (detect floods)
ss -nt | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -10

# Check for unusually high connection counts to a port
ss -nt state established | wc -l
```
Quick reference: triage flow
```bash
# 1. Load average
uptime

# 2. CPU / process
htop

# 3. Memory
free -h

# 4. Disk space
df -h && df -i

# 5. Disk I/O
sudo iotop -o

# 6. Recent errors
journalctl -b -p err -r | head -30

# 7. Service status
systemctl status <servicename>

# 8. Kernel messages
sudo dmesg -T --level=err,crit | tail -20

# 9. Network
nload && ss -s
```
The difference between a junior admin and a senior one isn't knowing more commands — it's working through a structured process. Start wide (is the whole server struggling?), then narrow (which service, which resource, which log entry). Each step eliminates a category before you move to the next.