---
navigation: true
title: HotDisk
main:
  fluid: false
---
:ellipsis{left=0px width=40rem top=10rem blur=140px}

# HotDisk
---

When you have a NAS with several drives sitting in a laundry room, temperatures can rise quickly.
Hard drives are sensitive to heat and can suffer serious damage if they stay above their safe temperature threshold for too long.
After a particularly hot summer that gave me a few cold sweats while I watched my drives’ temperatures, I started looking for a way to automatically shut down the server when disk temperatures stay above their safe limit for an extended period.

Since I couldn’t find a convincing solution, I decided to build my own.

- The script reads SMART temperature data from all SATA drives every minute.
- It counts the number of consecutive minutes the temperature stays above or below the threshold.
- It sends Discord notifications if the threshold is exceeded or when the temperature cools down.
- It triggers a system shutdown if the temperature stays above the limit for the configured duration.
- It logs all temperatures and counter states, and automatically rotates log files.

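The counting logic described in the list above can be sketched roughly as follows. This is an illustration in Python, not the actual HotDisk script. It assumes the hottest drive drives the decision, that a reading exactly at the threshold counts as cool, and that the hot counter is only cleared after a full cool streak. The names `Counters` and `step` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    hot_minutes: int = 0   # readings above the threshold since the last reset
    cool_minutes: int = 0  # consecutive readings at or below the threshold


def step(state, temps, max_temp=60, hot_duration=5, cool_reset_duration=5):
    """Process one per-minute reading and return (new_state, action).

    action is "ok", "alert", "cooled" or "shutdown" (hypothetical labels).
    """
    if not temps:
        raise ValueError("no temperature readings")
    # Assumption: the hottest drive decides for the whole server.
    if max(temps) > max_temp:
        hot = state.hot_minutes + 1
        action = "shutdown" if hot >= hot_duration else "alert"
        # A hot reading interrupts any cool streak in progress.
        return Counters(hot_minutes=hot, cool_minutes=0), action
    if state.hot_minutes == 0:
        # Nothing to recover from, so there is nothing to count.
        return Counters(), "ok"
    cool = state.cool_minutes + 1
    if cool >= cool_reset_duration:
        # Enough consecutive cool minutes: reset every counter.
        return Counters(), "cooled"
    return Counters(state.hot_minutes, cool), "ok"
```

With the default values, five hot readings in a row yield four `"alert"` actions followed by `"shutdown"`, which is where a real script would send the Discord message and power off the machine.
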
While I was at it, I also added an installation script that installs the main script, makes it executable, creates a systemd service and timer, and enables them automatically.
The installer also lets you configure various parameters:

| Variable              | Description                                                            | Default Value                   |
|-----------------------|------------------------------------------------------------------------|---------------------------------|
| `MAX_TEMP`            | Maximum allowed temperature (°C) before the shutdown countdown starts | `60`                            |
| `HOT_DURATION`        | Consecutive minutes above `MAX_TEMP` before shutdown                   | `5`                             |
| `COOL_RESET_DURATION` | Consecutive minutes below `MAX_TEMP` to reset all counters             | `5`                             |
| `LOG_FILE`            | Path to the main log file                                              | `/var/log/hdd_temp_monitor.log` |
| `LOG_ROTATE_COUNT`    | Number of log files to keep                                            | `7`                             |
| `LOG_ROTATE_PERIOD`   | Log rotation period (`daily` or `weekly`)                              | `daily`                         |
| `DISCORD_WEBHOOK`     | Discord webhook URL for notifications                                  | _Required_                      |

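To make the table concrete, here is a small Python sketch of how those defaults and the two constraints (a required webhook, a `daily` or `weekly` period) could be validated. Reading the values from environment variables is an assumption made for the example, as are the `Config` and `load_config` names. The real installer sets these parameters its own way.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    discord_webhook: str                      # required, no default
    max_temp: int = 60                        # °C
    hot_duration: int = 5                     # minutes
    cool_reset_duration: int = 5              # minutes
    log_file: str = "/var/log/hdd_temp_monitor.log"
    log_rotate_count: int = 7
    log_rotate_period: str = "daily"          # "daily" or "weekly"


def load_config(env=None):
    """Build a Config from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    webhook = env.get("DISCORD_WEBHOOK", "")
    if not webhook:
        raise ValueError("DISCORD_WEBHOOK is required")
    period = env.get("LOG_ROTATE_PERIOD", "daily")
    if period not in ("daily", "weekly"):
        raise ValueError("LOG_ROTATE_PERIOD must be 'daily' or 'weekly'")
    return Config(
        discord_webhook=webhook,
        max_temp=int(env.get("MAX_TEMP", 60)),
        hot_duration=int(env.get("HOT_DURATION", 5)),
        cool_reset_duration=int(env.get("COOL_RESET_DURATION", 5)),
        log_file=env.get("LOG_FILE", "/var/log/hdd_temp_monitor.log"),
        log_rotate_count=int(env.get("LOG_ROTATE_COUNT", 7)),
        log_rotate_period=period,
    )
```

Any variable that is not set falls back to the default listed in the table, and a missing webhook fails early instead of at the first notification.
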
It also runs another script that configures **logrotate** with the parameters defined above.
Finally, the installer can even be run directly with a simple `curl` command followed by one last setup script, which is perfect for the laziest of us.

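For illustration, the logrotate step could boil down to rendering a small stanza from `LOG_FILE`, `LOG_ROTATE_COUNT` and `LOG_ROTATE_PERIOD`. The sketch below is in Python and is not the project's own script. The `missingok`, `notifempty` and `compress` directives are standard logrotate options that I added as assumptions, and `render_logrotate` is a hypothetical name.

```python
def render_logrotate(log_file="/var/log/hdd_temp_monitor.log",
                     count=7, period="daily"):
    """Return the text of a logrotate stanza for the monitor's log file."""
    if period not in ("daily", "weekly"):
        raise ValueError("period must be 'daily' or 'weekly'")
    return (
        f"{log_file} {{\n"
        f"    {period}\n"          # rotation frequency
        f"    rotate {count}\n"    # number of rotated files to keep
        "    missingok\n"          # assumption: do not fail if the log is absent
        "    notifempty\n"         # assumption: skip rotation of empty logs
        "    compress\n"           # assumption: gzip rotated files
        "}\n"
    )
```

The returned text would then be written to a file under `/etc/logrotate.d/`, which is where logrotate picks up per-application rules on most distributions.
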
I also had to handle several tricky cases: running as root without sudo, using sudo directly, running as a user without sudo rights, missing dependencies, permission issues, file creation errors, errors while reading disk data, and more.

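Two of those cases, privileges and missing dependencies, can be sketched like this in Python. It is a simplified illustration under my own assumptions, not the installer's real logic: `command_prefix` and `missing_dependencies` are hypothetical helpers, and the dependency names passed to them are up to the caller.

```python
import os
import shutil


def command_prefix(euid, sudo_available):
    """Decide how privileged commands should be launched."""
    if euid == 0:
        return []            # already root: sudo is not needed (or installed)
    if sudo_available:
        return ["sudo"]      # regular user who can escalate
    raise PermissionError("root privileges or sudo are required")


def missing_dependencies(names, which=shutil.which):
    """Return the commands from names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


def detect_prefix():
    """Apply command_prefix to the current process (POSIX only)."""
    return command_prefix(os.geteuid(), shutil.which("sudo") is not None)
```

Keeping the decision in a pure function makes each case (root, sudo user, user without sudo) easy to check without actually switching accounts.
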
Concurrent access to the status file also had to be managed carefully.

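One common way to handle this kind of concurrency on Linux is an exclusive advisory lock held for the whole read-modify-write cycle. The Python sketch below shows the idea with `fcntl.flock`. It is not how HotDisk itself is written, and the JSON format, the `hot_minutes` field and the `update_status` name are assumptions made for the example.

```python
import fcntl
import json
import os


def update_status(path, update):
    """Read, modify and rewrite a JSON status file under an exclusive lock."""
    # O_CREAT creates the file on first use without truncating existing data.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)   # blocks until other holders release
        try:
            raw = fh.read()
            status = json.loads(raw) if raw.strip() else {}
            status = update(status)
            fh.seek(0)
            fh.truncate()                # drop the old content before rewriting
            json.dump(status, fh)
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    return status
```

A caller would pass a function that receives the current state and returns the new one, for example `update_status(path, lambda s: {**s, "hot_minutes": s.get("hot_minutes", 0) + 1})`. Two overlapping runs then wait for each other instead of overwriting each other's counters.
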
More details are available directly on the repository:

::card
#title
📜 __HotDisk__
#description
[Keep your drives cool!](https://git.djeex.fr/Djeex/hotdisk)
::