docudjeex/content/5.nonsense/1.bash.md
2025-07-11 16:35:14 +00:00

---
navigation: true
title: Bash Scripts
main:
fluid: false
---
:ellipsis{left=0px width=40rem top=10rem blur=140px}
# Bash Scripts
A few random scripts that saved my life.
## Detecting Duplicates and Replacing Them with Hardlinks
---
Six months after downloading terabytes of media, I realized that Sonarr and Radarr were copying the files into my Plex library instead of creating hardlinks. This happens due to a counterintuitive mechanism: if you mount multiple folders in Sonarr/Radarr, it sees them as different filesystems and thus cannot create hardlinks. That's why you should mount only one parent folder containing all child folders (like `downloads`, `movies`, `tvseries` inside a `media` parent folder).
So I restructured my directories and manually updated every path in Qbittorrent, Plex, and others. The last challenge was finding a way to detect existing duplicates, delete them, and automatically create hardlinks instead, to save space.
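A quick way to check whether hardlinks are possible between two directories is simply to try one. The sketch below uses a hypothetical helper, `can_hardlink` (the name and the scratch layout are assumptions, not part of Sonarr or Radarr), that creates a throwaway file in the first directory and tries to `ln` it into the second. Comparing device numbers alone is not a reliable check, because Linux refuses hardlinks across mount points even when the same filesystem is mounted at both.

```bash
#!/bin/bash
# Hypothetical helper: returns 0 if a file in $1 can be hardlinked into $2.
can_hardlink() {
    local src_dir=$1 dst_dir=$2 probe link
    probe=$(mktemp "$src_dir/.hardlink-probe.XXXXXX") || return 2
    link="$dst_dir/$(basename "$probe")"
    if ln "$probe" "$link" 2>/dev/null; then
        rm -f "$probe" "$link"
        return 0
    fi
    rm -f "$probe"
    return 1
}

# Demo in a scratch directory standing in for the single "media" parent folder.
media=$(mktemp -d)
mkdir -p "$media/downloads" "$media/movies"
if can_hardlink "$media/downloads" "$media/movies"; then
    result="hardlinks OK"
else
    result="hardlinks NOT possible"
fi
echo "$result"
rm -rf "$media"
```

To be meaningful, the check has to run where Sonarr/Radarr runs (for example inside the container), against the paths it actually sees, such as `can_hardlink /media/seedbox /media/tvseries`.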
My directory structure:
```console
.
└── media
    ├── seedbox
    ├── radarr
    │   └── tv-radarr
    ├── movies
    └── tvseries
```
The originals are in `seedbox` and must not be modified to keep seeding. The copies (duplicates) are in `movies` and `tvseries`. To complicate things, there are also unique originals in `movies` and `tvseries`. And within those, there can be subfolders, sub-subfolders, etc.
So the idea is to:
- list the originals in seedbox
- list files in movies and tvseries
- compare both lists and isolate duplicates
- delete the duplicates
- hardlink the originals to the deleted duplicate paths
Yes, I asked ChatGPT and Qwen3 (which I host on a dedicated AI machine). Naturally, they suggested tools like rfind, rdfind, dupes, rdupes, rmlint... But hashing 30TB of media would take days, so I gave up quickly.
In the end, I only needed to find `.mkv` files, and duplicates have the exact same name as the originals, which simplifies things a lot. A simple Bash script would do the job.
I'll spare you the endless Q&A with ChatGPT: I was disappointed. Qwen3 was much cleaner. ChatGPT kept pushing awk-based solutions, which fail on paths with spaces. With Qwen3's help and dropping awk, the results improved significantly.
To test, I first asked for a script that only lists and compares:
```bash
#!/bin/bash
# Create an associative array to store duplicates
declare -A seen
# Find all .mkv files only (exclude directories)
find /media/seedbox /media/movies /media/tvseries -type f -name "*.mkv" -print0 | \
while IFS= read -r -d '' file; do
    # Get the file's inode and name
    inode=$(stat --format="%i" "$file")
    filename=$(basename "$file")
    # If the filename has been seen before
    if [[ -n "${seen[$filename]}" ]]; then
        # Check if the inode is different from the previous one
        if [[ "${seen[$filename]}" != "$inode" ]]; then
            # Output the duplicates with full paths
            echo "Duplicates for \"$filename\":"
            echo "${seen["$filename"]} ${seen["$filename:full_path"]}"
            echo "$inode $file"
            echo
        fi
    else
        seen[$filename]="$inode"
        seen["$filename:full_path"]="$file"
    fi
done
```
This gave me outputs like:
```
Duplicates for "episode1.mkv":
1234567 /media/seedbox/sonarr/Serie 1/Season1/episode1.mkv
2345678 /media/tvseries/Serie 1/Season1/episode1.mkv
```
With `awk`, it would've stopped at `/media/seedbox/sonarr/Serie`. I'm far from an expert, but Qwen3 performed better and explained everything clearly.
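To make the whitespace problem concrete, here is a minimal sketch using one line of the output above. The `awk '{print $2}'` call is an assumption about the kind of one-liner involved, not the exact command ChatGPT produced.

```bash
#!/bin/bash
# One output line from the listing script: inode, a space, then the full path.
line='1234567 /media/seedbox/sonarr/Serie 1/Season1/episode1.mkv'

# awk splits on every run of whitespace, so field 2 ends at the first space in the path.
awk_path=$(printf '%s\n' "$line" | awk '{print $2}')

# read puts the first word in "inode" and everything that remains in "full_path".
read -r inode full_path <<< "$line"

echo "awk : $awk_path"
echo "read: $full_path"
```

The first `echo` prints the truncated `/media/seedbox/sonarr/Serie`, the second prints the whole path. The scripts in this section avoid the issue entirely by passing filenames around null-delimited (`find -print0` with `read -d ''`).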
Once I verified the output, I asked for a complete script: compare, delete duplicates, create hardlinks.
Again, ChatGPT disappointed. Despite my requests, it created hardlinks *before* deleting the duplicates, effectively linking and then deleting the link (though the original is kept). Not helpful.
A quick stopover at Qwen3, RTX 5090 in overdrive, and bam: a much better result. Yes, it kept ChatGPT-style emojis, but here it is:
```bash
#!/bin/bash
echo "🔍 Step 1: Indexing original files in /media/seedbox..."
declare -A seen
# Index all .mkv files in seedbox (key: filename, value: full path)
while IFS= read -r -d '' file; do
    filename=$(basename "$file")
    seen["$filename"]="$file"
done < <(find /media/seedbox -type f -name "*.mkv" -print0)
echo "📦 Step 2: Automatically replacing duplicates..."
total_doublons=0
total_ko_saved=0
while IFS= read -r -d '' file; do
    filename=$(basename "$file")
    original="${seen[$filename]}"
    if [[ -n "$original" && "$original" != "$file" ]]; then
        inode_orig=$(stat -c %i "$original")
        inode_dupe=$(stat -c %i "$file")
        # Different inodes mean a real copy, not an existing hardlink
        if [[ "$inode_orig" != "$inode_dupe" ]]; then
            size_kb=$(du -k "$file" | cut -f1)
            echo "🔁 Replacing:"
            echo " Duplicate : $file"
            echo " Original : $original"
            echo " Size : ${size_kb} KB"
            # Count the file only if both the delete and the link succeeded
            if rm "$file" && ln "$original" "$file"; then
                echo "✅ Hardlink created."
                total_doublons=$((total_doublons + 1))
                total_ko_saved=$((total_ko_saved + size_kb))
            else
                echo "❌ Failed to replace $file"
            fi
        fi
    fi
done < <(find /media/movies /media/tvseries -type f -name "*.mkv" -print0)
echo ""
echo "🧾 Summary:"
echo " 🔗 Duplicates replaced by hardlink: $total_doublons"
echo " 💾 Approx. disk space saved: ${total_ko_saved} KB (~$((total_ko_saved / 1024)) MB)"
echo "✅ Done."
```
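To confirm afterwards that a library file really is a hardlink to its seedbox original, Bash's `-ef` test and the link count from GNU `stat` are enough. The sketch below is a self-contained demo on scratch directories (the file names are made up) that shows the state before and after the same `rm` + `ln` step.

```bash
#!/bin/bash
# Scratch directories standing in for /media/seedbox and /media/movies.
root=$(mktemp -d)
mkdir -p "$root/seedbox" "$root/movies"
echo "fake video data" > "$root/seedbox/film.mkv"
cp "$root/seedbox/film.mkv" "$root/movies/film.mkv"   # a real copy: different inode

# Before: same content, but two separate files.
if [[ "$root/seedbox/film.mkv" -ef "$root/movies/film.mkv" ]]; then
    before="linked"
else
    before="copy"
fi

# Same replacement step as in the script above.
rm "$root/movies/film.mkv" && ln "$root/seedbox/film.mkv" "$root/movies/film.mkv"

# After: -ef is true when both paths refer to the same device and inode.
if [[ "$root/seedbox/film.mkv" -ef "$root/movies/film.mkv" ]]; then
    after="linked"
else
    after="copy"
fi
links=$(stat -c %h "$root/seedbox/film.mkv")   # hardlink count (GNU stat)

echo "before=$before after=$after links=$links"   # before=copy after=linked links=2
rm -rf "$root"
```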
So, in conclusion, I:
- Learned many Bash subtleties
- Learned never to blindly copy-paste a ChatGPT script without understanding and dry-running it
- Learned that Qwen on an RTX 5090 is more coherent than ChatGPT-4o on server farms (not even mentioning “normal” ChatGPT)
- Learned that even with 100TB of storage, monitoring it would've alerted me much earlier to the 12TB of duplicates lying around
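On the dry-run point, one common pattern is a small wrapper that prints destructive commands instead of running them. This is a minimal sketch, not something the scripts above contain: `run` and `DRY_RUN` are hypothetical names.

```bash
#!/bin/bash
# Set DRY_RUN=0 to actually execute; the default only prints what would happen.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: print the command in dry-run mode, run it otherwise.
run() {
    if [[ "$DRY_RUN" == 1 ]]; then
        printf 'DRY-RUN:'
        printf ' %q' "$@"   # %q quotes each argument so spaces stay visible
        printf '\n'
    else
        "$@"
    fi
}

# Demo on a scratch file whose name contains a space.
scratch=$(mktemp -d)
touch "$scratch/my episode.mkv"
output=$(run rm "$scratch/my episode.mkv")
echo "$output"

# In dry-run mode the file must still be there.
if [[ -e "$scratch/my episode.mkv" ]]; then still_there=yes; else still_there=no; fi
echo "file still present: $still_there"
rm -rf "$scratch"
```

In the replacement script, the destructive line would then become `run rm "$file" && run ln "$original" "$file"`, and a first pass with the default `DRY_RUN=1` shows every planned change without touching anything.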
## Backup of LUKS Headers for Encrypted Disks/Volumes
---
I recently realized that having just the password is not enough to unlock a LUKS volume after a failure or corruption: if the header is damaged, the password alone is useless. I learned how to dump the LUKS headers from disks/volumes and to use the serial numbers along with partition names to accurately identify which header corresponds to which disk/partition (I have 10 of them!).
After struggling to do this manually, I asked Qwen3 (an LLM running on my RTX 5090) to create a script that automates the listing and identification of disks, dumps the headers, and stores them in an encrypted archive ready to be backed up on my backup server.
This script:
* Lists and identifies disks with their serial numbers
* Lists partitions
* Dumps headers into a secured folder under `/root`
* Creates a temporary archive
* Prompts for a password
* Encrypts the archive with that password
* Deletes the unencrypted archive
```bash
#!/bin/bash
# Directory where LUKS headers will be backed up
DEST="/root/luks-headers-backup"
mkdir -p "$DEST"
echo "🔍 Searching for LUKS containers on all partitions..."
# Loop through SATA/SCSI disks and partitions, and NVMe partitions
for part in /dev/sd? /dev/sd?? /dev/nvme?n?p?; do
    # Skip if the device doesn't exist
    if [ ! -b "$part" ]; then
        continue
    fi
    # Check if the partition is a LUKS encrypted volume
    if cryptsetup isLuks "$part"; then
        # Find the parent disk device (e.g. nvme0n1p4 → nvme0n1)
        disk=$(lsblk -no pkname "$part" | head -n 1)
        # A whole-disk LUKS device has no parent: use the device itself
        if [ -z "$disk" ]; then
            disk=$(basename "$part")
        fi
        full_disk="/dev/$disk"
        # Get the serial number of the parent disk
        SERIAL=$(udevadm info --query=all --name="$full_disk" | grep ID_SERIAL= | cut -d= -f2)
        if [ -z "$SERIAL" ]; then
            SERIAL="unknown"
        fi
        # Extract the partition name (e.g. nvme0n1p4)
        PART_NAME=$(basename "$part")
        # Build the output filename with partition name and disk serial
        OUTPUT="$DEST/luks-header-${PART_NAME}__${SERIAL}.img"
        echo "🔐 Backing up LUKS header of $part (Serial: $SERIAL)..."
        # Backup the LUKS header to the output file
        if cryptsetup luksHeaderBackup "$part" --header-backup-file "$OUTPUT"; then
            echo "✅ Backup successful → $OUTPUT"
        else
            echo "❌ Backup failed for $part"
        fi
    fi
done
# Create a timestamped compressed tar archive of all header backups
ARCHIVE_NAME="/root/luks-headers-$(date +%Y%m%d_%H%M%S).tar.gz"
echo "📦 Creating archive $ARCHIVE_NAME..."
tar -czf "$ARCHIVE_NAME" -C "$DEST" .
# Encrypt the archive symmetrically using GPG with AES256 cipher
echo "🔐 Encrypting the archive with GPG..."
if gpg --symmetric --cipher-algo AES256 "$ARCHIVE_NAME"; then
    echo "✅ Encrypted archive created: ${ARCHIVE_NAME}.gpg"
    # Remove the unencrypted archive for security
    rm -f "$ARCHIVE_NAME"
else
    echo "❌ Encryption failed"
fi
```
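When a header has to be restored, the filename is what ties a backup to a physical disk. The sketch below uses a hypothetical helper, `parse_header_name`, to split a name of the form `luks-header-<partition>__<serial>.img` back into its parts. The serial in the example is made up, and the restore command is only printed, never executed.

```bash
#!/bin/bash
# Hypothetical helper: split a backup filename produced by the script above
# (luks-header-<partition>__<serial>.img) back into its two parts.
parse_header_name() {
    local name
    name=$(basename "$1" .img)
    name=${name#luks-header-}
    part_name=${name%%__*}      # text before the first "__"
    serial=${name#*__}          # text after the first "__"
}

# Example filename; the serial here is made up for illustration.
header="/root/luks-headers-backup/luks-header-nvme0n1p4__EXAMPLE_SERIAL_123.img"
parse_header_name "$header"
echo "partition=$part_name serial=$serial"

# Only print the restore command: running it overwrites the header on the device.
echo "cryptsetup luksHeaderRestore /dev/$part_name --header-backup-file $header"
```

Device names can change between boots or after swapping disks, so it is worth comparing the serial from the filename against `udevadm info` for the target disk before running the printed command. The archive itself is decrypted first with `gpg --output <archive>.tar.gz --decrypt <archive>.tar.gz.gpg`.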
**Don't forget to back up `/etc/fstab` and `/etc/crypttab` as well!**
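One way to make sure those two files travel with the headers is to copy them into the backup directory before the archive is created. A minimal sketch, assuming a hypothetical `backup_configs` helper. The demo runs on scratch files rather than the real `/etc` so it can be tried without root.

```bash
#!/bin/bash
# Hypothetical helper: copy each existing file into the backup directory,
# skipping (and reporting) any that are missing.
backup_configs() {
    local dest=$1; shift
    local f
    for f in "$@"; do
        if [[ -f "$f" ]]; then
            cp -p "$f" "$dest/" && echo "Saved $f"
        else
            echo "Skipped $f (not found)"
        fi
    done
}

# Demo with scratch files standing in for /etc/fstab and /etc/crypttab.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc" "$scratch/backup"
echo "# fake fstab" > "$scratch/etc/fstab"      # crypttab deliberately missing
report=$(backup_configs "$scratch/backup" "$scratch/etc/fstab" "$scratch/etc/crypttab")
echo "$report"
saved=$(ls "$scratch/backup")
rm -rf "$scratch"
```

In the header script, the equivalent call would be `backup_configs "$DEST" /etc/fstab /etc/crypttab`, placed just before the `tar` step so both files end up inside the encrypted archive.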