New directory and icons

2025-07-20 17:49:48 +00:00
parent c89b16d0ae
commit 4af8bbe1e4
25 changed files with 163 additions and 125 deletions
--- a/content/5.nonsense/2.bash/1.servarr-duplicates.md
+++ b/content/5.nonsense/2.bash/1.servarr-duplicates.md
@@ -0,0 +1,141 @@
+---
+navigation: true
+title: Bash Scripts
+main:
+  fluid: false
+---
+:ellipsis{left=0px width=40rem top=10rem blur=140px}
+# Servarr duplicates corrector
+---
+
+Six months after downloading terabytes of media, I realized that Sonarr and Radarr were copying files into my Plex library instead of creating hardlinks. The cause is counterintuitive: if you mount several folders separately in Sonarr/Radarr, each mount is treated as a different filesystem, and hardlinks cannot cross filesystem boundaries. That’s why you should mount a single parent folder containing all the child folders (for example `downloads`, `movies`, and `tvseries` inside a `media` parent folder).
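
One way to check whether hardlinks are possible between two folders is simply to try creating one. The sketch below is an illustration, not part of the original setup: the helper name `can_hardlink` is hypothetical, and the demo runs on throwaway directories that you would replace with the paths as Sonarr/Radarr sees them.

```bash
#!/bin/bash
# Hypothetical helper: succeeds only if a file created in $1 can be
# hardlinked into $2. Both probe files are removed afterwards.
can_hardlink() {
    local src_dir=$1 dst_dir=$2 probe link
    probe=$(mktemp "$src_dir/.hardlink-probe.XXXXXX") || return 1
    link="$dst_dir/$(basename "$probe")"
    if ln "$probe" "$link" 2>/dev/null; then
        rm -f "$probe" "$link"
        return 0
    fi
    rm -f "$probe"
    return 1
}

# Demo on temporary directories; swap in e.g. /media/seedbox and /media/movies.
demo=$(mktemp -d)
mkdir -p "$demo/seedbox" "$demo/movies"

if can_hardlink "$demo/seedbox" "$demo/movies"; then
    echo "hardlinks possible"
else
    echo "hardlinks not possible: files will be copied"
fi
rm -rf "$demo"
```

Actually attempting the link is more reliable than comparing device IDs, because two separate mounts of the same disk can report the same device and still refuse a hardlink between them.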
+
+So I restructured my directories and manually updated every path in qBittorrent, Plex, and the other services. The last challenge was to detect the existing duplicates, delete them, and automatically create hardlinks in their place to save space.
+
+My directory structure:
+
+```console
+.
+└── media
+    ├── seedbox
+    ├── radarr
+    │   └── tv-radarr
+    ├── movies
+    └── tvseries
+```
+
+The originals are in `seedbox` and must not be modified, so that they keep seeding. The copies (duplicates) are in `movies` and `tvseries`. To complicate things, `movies` and `tvseries` also contain unique originals that exist nowhere else, and files can sit in subfolders, sub-subfolders, etc.
+
+So the idea is to:
+
+- list the originals in seedbox
+- list files in movies and tvseries
+- compare both lists and isolate duplicates
+- delete the duplicates
+- hardlink the originals to the deleted duplicate paths
+
+Yes, I asked ChatGPT and Qwen3 (which I host on a dedicated AI machine). Naturally, they suggested tools like rfind, rdfind, dupes, rdupes, rmlint... But hashing 30TB of media would take days, so I gave up quickly.
+
+In the end, I only needed to find `.mkv` files, and duplicates have the exact same name as the originals, which simplifies things a lot. A simple Bash script would do the job.
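
Matching on the name alone assumes that no two different files share a filename. A cheap extra guard, far faster than hashing, is to compare byte sizes as well. This is a minimal sketch under that assumption: the helper `same_name_and_size` is hypothetical, it relies on GNU `stat`, and equal sizes are a sanity check rather than proof of identical content.

```bash
#!/bin/bash
# Hypothetical helper: true when two files share a basename and a byte size.
# Assumes GNU stat (-c %s prints the size in bytes).
same_name_and_size() {
    [ "$(basename "$1")" = "$(basename "$2")" ] &&
    [ "$(stat -c %s "$1")" = "$(stat -c %s "$2")" ]
}

# Demo on throwaway files standing in for an original and its copy.
demo=$(mktemp -d)
mkdir -p "$demo/seedbox" "$demo/movies"
printf 'same bytes' > "$demo/seedbox/film.mkv"
printf 'same bytes' > "$demo/movies/film.mkv"

if same_name_and_size "$demo/seedbox/film.mkv" "$demo/movies/film.mkv"; then
    echo "likely duplicate"
else
    echo "same name, different size: leave it alone"
fi
rm -rf "$demo"
```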
+
+I’ll spare you the endless Q&A with ChatGPT: I was disappointed. Qwen3 was much cleaner. ChatGPT kept pushing awk-based solutions, which failed on paths containing spaces. With Qwen’s help, and after dropping awk, the results improved significantly.
+
+To test, I first asked for a script that only lists and compares:
+
+```bash
+#!/bin/bash
+
+# Associative array mapping each filename to the first inode and full path seen
+declare -A seen
+
+# Find all .mkv files only (exclude directories)
+find /media/seedbox /media/movies /media/tvseries -type f -name "*.mkv" -print0 | \
+while IFS= read -r -d '' file; do
+    # Get the file's inode and name
+    inode=$(stat --format="%i" "$file")
+    filename=$(basename "$file")
+    
+    # If the filename has been seen before
+    if [[ -n "${seen[$filename]}" ]]; then
+        # Check if the inode is different from the previous one
+        if [[ "${seen[$filename]}" != "$inode" ]]; then
+            # Output the duplicates with full paths
+            echo "Duplicates for \"$filename\":"
+            echo "${seen["$filename"]} ${seen["$filename:full_path"]}"
+            echo "$inode $file"
+            echo
+        fi
+    else
+        seen[$filename]="$inode"
+        seen["$filename:full_path"]="$file"
+    fi
+done
+```
+
+This gave me outputs like:
+
+```
+Duplicates for "episode1.mkv":
+1234567 /media/seedbox/sonarr/Serie 1/Season1/episode1.mkv
+2345678 /media/tvseries/Serie 1/Season1/episode1.mkv
+```
+
+With `awk`, the path would have been cut off at the first space: `/media/seedbox/sonarr/Serie`. I’m far from an expert, but Qwen3 performed better and explained everything clearly.
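
The exact awk command is not reproduced here, so the following is only an illustration of the failure mode, assuming awk's default whitespace field splitting. It takes one line in the `inode path` format shown above and extracts the path both ways.

```bash
#!/bin/bash
# One line in the "inode path" format printed by the listing script.
line='1234567 /media/seedbox/sonarr/Serie 1/Season1/episode1.mkv'

# awk splits on every run of whitespace, so field 2 ends at the first
# space inside the path.
awk_path=$(printf '%s\n' "$line" | awk '{print $2}')

# Parameter expansion splits only at the first space, keeping the rest intact.
inode=${line%% *}
full_path=${line#* }

echo "awk  : $awk_path"
echo "shell: $full_path (inode $inode)"
```

The `find ... -print0` plus `read -d ''` combination in the script goes one step further and avoids splitting on whitespace altogether.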
+
+Once I verified the output, I asked for a complete script: compare, delete duplicates, create hardlinks.
+
+Again, ChatGPT disappointed. Despite my requests, it created hardlinks *before* deleting the duplicates—effectively linking and then deleting the link (though the original is kept). Not helpful.
+
+A quick detour to Qwen3, RTX 5090 in overdrive, and bam: a much better result. Yes, it kept the ChatGPT-style emojis, but here it is:
+
+```bash
+#!/bin/bash
+
+echo "🔍 Step 1: Indexing original files in /media/seedbox..."
+declare -A seen
+
+# Index all .mkv files in seedbox
+while IFS= read -r -d '' file; do
+    filename=$(basename "$file")
+    seen["$filename"]="$file"
+done < <(find /media/seedbox -type f -name "*.mkv" -print0)
+
+echo "📦 Step 2: Automatically replacing duplicates..."
+total_doublons=0
+total_ko_saved=0
+
+while IFS= read -r -d '' file; do
+    filename=$(basename "$file")
+    original="${seen[$filename]}"
+
+    if [[ -n "$original" && "$original" != "$file" ]]; then
+        inode_orig=$(stat -c %i "$original")
+        inode_dupe=$(stat -c %i "$file")
+
+        if [[ "$inode_orig" != "$inode_dupe" ]]; then
+            size_kb=$(du -k "$file" | cut -f1)
+            echo "🔁 Replacing:"
+            echo "    Duplicate : $file"
+            echo "    Original  : $original"
+            echo "    Size      : ${size_kb} KB"
+
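+            # Link under a temporary name first, then atomically replace the
+            # duplicate, so a failed ln never leaves the file missing
+            tmp="${file}.hardlink-tmp"
+            if ln "$original" "$tmp" && mv -f "$tmp" "$file"; then
+                echo "✅ Hardlink created."
+                total_doublons=$((total_doublons + 1))
+                total_ko_saved=$((total_ko_saved + size_kb))
+            else
+                rm -f "$tmp"
+                echo "❌ Could not replace $file, duplicate left untouched." >&2
+            fi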
+        fi
+    fi
+done < <(find /media/movies /media/tvseries -type f -name "*.mkv" -print0)
+
+echo ""
+echo "🧾 Summary:"
+echo "    🔗 Duplicates replaced by hardlink: $total_doublons"
+echo "    💾 Approx. disk space saved: ${total_ko_saved} KB (~$((total_ko_saved / 1024)) MB)"
+echo "✅ Done."
+```
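
To confirm afterwards that a library file really is a hardlink to its seedbox original, the two names can be compared directly. This is a small sketch rather than part of the script above: `is_hardlinked` is a hypothetical helper, the demo uses temporary files, and the link count relies on GNU `stat`.

```bash
#!/bin/bash
# Hypothetical helper: true when both paths refer to the same file on disk
# (the -ef test compares device and inode numbers).
is_hardlinked() {
    [ "$1" -ef "$2" ]
}

# Demo: create an "original" and hardlink it into a "library" folder.
demo=$(mktemp -d)
mkdir -p "$demo/seedbox" "$demo/movies"
printf 'data' > "$demo/seedbox/film.mkv"
ln "$demo/seedbox/film.mkv" "$demo/movies/film.mkv"

if is_hardlinked "$demo/seedbox/film.mkv" "$demo/movies/film.mkv"; then
    # %h is the hard link count: one per name pointing at the inode.
    echo "hardlinked, link count: $(stat -c %h "$demo/movies/film.mkv")"
else
    echo "separate copies"
fi
rm -rf "$demo"
```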
+
+So, in conclusion, I:
+- Learned many Bash subtleties
+- Learned never to blindly copy-paste a ChatGPT script without understanding and dry-running it
+- Learned that Qwen on an RTX 5090 is more coherent than ChatGPT-4o on server farms (not even mentioning “normal” ChatGPT)
+- Learned that even with 100TB of storage, monitoring it would’ve alerted me much earlier to the 12TB of duplicates lying around
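
On the dry-run point, one common pattern is to route every destructive command through a small wrapper that only prints it until explicitly told otherwise. This is a generic sketch, not what the scripts above do: the `run` wrapper, the `DRY_RUN` variable, and the example paths are all assumptions made for illustration.

```bash
#!/bin/bash
# Hypothetical wrapper: with DRY_RUN=1 (the default here) the command is
# only printed; with DRY_RUN=0 it is executed as given.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY-RUN: $*"
    else
        "$@"
    fi
}

# Made-up example paths; nothing is touched while DRY_RUN=1.
file="/media/movies/Film 1/film.mkv"
original="/media/seedbox/radarr/Film 1/film.mkv"

run rm "$file"
run ln "$original" "$file"
```

Reading the printed commands first, then re-running with `DRY_RUN=0`, keeps the destructive part a deliberate second step.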
--- a/content/5.nonsense/2.bash/2.luks-
+++ b/content/5.nonsense/2.bash/2.luks-
@@ -0,0 +1,88 @@
+---
+navigation: true
+title: LUKS Backup
+main:
+  fluid: false
+---
+:ellipsis{left=0px width=40rem top=10rem blur=140px}
+
+# Backup of LUKS Headers for Encrypted Disks/Volumes
+---
+
+I recently realized that the password alone is not enough to unlock a LUKS volume after a failure or corruption: if the header, which holds the key slots, is damaged, the data cannot be unlocked without a backup of that header. I learned how to dump the LUKS headers from disks/volumes and to use the disk serial numbers together with the partition names to identify which header belongs to which disk/partition (I have 10 of them!).
+
+After struggling to do this manually, I asked Qwen3 (an LLM running on my RTX 5090) to create a script that automates the listing and identification of disks, dumps the headers, and stores them in an encrypted archive ready to be backed up on my backup server.
+
+This script:
+* Lists and identifies disks with their serial numbers
+* Lists partitions
+* Dumps headers into a secured folder under `/root`
+* Creates a temporary archive
+* Prompts for a password
+* Encrypts the archive with that password
+* Deletes the unencrypted archive
+
+```bash
+#!/bin/bash
+
+# Directory where LUKS headers will be backed up
+DEST="/root/luks-headers-backup"
+mkdir -p "$DEST"
+chmod 700 "$DEST"
+
+echo "🔍 Searching for LUKS containers on all partitions..."
+
+# Loop through candidate block devices (SATA/SCSI disks and partitions, NVMe partitions)
+for part in /dev/sd? /dev/sd?? /dev/nvme?n?p?; do
+    # Skip if the device doesn't exist
+    if [ ! -b "$part" ]; then
+        continue
+    fi
+
+    # Check if the partition is a LUKS encrypted volume
+    if cryptsetup isLuks "$part"; then
+        # Find the parent disk device (e.g. nvme0n1p4 → nvme0n1);
+        # fall back to the device itself when LUKS sits on a whole disk
+        disk=$(lsblk -no pkname "$part" | head -n 1)
+        full_disk="/dev/${disk:-$(basename "$part")}"
+
+        # Get the serial number of the parent disk
+        SERIAL=$(udevadm info --query=all --name="$full_disk" | grep ID_SERIAL= | cut -d= -f2)
+        if [ -z "$SERIAL" ]; then
+            SERIAL="unknown"
+        fi
+
+        # Extract the partition name (e.g. nvme0n1p4)
+        PART_NAME=$(basename "$part")
+
+        # Build the output filename with partition name and disk serial
+        OUTPUT="$DEST/luks-header-${PART_NAME}__${SERIAL}.img"
+
+        echo "🔐 Backing up LUKS header of $part (Serial: $SERIAL)..."
+
+        # Backup the LUKS header to the output file
+        cryptsetup luksHeaderBackup "$part" --header-backup-file "$OUTPUT"
+        if [[ $? -eq 0 ]]; then
+            echo "✅ Backup successful → $OUTPUT"
+        else
+            echo "❌ Backup failed for $part"
+        fi
+    fi
+done
+
+# Create a timestamped compressed tar archive of all header backups
+ARCHIVE_NAME="/root/luks-headers-$(date +%Y%m%d_%H%M%S).tar.gz"
+echo "📦 Creating archive $ARCHIVE_NAME..."
+tar -czf "$ARCHIVE_NAME" -C "$DEST" .
+
+# Encrypt the archive symmetrically using GPG with AES256 cipher
+echo "🔐 Encrypting the archive with GPG..."
+gpg --symmetric --cipher-algo AES256 "$ARCHIVE_NAME"
+if [[ $? -eq 0 ]]; then
+    echo "✅ Encrypted archive created: ${ARCHIVE_NAME}.gpg"
+    # Remove the unencrypted archive for security
+    rm -f "$ARCHIVE_NAME"
+else
+    echo "❌ Encryption failed"
+fi
+```
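
When a header has to be restored, the filename produced above already says where it belongs. The sketch below splits such a filename back into partition name and disk serial and prints, without executing, the matching `cryptsetup luksHeaderRestore` command. The helper names and the example serial are hypothetical. Device names can change between boots, so the serial should be checked against the disk before restoring anything.

```bash
#!/bin/bash
# Hypothetical helpers: split "luks-header-<partition>__<serial>.img"
# (the naming used by the backup script) into its two parts.
header_partition() {
    local name
    name=$(basename "$1" .img)
    name=${name#luks-header-}
    printf '%s\n' "${name%%__*}"
}

header_serial() {
    local name
    name=$(basename "$1" .img)
    name=${name#luks-header-}
    printf '%s\n' "${name#*__}"
}

# Example filename; the serial is made up for illustration.
backup="/root/luks-headers-backup/luks-header-nvme0n1p4__EXAMPLE_SERIAL_123.img"
part=$(header_partition "$backup")
serial=$(header_serial "$backup")

echo "Header for /dev/$part (disk serial: $serial)"
# Printed only: restoring overwrites the header currently on the device.
echo "cryptsetup luksHeaderRestore /dev/$part --header-backup-file $backup"
```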
+
+**Don’t forget to back up `/etc/fstab` and `/etc/crypttab` as well!**
--- a/content/5.nonsense/2.bash/_dir.yml
+++ b/content/5.nonsense/2.bash/_dir.yml
@@ -0,0 +1,2 @@
+navigation.title: Bash
+icon: lucide:file-terminal