---
navigation: true
title: Bash Scripts
main:
  fluid: false
---

:ellipsis{left=0px width=40rem top=10rem blur=140px}

# Bash Scripts

A few random scripts that saved my life.


Six months after downloading terabytes of media, I realized that Sonarr and Radarr were copying files into my Plex library instead of creating hardlinks. The cause is a counterintuitive mechanism: if you mount multiple folders into Sonarr/Radarr, it treats them as different filesystems and therefore cannot create hardlinks. That's why you should mount a single parent folder containing all the child folders (e.g. downloads, movies, and tvseries inside one media parent folder).
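A quick way to see why this matters: a hardlink can only be created when both paths live on the same filesystem, i.e. report the same device number. A minimal sketch (assuming GNU coreutils `stat`; the temp directories stand in for the real /media paths):

```shell
# Hardlinks require both paths to be on the same filesystem.
# stat -c %d prints the device number a path lives on.
same_fs() { [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]; }

# Demo with two temp dirs standing in for /media/seedbox and /media/movies:
a=$(mktemp -d); b=$(mktemp -d)
if same_fs "$a" "$b"; then
    echo "hardlinks possible"
else
    echo "cross-device: only copies possible"
fi
rmdir "$a" "$b"
```

If the apps see your download and library folders as two separate mounts, this check fails and they silently fall back to copying.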

So I restructured my directories and manually updated every path in qBittorrent, Plex, and the rest. The last challenge was finding a way to detect the existing duplicates, delete them, and automatically create hardlinks in their place to reclaim space.

My directory structure:

```
.
└── media
    ├── seedbox
    ├── radarr
    │   └── tv-radarr
    ├── movies
    └── tvseries
```

The originals are in seedbox and must not be modified to keep seeding. The copies (duplicates) are in movies and tvseries. To complicate things, there are also unique originals in movies and tvseries. And within those, there can be subfolders, sub-subfolders, etc.

So the idea is to:

- list the originals in seedbox
- list the files in movies and tvseries
- compare both lists and isolate the duplicates
- delete the duplicates
- hardlink the originals to the deleted duplicates' paths

Yes, I asked ChatGPT and Qwen3 (which I host on a dedicated AI machine). Naturally, they suggested tools like rdfind, fdupes, jdupes, rmlint... But hashing 30 TB of media would take days, so I quickly gave up on that approach.

In the end, I only needed to handle .mkv files, and the duplicates carry exactly the same name as the originals, which simplifies things a lot. A simple Bash script would do the job.
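Since names are the key, a first pass can be as simple as listing every .mkv basename and keeping those that appear more than once. A sketch of that idea (GNU find's `-printf` assumed; pass it the real /media paths):

```shell
# Print the basenames of .mkv files that occur more than once
# anywhere under the given directories.
dupe_names() {
    find "$@" -type f -name '*.mkv' -printf '%f\n' | sort | uniq -d
}

# e.g. dupe_names /media/seedbox /media/movies /media/tvseries
```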

I'll spare you the endless Q&A with ChatGPT; I was disappointed. Qwen3 was much cleaner. ChatGPT kept pushing awk-based solutions, which fail on paths containing spaces. With Qwen's help, and after dropping awk, the results improved significantly.

To test, I first asked for a script that only lists and compares:

```bash
#!/bin/bash

# Create an associative array to store duplicates
declare -A seen

# Find all .mkv files only (exclude directories)
find /media/seedbox /media/movies /media/tvseries -type f -name "*.mkv" -print0 | \
while IFS= read -r -d '' file; do
    # Get the file's inode and name
    inode=$(stat --format="%i" "$file")
    filename=$(basename "$file")

    # If the filename has been seen before
    if [[ -n "${seen[$filename]}" ]]; then
        # Check if the inode is different from the previous one
        if [[ "${seen[$filename]}" != "$inode" ]]; then
            # Output the duplicates with full paths
            echo "Duplicates for \"$filename\":"
            echo "${seen["$filename"]} ${seen["$filename:full_path"]}"
            echo "$inode $file"
            echo
        fi
    else
        seen[$filename]="$inode"
        seen["$filename:full_path"]="$file"
    fi
done
```

This gave me outputs like:

```
Duplicates for "episode1.mkv":
1234567 /media/seedbox/sonarr/Serie 1/Season1/episode1.mkv
2345678 /media/tvseries/Serie 1/Season1/episode1.mkv
```

With awk, the path would have stopped at /media/seedbox/sonarr/Serie. I'm far from an expert, but Qwen3 performed better and explained everything clearly.
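To make the failure mode concrete: awk splits on whitespace, so the second field of an `inode path` line stops at the first space, while a plain `read` keeps everything after the first field in its last variable. A small demonstration using the example path above:

```shell
line='2345678 /media/tvseries/Serie 1/Season1/episode1.mkv'

# awk's $2 is only the first whitespace-separated word of the path:
awk_path=$(printf '%s\n' "$line" | awk '{print $2}')
echo "$awk_path"    # /media/tvseries/Serie

# read assigns everything after the first field to the last variable,
# so the full path survives intact:
read -r inode path <<<"$line"
echo "$path"        # /media/tvseries/Serie 1/Season1/episode1.mkv
```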

Once I verified the output, I asked for a complete script: compare, delete duplicates, create hardlinks.

Again, ChatGPT disappointed. Despite my requests, it created the hardlinks before deleting the duplicates, effectively creating a link and then deleting it (the original is kept, but nothing is gained). Not helpful.

A quick stopover at Qwen3, RTX 5090 in overdrive, and out came a much better result. Yes, it kept the ChatGPT-style emojis, but here it is:

```bash
#!/bin/bash

echo "🔍 Step 1: Indexing original files in /media/seedbox..."
declare -A seen

# Index all .mkv files in seedbox
while IFS= read -r -d '' file; do
    filename=$(basename "$file")
    seen["$filename"]="$file"
done < <(find /media/seedbox -type f -name "*.mkv" -print0)

echo "📦 Step 2: Automatically replacing duplicates..."
total_doublons=0
total_ko_saved=0

while IFS= read -r -d '' file; do
    filename=$(basename "$file")
    original="${seen[$filename]}"

    if [[ -n "$original" && "$original" != "$file" ]]; then
        inode_orig=$(stat -c %i "$original")
        inode_dupe=$(stat -c %i "$file")

        if [[ "$inode_orig" != "$inode_dupe" ]]; then
            size_kb=$(du -k "$file" | cut -f1)
            echo "🔁 Replacing:"
            echo "    Duplicate : $file"
            echo "    Original  : $original"
            echo "    Size      : ${size_kb} KB"

            rm "$file" && ln "$original" "$file" && echo "✅ Hardlink created."

            total_doublons=$((total_doublons + 1))
            total_ko_saved=$((total_ko_saved + size_kb))
        fi
    fi
done < <(find /media/movies /media/tvseries -type f -name "*.mkv" -print0)

echo ""
echo "🧾 Summary:"
echo "    🔗 Duplicates replaced by hardlink: $total_doublons"
echo "    💾 Approx. disk space saved: ${total_ko_saved} KB (~$((total_ko_saved / 1024)) MB)"
echo "✅ Done."
```
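After a run like this, link counts give a quick spot-check: a file with two or more hardlinks shares its data with another path, while `-links 1` finds copies that were missed. A sketch (GNU stat/find assumed; the mktemp file stands in for a real .mkv):

```shell
# A file's hardlink count (%h) is 1 for a lone copy, 2+ once linked.
link_count() { stat -c %h "$1"; }

f=$(mktemp)             # stand-in for a freshly deduplicated .mkv
ln "$f" "$f.hl"         # what the script does after rm'ing the duplicate
link_count "$f"         # prints 2
rm -f "$f" "$f.hl"

# And to list .mkv files that are still single-link copies:
# find /media/movies /media/tvseries -type f -name '*.mkv' -links 1
```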

So, in conclusion, I:

- Learned many Bash subtleties
- Learned never to blindly copy-paste a ChatGPT script without understanding and dry-running it
- Learned that Qwen3 on an RTX 5090 is more coherent than ChatGPT-4o on server farms (not even mentioning "normal" ChatGPT)
- Learned that even with 100 TB of storage, monitoring it would have alerted me much earlier to the 12 TB of duplicates lying around
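On that last point, even a crude cron job would have done: alert when a filesystem crosses a usage threshold. A minimal sketch (GNU df's `--output` assumed; the mount point and threshold are illustrative):

```shell
# Print a filesystem's usage percentage as a bare number (GNU df).
usage_pct() { df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'; }

MOUNT=/media            # illustrative: the pool to watch
THRESHOLD=85            # alert above this percentage

pct=$(usage_pct "$MOUNT" 2>/dev/null)
if [ "${pct:-0}" -gt "$THRESHOLD" ]; then
    echo "WARNING: $MOUNT is ${pct}% full"
fi
```

Dropped into cron with a mail or webhook instead of `echo`, this would have flagged the creeping usage long before 12 TB piled up.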

## Backup of LUKS Headers for Encrypted Disks/Volumes


I recently realized that having just the passphrase is not enough to recover a LUKS volume after a header failure or corruption. So I learned how to dump the LUKS headers of my disks/volumes, and how to use serial numbers together with partition names to identify exactly which header belongs to which disk/partition (I have 10 of them!).

After struggling to do this manually, I asked Qwen3 (an LLM running on my RTX 5090) to create a script that automates the listing and identification of disks, dumps the headers, and stores them in an encrypted archive ready to be backed up on my backup server.

This script:

- Lists and identifies disks with their serial numbers
- Lists partitions
- Dumps the headers into a secured folder under /root
- Creates a temporary archive
- Prompts for a password
- Encrypts the archive with that password
- Deletes the unencrypted archive

```bash
#!/bin/bash

# Directory where LUKS headers will be backed up
DEST="/root/luks-headers-backup"
mkdir -p "$DEST"

echo "🔍 Searching for LUKS containers on all partitions..."

# Loop through all possible disk partitions (including NVMe and SATA)
for part in /dev/sd? /dev/sd?? /dev/nvme?n?p?; do
    # Skip if the device doesn't exist
    if [ ! -b "$part" ]; then
        continue
    fi

    # Check if the partition is a LUKS encrypted volume
    if cryptsetup isLuks "$part"; then
        # Find the parent disk device (e.g. nvme0n1p4 → nvme0n1)
        disk=$(lsblk -no pkname "$part" | head -n 1)
        full_disk="/dev/$disk"

        # Get the serial number of the parent disk
        SERIAL=$(udevadm info --query=all --name="$full_disk" | grep ID_SERIAL= | cut -d= -f2)
        if [ -z "$SERIAL" ]; then
            SERIAL="unknown"
        fi

        # Extract the partition name (e.g. nvme0n1p4)
        PART_NAME=$(basename "$part")

        # Build the output filename with partition name and disk serial
        OUTPUT="$DEST/luks-header-${PART_NAME}__${SERIAL}.img"

        echo "🔐 Backing up LUKS header of $part (Serial: $SERIAL)..."

        # Backup the LUKS header to the output file
        cryptsetup luksHeaderBackup "$part" --header-backup-file "$OUTPUT"
        if [[ $? -eq 0 ]]; then
            echo "✅ Backup successful → $OUTPUT"
        else
            echo "❌ Backup failed for $part"
        fi
    fi
done

# Create a timestamped compressed tar archive of all header backups
ARCHIVE_NAME="/root/luks-headers-$(date +%Y%m%d_%H%M%S).tar.gz"
echo "📦 Creating archive $ARCHIVE_NAME..."
tar -czf "$ARCHIVE_NAME" -C "$DEST" .

# Encrypt the archive symmetrically using GPG with AES256 cipher
echo "🔐 Encrypting the archive with GPG..."
gpg --symmetric --cipher-algo AES256 "$ARCHIVE_NAME"
if [[ $? -eq 0 ]]; then
    echo "✅ Encrypted archive created: ${ARCHIVE_NAME}.gpg"
    # Remove the unencrypted archive for security
    rm -f "$ARCHIVE_NAME"
else
    echo "❌ Encryption failed"
fi
```

Don't forget to back up /etc/fstab and /etc/crypttab as well!
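One way to fold those into the same backup, in the spirit of the script above (the helper name and the 600 mode are my own choices, not from the original):

```shell
# Copy extra config files into the backup directory with tight permissions,
# skipping any that don't exist (e.g. no /etc/crypttab on some systems).
backup_cfg() {    # usage: backup_cfg DEST FILE...
    local dest=$1; shift
    mkdir -p "$dest"
    for f in "$@"; do
        [ -f "$f" ] && install -m 600 "$f" "$dest/$(basename "$f")"
    done
    return 0
}

# e.g. backup_cfg /root/luks-headers-backup /etc/fstab /etc/crypttab
```

Run it just before the `tar` step so both files land inside the encrypted archive.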