Automating Video Frame and HLS Processing with FFmpeg and Bash

Processing sports or event footage at scale involves a series of complex, repetitive tasks: splitting videos, generating thumbnails, and uploading to cloud storage. In this article, we walk through a robust Bash automation script built on FFmpeg, AWS S3, and PostgreSQL that streamlines the entire pipeline.

1. Script Initialization and Configuration

We start with strict shell settings to ensure robust execution:

#!/bin/bash
set -eo pipefail  # Exit immediately if a command fails or a pipe breaks
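
As a quick standalone illustration of why pipefail matters (run without set -e so both statuses can be printed), compare the exit status of a failing pipeline with and without it:

```shell
#!/bin/bash
# Without pipefail, a pipeline reports the status of its *last* command,
# so a failing producer is silently masked by a succeeding consumer.
false | cat
echo "without pipefail: $?"    # prints 0

set -o pipefail
false | cat
status=$?
echo "with pipefail: $status"  # prints 1
```

With set -e added on top, that propagated non-zero status also aborts the script immediately.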

The script processes multiple video folders from S3. Here is how the S3 sources and destination are defined:

declare -a S3_SOURCES=(
    "s3://source/path/"
    "s3://source/path/"
)
S3_DEST="s3://destination/path/"

Local directories are configured to stage downloads, processing, and output:

BASE_DIR="./processing_root"
INPUT_DIR="${BASE_DIR}/input"
PROCESSING_DIR="${BASE_DIR}/processing"
OUTPUT_DIR="${BASE_DIR}/output"
PROCESSED_CLIPS_DIR="${BASE_DIR}/processed_clips"
LOG_FILE="${BASE_DIR}/processing.log"

download_files() {
    local s3_path="$1"
    local local_path="$2"
    echo "Downloading $s3_path to $local_path" >&2
    
    aws s3 cp "$s3_path" "$local_path" || {
        echo "[ERROR] Failed to download $s3_path" >&2
        return 1
    }
}

Tip: Using a separate folder for each stage keeps concerns cleanly separated and makes debugging easier.

2. Video Processing Parameters

To support HLS streaming and multi-quality delivery, FFmpeg transcoding profiles are defined:

SEGMENT_DURATION=2
THUMBNAIL_TIME="00:00:05"

Three encoding profiles are declared using an associative array:

declare -A QUALITIES=(
    ["low"]="scale=-2:360 -crf 28 -preset fast -b:a 64k"
    ["med"]="scale=-2:720 -crf 24 -preset medium -b:a 96k"
    ["high"]="scale=-2:1080 -crf 20 -preset slow -b:a 128k"
)
  • Low: lightweight, suited to mobile streaming
  • Medium: a balanced general-purpose profile
  • High: HD-quality playback
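
A quick way to sanity-check the profiles is to iterate over the array (note that Bash does not guarantee key order in associative arrays):

```shell
#!/bin/bash
declare -A QUALITIES=(
    ["low"]="scale=-2:360 -crf 28 -preset fast -b:a 64k"
    ["med"]="scale=-2:720 -crf 24 -preset medium -b:a 96k"
    ["high"]="scale=-2:1080 -crf 20 -preset slow -b:a 128k"
)

# Print each profile name alongside its FFmpeg parameters
for quality in "${!QUALITIES[@]}"; do
    echo "$quality -> ${QUALITIES[$quality]}"
done
```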

3. Database Integration

The script talks to a PostgreSQL database to fetch tournament and game metadata:

DB_HOST="localhost"
DB_USER="postgres"
DB_PASS="****"
DB_NAME="dbname"

This enables automatic lookup and validation of game folders before processing begins.

Functions such as query_tournament_id, verify_game, and update_game_s3_path_with_verification make the following possible:

update_game_s3_path_with_verification() {
    # Check that BASE_GAME_NAME is set
    if [ -z "$BASE_GAME_NAME" ]; then
        echo "[ERROR] BASE_GAME_NAME environment variable is not set." >&2
        echo "Expected format: 'Tournament - Team1 vs Team2 - Month Day. Year'" >&2
        return 1
    fi

    # Check that S3_DEST is set
    if [ -z "$S3_DEST" ]; then
        echo "[ERROR] S3_DEST environment variable is not set." >&2
        return 1
    fi

    # Parse BASE_GAME_NAME into its components
    if ! [[ "$BASE_GAME_NAME" =~ ^(.+)\ -\ (.+)\ vs\ (.+)\ -\ (.+)$ ]]; then
        echo "[ERROR] Invalid BASE_GAME_NAME format. Expected: 'Tournament - Team1 vs Team2 - Month Day. Year'" >&2
        return 1
    fi

    tournament_name="${BASH_REMATCH[1]}"
    team1="${BASH_REMATCH[2]}"
    team2="${BASH_REMATCH[3]}"
    game_date="${BASH_REMATCH[4]}"

    echo "Starting S3 path update..."
    echo "Parsed values from BASE_GAME_NAME:"
    echo "  Tournament: '$tournament_name'"
    echo "  Team 1: '$team1'"
    echo "  Team 2: '$team2'"
    echo "  Date: '$game_date'"
    echo "  S3 Base: '$S3_DEST'"
    echo "  Perspective: '${PERSPECTIVE_NAME:-Not set}'"

    # Fetch the tournament ID
    tournament_id=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -t -c \
        "SELECT id FROM tbltrmnts WHERE trmnt_name = '$tournament_name';" | tr -d '[:space:]') || {
        echo "[DB ERROR] Failed to query tournament ID" >&2
        return 1
    }
    [ -z "$tournament_id" ] && { echo "[ERROR] Tournament not found: '$tournament_name'" >&2; return 1; }
    echo "Tournament ID: $tournament_id"

    # Format the date (macOS/Linux compatible)
    if [[ "$OSTYPE" == "darwin"* ]]; then
        formatted_date=$(date -j -f "%B %d. %Y" "$(echo "$game_date" | sed 's/\.//')" "+%Y-%m-%d" 2>/dev/null) || {
            echo "[ERROR] Invalid date format: '$game_date'" >&2
            return 1
        }
    else
        formatted_date=$(date -d "$(echo "$game_date" | sed 's/\.//')" +%Y-%m-%d 2>/dev/null) || {
            echo "[ERROR] Invalid date format: '$game_date'" >&2
            return 1
        }
    fi
    echo "Formatted date: $formatted_date"

    # Fetch the game ID
    game_id=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -t -c \
        "SELECT g.game_id FROM tblgames AS g
        LEFT JOIN tblteams AS t1 ON g.team1_id = t1.id
        LEFT JOIN tblteams AS t2 ON g.team2_id = t2.id
        WHERE g.trmnt_id = $tournament_id
        AND t1.team_name = '$team1' AND t2.team_name = '$team2'
        AND DATE(g.match_date) = '$formatted_date';" | tr -d '[:space:]') || {
        echo "[DB ERROR] Failed to query game ID" >&2
        return 1
    }
    [ -z "$game_id" ] && { echo "[ERROR] Game not found (Teams: '$team1' vs '$team2', Date: '$formatted_date')" >&2; return 1; }
    echo "Game ID: $game_id"

    # Build the base folder name and S3 path
    folder_name="$BASE_GAME_NAME"
    full_s3_path="${S3_DEST%/}/${folder_name}"
    
    # Handle perspective-specific paths
    if [ -n "$PERSPECTIVE_NAME" ]; then
        # Strip spaces and lowercase the column name
        perspective_column_name=$(echo "$PERSPECTIVE_NAME" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
        perspective_s3_path="${full_s3_path}/${PERSPECTIVE_NAME}"
        echo "Perspective-specific S3 Path: $perspective_s3_path"
        echo "Column name for perspective: ${perspective_column_name}_s3_path"
    else
        perspective_s3_path="$full_s3_path"
    fi

    echo "Base S3 Path: $full_s3_path"
    echo "Final S3 Path: $perspective_s3_path"

    # Verify the database connection
    if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT 1" >/dev/null 2>&1; then
        echo "[DB ERROR] Failed to connect to database" >&2
        return 1
    fi

    # Ensure the required columns exist
    echo "Ensuring required columns exist in database..."
    if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
        SET search_path = ipapolo, public;
        DO \$\$
        BEGIN
            -- Ensure offset column exists as JSONB
            IF NOT EXISTS (
                SELECT 1 FROM information_schema.columns 
                WHERE table_schema = 'public' 
                AND table_name = 'tblgames' 
                AND column_name = 'offset'
            ) THEN
                ALTER TABLE tblgames ADD COLUMN "offset" JSONB;
                RAISE NOTICE 'Added offset column to tblgames';
            ELSE
                BEGIN
                    ALTER TABLE tblgames ALTER COLUMN "offset" TYPE JSONB USING "offset"::jsonb;
                EXCEPTION WHEN others THEN
                    RAISE NOTICE 'Offset column exists but cannot be converted to JSONB';
                END;
            END IF;
            
            -- Ensure duration column exists if this is PGM perspective
            IF '${PERSPECTIVE_NAME:-}' = 'PGM' AND NOT EXISTS (
                SELECT 1 FROM information_schema.columns 
                WHERE table_schema = 'public' 
                AND table_name = 'tblgames' 
                AND column_name = 'duration'
            ) THEN
                ALTER TABLE tblgames ADD COLUMN duration TEXT;
                RAISE NOTICE 'Added duration column to tblgames';
            END IF;
            
            -- Create perspective-specific column if needed (with spaces removed and lowercase)
            IF '${PERSPECTIVE_NAME:-}' != '' AND '${PERSPECTIVE_NAME:-}' != 'PGM' THEN
                DECLARE
                    col_name TEXT := '${perspective_column_name}_s3_path';
                BEGIN
                    IF NOT EXISTS (
                        SELECT 1 FROM information_schema.columns 
                        WHERE table_schema = 'public' 
                        AND table_name = 'tblgames' 
                        AND column_name = col_name
                    ) THEN
                        EXECUTE format('ALTER TABLE tblgames ADD COLUMN %I TEXT', col_name);
                        RAISE NOTICE 'Added % column to tblgames', col_name;
                    END IF;
                END;
            END IF;
        END
        \$\$;
EOF
    then
        echo "[DB ERROR] Failed to ensure required columns exist" >&2
        return 1
    fi

    # Process Offset/JSON files if S3_SOURCE is set
    if [ -n "$S3_SOURCE" ]; then
        echo "Searching for Offset/JSON files in: $S3_SOURCE"
        
        # Search pattern 1: any *Offset.json
        offset_file=$(aws s3 ls "${S3_SOURCE%/}/" | awk '{print $4}' | grep -E 'Offset\.json$' | head -n 1)
        
        if [ -z "$offset_file" ]; then
            # Search pattern 2: "BASE_GAME_NAME - Offset.json"
            specific_offset_file="${BASE_GAME_NAME} - Offset.json"
            if aws s3 ls "${S3_SOURCE%/}/$specific_offset_file" >/dev/null 2>&1; then
                offset_file="$specific_offset_file"
            else
                # Search pattern 3: any .json file
                offset_file=$(aws s3 ls "${S3_SOURCE%/}/" | awk '{print $4}' | grep -E '\.json$' | head -n 1)
            fi
        fi

        if [ -n "$offset_file" ]; then
            echo "Found JSON file to process: $offset_file"
            temp_file=$(mktemp)
            
            echo "Downloading ${S3_SOURCE%/}/$offset_file..."
            if ! aws s3 cp "${S3_SOURCE%/}/$offset_file" "$temp_file"; then
                echo "[S3 ERROR] Failed to download JSON file" >&2
                rm -f "$temp_file"
                return 1
            fi

            # Validate and process the JSON
            if ! jq -e . "$temp_file" >/dev/null 2>&1; then
                echo "[ERROR] Invalid JSON in file $offset_file" >&2
                rm -f "$temp_file"
                return 1
            fi

            json_content=$(jq -c . "$temp_file")
            echo "Updating database with JSON content from $offset_file..."
            
            if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
                SET search_path = public;
                UPDATE tblgames 
                SET "offset" = '${json_content//\'/\'\'}'::jsonb
                WHERE game_id = $game_id;
EOF
            then
                echo "[DB ERROR] Failed to update offset column" >&2
                rm -f "$temp_file"
                return 1
            fi

            rm -f "$temp_file"
            echo "Successfully updated offset column with content from $offset_file"
        else
            echo "No suitable JSON files found in $S3_SOURCE"
        fi
    else
        echo "S3_SOURCE not set, skipping JSON file processing"
    fi

    # Handle the PGM perspective case
    if [ -n "$PERSPECTIVE_NAME" ] && [ "$PERSPECTIVE_NAME" = "PGM" ]; then
        echo "Processing PGM perspective metadata..."
        
        # Look for metadata.json in the PGM folder
        metadata_path="${full_s3_path}/PGM/metadata.json"
        echo "Checking for metadata.json at: $metadata_path"
        
        temp_metadata_file=$(mktemp)
        if aws s3 cp "$metadata_path" "$temp_metadata_file" >/dev/null 2>&1; then
            echo "Found metadata.json, processing..."
            
            # Validate the JSON and extract the duration value
            if ! jq -e . "$temp_metadata_file" >/dev/null 2>&1; then
                echo "[ERROR] Invalid JSON in metadata.json" >&2
                rm -f "$temp_metadata_file"
                return 1
            fi

            # Extract the duration if present
            if jq -e '.duration' "$temp_metadata_file" >/dev/null 2>&1; then
                duration_value=$(jq -r '.duration' "$temp_metadata_file")
                echo "Extracted duration value: $duration_value"
                
                # Update the database with the duration value
                echo "Updating database with duration value..."
                
                if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
                    SET search_path = public;
                    UPDATE tblgames 
                    SET duration = '${duration_value//\'/\'\'}'
                    WHERE game_id = $game_id;
EOF
                then
                    echo "[DB ERROR] Failed to update duration column" >&2
                    rm -f "$temp_metadata_file"
                    return 1
                fi
            else
                echo "No duration field found in metadata.json"
            fi

            # If no offset was set from S3_SOURCE, check metadata.json for one
            if [ -z "$json_content" ] && jq -e '.offset' "$temp_metadata_file" >/dev/null 2>&1; then
                offset_content=$(jq -c '.offset' "$temp_metadata_file")
                echo "Found offset in metadata.json, updating database..."
                
                if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
                    SET search_path = public;
                    UPDATE tblgames 
                    SET "offset" = '${offset_content//\'/\'\'}'::jsonb
                    WHERE game_id = $game_id;
EOF
                then
                    echo "[DB ERROR] Failed to update offset from metadata.json" >&2
                    rm -f "$temp_metadata_file"
                    return 1
                fi
            fi

            rm -f "$temp_metadata_file"
        else
            echo "No metadata.json found in PGM folder"
        fi
    fi

    # Update the path in the database according to the perspective
    if [ -n "$PERSPECTIVE_NAME" ]; then
        if [ "$PERSPECTIVE_NAME" = "PGM" ]; then
            # For PGM, update the main s3_path
            echo "Updating main s3_path to: $perspective_s3_path"
            update_query="UPDATE tblgames SET s3_path = '${perspective_s3_path//\'/\'\'}' WHERE game_id = $game_id;"
        else
            # For other perspectives, update the perspective-specific column (lowercase)
            column_name="${perspective_column_name}_s3_path"
            echo "Updating $column_name to: $perspective_s3_path"
            update_query="UPDATE tblgames SET \"$column_name\" = '${perspective_s3_path//\'/\'\'}' WHERE game_id = $game_id;"
        fi

        if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
            SET search_path = public;
            $update_query
EOF
        then
            echo "[DB ERROR] Failed to update path for perspective $PERSPECTIVE_NAME" >&2
            return 1
        fi
    else
        # No perspective specified; update the main s3_path
        echo "Updating main s3_path to: $full_s3_path"
        if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
            SET search_path = public;
            UPDATE tblgames 
            SET s3_path = '${full_s3_path//\'/\'\'}'
            WHERE game_id = $game_id;
EOF
        then
            echo "[DB ERROR] Failed to update s3_path" >&2
            return 1
        fi
    fi

    # Verify the update
    echo "Verifying database update for game ID $game_id:"
    if [ -n "$PERSPECTIVE_NAME" ]; then
        if [ "$PERSPECTIVE_NAME" = "PGM" ]; then
            # Show the PGM-related columns
            PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<EOF
                SET search_path = public;
                SELECT 
                    game_id,
                    s3_path,
                    jsonb_typeof("offset") as offset_type,
                    "offset" as offset_content,
                    duration
                FROM tblgames
                WHERE game_id = $game_id;
EOF
        else
            # Show the perspective-specific column (lowercase)
            column_name="${perspective_column_name}_s3_path"
            PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<EOF
                SET search_path = public;
                SELECT 
                    game_id,
                    "$column_name",
                    jsonb_typeof("offset") as offset_type
                FROM tblgames
                WHERE game_id = $game_id;
EOF
        fi
    else
        # No perspective specified; show the basic information
        PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<EOF
            SET search_path = public;
            SELECT 
                game_id,
                s3_path,
                jsonb_typeof("offset") as offset_type
            FROM tblgames
            WHERE game_id = $game_id;
EOF
    fi

    echo "Update completed successfully"
}
  • Matching filenames to tournament/game records
  • Writing the S3 path back to the database after successful processing
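
The UPDATE statements above guard against broken SQL by doubling any single quotes in the interpolated value via the expansion ${var//\'/\'\'}. A standalone illustration (note this only repairs quoting; parameterized queries remain safer against injection):

```shell
#!/bin/bash
json_content='{"note":"it'\''s a test"}'

# Double every single quote so the value can sit inside a '...' SQL literal
escaped=${json_content//\'/\'\'}

echo "before: $json_content"
echo "after:  $escaped"
```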

Security note: Avoid hard-coding credentials in production. Use environment variables or AWS Systems Manager Parameter Store instead.

4. Perspective Patterns for Organizing Footage

Video clips may come from several camera perspectives. To organize this media correctly during processing, the script uses a mapping:

declare -A PERSPECTIVE_PATTERNS=(
    ["Drone"]="Drone"
    ["Endzone 1"]="Endzone_1"
    ["Endzone 2"]="Endzone_2"
    ["PGM"]="PGM"
    ["High"]="high"
    ["Low"]="low"
    ["Medium"]="medium"
    ["Sideline"]="Sideline"
)

This associative array maps known camera perspectives found in filenames (such as Drone or Endzone 1) to standardized internal folder names used for structured output and downstream processing.

For example:

If a filename contains "Drone", the processed output is organized under processed_clips/Drone.
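
A minimal sketch of how such a lookup behaves (the filename here is hypothetical):

```shell
#!/bin/bash
declare -A PERSPECTIVE_PATTERNS=(
    ["Drone"]="Drone"
    ["Endzone 1"]="Endzone_1"
    ["Endzone 2"]="Endzone_2"
)

filename="Tournament - Team1 vs Team2 - May 5. 2024 - Endzone 1.mp4"

# Find the first pattern contained in the filename and map it to its folder
matched=""
for pattern in "${!PERSPECTIVE_PATTERNS[@]}"; do
    if [[ "$filename" == *"$pattern"* ]]; then
        matched="${PERSPECTIVE_PATTERNS[$pattern]}"
        break
    fi
done

echo "perspective folder: $matched"
```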

5. Centralized Error Handling

A small utility ensures the script terminates gracefully on error:

error_exit() {
    echo "[ERROR] $1" >&2
    exit 1
}

This makes it easy to handle failures consistently across every part of the script:

ffmpeg ... || error_exit "FFmpeg failed during encoding"

Why it matters: Without centralized handling, debugging failures in a long-running script becomes a nightmare. This simple function improves traceability and resilience.
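
Because error_exit calls exit, it terminates the current shell; a caller that wants to keep going can invoke it in a subshell and inspect the status. A small self-contained check:

```shell
#!/bin/bash
error_exit() {
    echo "[ERROR] $1" >&2
    exit 1
}

# Run the handler in a subshell so only the subshell exits;
# capture its stderr and exit status for inspection
msg=$( (error_exit "FFmpeg failed during encoding") 2>&1 )
status=$?

echo "captured message: $msg"
echo "exit status: $status"
```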

6. Dependency Check and Auto-Installer

Before the script does any heavy lifting, it verifies that all critical tools are available:

# ----- Dependency check -----
check_dependencies() {
    local missing=()
    for cmd in aws ffmpeg ffprobe psql; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing+=("$cmd")
        fi
    done

    if [ ${#missing[@]} -gt 0 ]; then
        echo "Installing missing dependencies: ${missing[*]}"
        if [[ "$OSTYPE" == "darwin"* ]]; then
            # macOS
            if ! command -v brew >/dev/null; then
                /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
                echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
                source ~/.zshrc
            fi
            brew install "${missing[@]}" || error_exit "Failed to install dependencies"
        else
            # Linux
            sudo apt-get update && sudo apt-get install -y "${missing[@]}" || {
                error_exit "Failed to install dependencies"
            }
        fi
    fi
}

If anything is missing, the script installs it automatically:

  • On macOS: installs Homebrew if needed, then uses it to install the missing packages.
  • On Linux: uses apt-get.
if [[ "$OSTYPE" == "darwin"* ]]; then
    # macOS: install with Homebrew
else
    # Linux: install with apt-get
fi

Self-healing setup: This feature is especially useful in fresh environments (CI/CD runners, EC2 instances, developer laptops), making the script plug-and-play.

7. Database Functions: Tying Metadata to the Real World

To ensure every processed clip corresponds to a valid tournament and game, the script queries the PostgreSQL database directly before proceeding. This is essential for associating the correct game ID with generated assets such as thumbnails, segments, and metadata.

query_tournament_id: Looking Up a Tournament by Name

This function retrieves a tournament's unique ID from its name; the ID serves as a foreign key in all subsequent operations:

query_tournament_id() {
    local tournament_name="$1"
    echo "Querying tournament ID for: '$tournament_name'" >&2
    
    result=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -t -c \
        "SELECT id FROM tbltrmnts WHERE trmnt_name = '$tournament_name';" | tr -d '[:space:]') || {
        echo "[DB ERROR] Failed to query tournament ID" >&2
        return 1
    }

    if [ -z "$result" ]; then
        echo "[WARNING] Tournament not found: '$tournament_name'" >&2
        return 1
    fi
    
    echo "$result"
}

Purpose: Ensures every folder and filename is correctly associated with a verified tournament.
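
psql -t prints the value padded with whitespace and a trailing newline, which is why the result is piped through tr -d '[:space:]' before use. Simulating that output with printf in place of psql:

```shell
#!/bin/bash
# Simulate what `psql -t` typically emits for a single-row, single-column result
raw_output="$(printf ' 42\n\n')"

# Strip all whitespace so the ID is safe to compare and interpolate
tournament_id="$(echo "$raw_output" | tr -d '[:space:]')"
echo "clean id: '$tournament_id'"
```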

verify_game: Confirming a Game Exists Before Processing

Once the tournament ID is available, the script verifies that the game (identified by team names and date) exists in the database:

verify_game() {
    local tournament_id="$1"
    local team1="$2"
    local team2="$3"
    local game_date="$4"
    
    echo "Verifying game: $team1 vs $team2 on $game_date (Tournament ID: $tournament_id)" >&2
    
    # Convert the date format
    if [[ "$OSTYPE" == "darwin"* ]]; then
        # macOS date parsing
        formatted_date=$(date -j -f "%B %d. %Y" "$(echo "$game_date" | sed 's/\.//')" "+%Y-%m-%d" 2>/dev/null) || {
            echo "[ERROR] Invalid date format: $game_date" >&2
            return 1
        }
    else
        # Linux date parsing
        formatted_date=$(date -d "$(echo "$game_date" | sed 's/\.//')" +%Y-%m-%d 2>/dev/null) || {
            echo "[ERROR] Invalid date format: $game_date" >&2
            return 1
        }
    fi
    
    query="SELECT g.* FROM tblgames AS g
           LEFT JOIN tblteams AS t1 ON g.team1_id = t1.id
           LEFT JOIN tblteams AS t2 ON g.team2_id = t2.id
           WHERE g.trmnt_id = $tournament_id
           AND t1.team_name = '$team1' AND t2.team_name = '$team2'
           AND DATE(g.match_date) = '$formatted_date';"
    
    game_id=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -t -c "$query" | tr -d '[:space:]') || {
        echo "[DB ERROR] Failed to query game ID" >&2
        return 1
    }
    
    if [ -n "$game_id" ]; then
        echo "Found game ID: $game_id" >&2
        echo "$game_id"
        return 0
    else
        echo "[WARNING] Game not found in database" >&2
        return 1
    fi
}

Cross-Platform Date Conversion

Because macOS and Linux handle date formats differently, the script supports both:

if [[ "$OSTYPE" == "darwin"* ]]; then
    formatted_date=$(date -j -f "%B %d. %Y" "$(echo "$game_date" | sed 's/\.//')" "+%Y-%m-%d")
else
    formatted_date=$(date -d "$(echo "$game_date" | sed 's/\.//')" +%Y-%m-%d)
fi
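
The two branches can be wrapped in a small helper (a hypothetical refactor, not part of the original script); input dates look like "May 5. 2024":

```shell
#!/bin/bash
# Convert "Month Day. Year" to ISO YYYY-MM-DD on both macOS and Linux.
to_iso_date() {
    local raw="${1/./}"   # drop the first period, as the script's sed 's/\.//' does
    if [[ "$OSTYPE" == "darwin"* ]]; then
        date -j -f "%B %d %Y" "$raw" "+%Y-%m-%d" 2>/dev/null
    else
        date -d "$raw" +%Y-%m-%d 2>/dev/null
    fi
}

to_iso_date "May 5. 2024"
```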

SQL JOIN Logic

SELECT g.* FROM tblgames AS g
LEFT JOIN tblteams AS t1 ON g.team1_id = t1.id
LEFT JOIN tblteams AS t2 ON g.team2_id = t2.id
WHERE g.trmnt_id = $tournament_id
AND t1.team_name = '$team1' AND t2.team_name = '$team2'
AND DATE(g.match_date) = '$formatted_date';

This robust query structure ensures that:

  • The correct tournament ID is matched
  • Both teams exist and their names match
  • The game date is accurate (even across inconsistent formats)

If a matching game_id is found, the function returns it:

if [ -n "$game_id" ]; then
    echo "Found game ID: $game_id"
    echo "$game_id"
else
    echo "[WARNING] Game not found in database"
fi

Note: If no match is found, the script logs a warning and skips further processing of that folder, preventing accidental data pollution.

8. File Processing Functions: The Core of Video Conversion

Once a source video is validated, the script enters the heavy-lifting stage: converting raw input video into HLS segments, MP4 clips, thumbnails, and metadata. This conversion pipeline is driven by two functions: process_full_video and process_clip.

process_full_video() {
    local input_file="$1"
    local output_dir="$2"
    
    echo "Processing full video: $input_file" >&2
    
    # Create the output directory if it does not exist
    mkdir -p "$output_dir" || {
        echo "[ERROR] Failed to create output directory: $output_dir" >&2
        return 1
    }
    
    # Create the segments directory
    local segments_dir="${output_dir}/segments"
    mkdir -p "$segments_dir" || {
        echo "[ERROR] Failed to create segments directory: $segments_dir" >&2
        return 1
    }
    
    # Get the duration
    local duration
    duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$input_file" | awk '{print int($1)}') || {
        echo "[ERROR] Failed to get duration for $input_file" >&2
        return 1
    }
    
    # Generate thumbnails for each quality
    generate_thumbnails "$input_file" "$output_dir/thumbnail"
    
    # Process each quality level
    for quality in "${!QUALITIES[@]}"; do
        local params="${QUALITIES[$quality]}"
        
        echo "Processing $quality quality..." >&2
        ffmpeg -i "$input_file" \
            -vf "$(echo "$params" | grep -o 'scale=[^ ]*')" \
            -c:v libx264 $(echo "$params" | grep -o '\-crf [^ ]*') \
            -preset $(echo "$params" | grep -o '\-preset [^ ]*' | cut -d' ' -f2) \
            -c:a aac $(echo "$params" | grep -o '\-b:a [^ ]*') \
            -f hls -hls_time $SEGMENT_DURATION -hls_playlist_type vod \
            -hls_segment_filename "$segments_dir/${quality}_%03d.ts" \
            -hls_base_url "./segments/" \
            "$output_dir/${quality}.m3u8" </dev/null || {
                echo "[ERROR] Failed to process $quality version" >&2
                continue
            }
    done
    
    # Create the metadata file
    echo "{\"duration\":$duration}" > "$output_dir/metadata.json"
}

Key Operations

Create the output directory structure

  • output_dir: the target HLS directory
  • segments_dir: a subdirectory holding the .ts segments

Extract the video duration

duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$input_file" | awk '{print int($1)}') || {
        echo "[ERROR] Failed to get duration for $input_file" >&2
        return 1
    }

  • Uses ffprobe to calculate the length in seconds
  • Stores the duration for use in the metadata file

Generate multi-quality HLS streams in a loop

The loop iterates over the low, med, and high profiles:

 ffmpeg -i "$input_file" \
            -vf "$(echo "$params" | grep -o 'scale=[^ ]*')" \
            -c:v libx264 $(echo "$params" | grep -o '\-crf [^ ]*') \
            -preset $(echo "$params" | grep -o '\-preset [^ ]*' | cut -d' ' -f2) \
            -c:a aac $(echo "$params" | grep -o '\-b:a [^ ]*') \
            -f hls -hls_time $SEGMENT_DURATION -hls_playlist_type vod \
            -hls_segment_filename "$segments_dir/${quality}_%03d.ts" \
            -hls_base_url "./segments/" \
            "$output_dir/${quality}.m3u8" </dev/null || {
                echo "[ERROR] Failed to process $quality version" >&2
                continue
            }

Outputs:

  • .m3u8 playlist files
  • .ts segment files for streaming
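
The ffmpeg invocation above pulls individual flags out of the profile string with grep -o; here is that extraction in isolation, using the low profile:

```shell
#!/bin/bash
params="scale=-2:360 -crf 28 -preset fast -b:a 64k"

# Pull each flag out of the single profile string, as the script does
scale=$(echo "$params" | grep -o 'scale=[^ ]*')
crf=$(echo "$params" | grep -o '\-crf [^ ]*')
preset=$(echo "$params" | grep -o '\-preset [^ ]*' | cut -d' ' -f2)
audio=$(echo "$params" | grep -o '\-b:a [^ ]*')

echo "video filter: $scale"
echo "rate control: $crf"
echo "preset:       $preset"
echo "audio:        $audio"
```

Storing per-flag array entries would avoid the repeated parsing, but keeping each profile as one string keeps the QUALITIES table human-readable.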

Generate thumbnails

    # Generate thumbnails for each quality
    generate_thumbnails "$input_file" "$output_dir/thumbnail"


generate_thumbnails() {
    local input_file="$1"
    local output_prefix="$2"
    
    echo "Generating thumbnails for $input_file" >&2
    
    for quality in "${!QUALITIES[@]}"; do
        local scale=$(echo "${QUALITIES[$quality]}" | grep -o 'scale=[^ ]*')
        local output_file="${output_prefix}_${quality}.jpg"
        
        ffmpeg -ss "$THUMBNAIL_TIME" -i "$input_file" \
            -vf "$scale" \
            -vframes 1 \
            -q:v 2 \
            "$output_file" </dev/null || {
                echo "[WARNING] Failed to generate $quality thumbnail" >&2
                continue
            }
    done
}
    
Create the metadata file

echo "{\"duration\":$duration}" > "$output_dir/metadata.json"

Purpose: Enables adaptive video playback by providing multiple resolutions and bitrates, precise segmentation, and a clean folder structure.

process_clip: Cutting Highlights with Dynamic Quality

Used to create individual highlight clips from a full game.

process_clip() {
    local input_file="$1"
    local clip_prefix="$2"
    local start_sec="$3"
    local duration_sec="$4"

    echo "Processing clip $clip_prefix (Start: ${start_sec}s, Duration: ${duration_sec}s)" >&2

    # Output file paths
    local output_mp4="${PROCESSED_CLIPS_DIR}/${clip_prefix}.mp4"
    local output_thumbnail="${PROCESSED_CLIPS_DIR}/${clip_prefix}.jpg"

    # Clean up any existing files
    rm -f "$output_mp4" "$output_thumbnail"

    # Derive a variable-quality CRF value from the start time
    local crf=$(( 22 + $(printf "%.0f" "$start_sec") % 6 ))
    local resolution="-vf scale=trunc(iw/2)*2:trunc(ih/2)*2"  # Ensure even dimensions

    # Transcode the clip to MP4 with proper encoding and variable quality
    ffmpeg -hide_banner -loglevel error -nostdin \
        -ss "$start_sec" \
        -i "$input_file" \
        -t "$duration_sec" \
        -c:v libx264 -crf "$crf" -preset fast \
        -profile:v main -pix_fmt yuv420p \
        -movflags +faststart \
        -c:a aac -b:a 128k \
        $resolution \
        -y "$output_mp4" </dev/null || {
        echo "[ERROR] FFmpeg failed to process clip $clip_prefix" >&2
        return 1
    }

    # Validate the output file
    if [ ! -s "$output_mp4" ]; then
        echo "[ERROR] Output file is empty: $output_mp4" >&2
        return 1
    fi

    # Generate a thumbnail a few seconds into the clip
    local thumbnail_time=$(awk -v start="$start_sec" 'BEGIN { print (start + 5) }')

    ffmpeg -hide_banner -loglevel error -nostdin \
        -ss "$thumbnail_time" \
        -i "$input_file" \
        -t 1 \
        -vf "scale=320:-1" \
        -vframes 1 \
        -q:v 2 \
        -y "$output_thumbnail" </dev/null || {
        echo "[WARNING] Failed to generate thumbnail for clip $clip_prefix at offset +5s, trying fallback..." >&2

        # Fallback: generate from the start time
        ffmpeg -hide_banner -loglevel error -nostdin \
            -ss "$start_sec" \
            -i "$input_file" \
            -t 1 \
            -vf "scale=320:-1" \
            -vframes 1 \
            -q:v 2 \
            -y "$output_thumbnail" </dev/null || true
    }

    echo "✅ Finished processing clip $clip_prefix (CRF=$crf)" >&2
    return 0
}

Features:

Unique output paths

output_mp4="${PROCESSED_CLIPS_DIR}/${clip_prefix}.mp4"
output_thumbnail="${PROCESSED_CLIPS_DIR}/${clip_prefix}.jpg"

Dynamic CRF for adaptive compression

local crf=$(( 22 + $(printf "%.0f" "$start_sec") % 6 ))
local resolution="-vf scale=trunc(iw/2)*2:trunc(ih/2)*2"  # Ensure even dimensions
  • The CRF (Constant Rate Factor) is derived from the clip's start time and lands in the 22-27 range
  • Higher CRF values yield smaller files at slightly lower quality
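
Plugging a few sample start times into the formula shows the spread (a standalone sketch; the start times are hypothetical):

```shell
#!/bin/bash
# CRF = 22 + (rounded start second mod 6), so values land in 22..27
for start_sec in 0 30.4 125 299.7; do
    crf=$(( 22 + $(printf "%.0f" "$start_sec") % 6 ))
    echo "start=${start_sec}s -> crf=$crf"
done
```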

Clip encoding with fallback logic

    # Transcode the clip to MP4 with proper encoding and variable quality
    ffmpeg -hide_banner -loglevel error -nostdin \
        -ss "$start_sec" \
        -i "$input_file" \
        -t "$duration_sec" \
        -c:v libx264 -crf "$crf" -preset fast \
        -profile:v main -pix_fmt yuv420p \
        -movflags +faststart \
        -c:a aac -b:a 128k \
        $resolution \
        -y "$output_mp4" </dev/null || {
        echo "[ERROR] FFmpeg failed to process clip $clip_prefix" >&2
        return 1
    }
  • Cuts the segment directly from the source
  • Applies even frame dimensions (trunc(iw/2)*2) to avoid codec errors
  • AAC audio with +faststart for web compatibility

Thumbnail generation

Attempts to capture a thumbnail at start + 5s, falling back to start if necessary:

local thumbnail_time=$(awk -v start="$start_sec" 'BEGIN { print (start + 5) }')

    ffmpeg -hide_banner -loglevel error -nostdin \
        -ss "$thumbnail_time" \
        -i "$input_file" \
        -t 1 \
        -vf "scale=320:-1" \
        -vframes 1 \
        -q:v 2 \
        -y "$output_thumbnail" </dev/null || {
        echo "[WARNING] Failed to generate thumbnail for clip $clip_prefix at offset +5s, trying fallback..." >&2

        # Fallback: generate from the start time
        ffmpeg -hide_banner -loglevel error -nostdin \
            -ss "$start_sec" \
            -i "$input_file" \
            -t 1 \
            -vf "scale=320:-1" \
            -vframes 1 \
            -q:v 2 \
            -y "$output_thumbnail" </dev/null || true
    }
  • Logs errors and validates outputs
  • Verifies that the output file is not empty

On thumbnail errors, a warning is logged and processing continues gracefully.

Result: Every clip becomes a standalone, web-optimized .mp4 file with an accompanying .jpg thumbnail, ideal for highlights, reels, and event recaps.

9. The Main Processing Function: process_single_video

This function is the central orchestrator for each video file. It extracts metadata, applies the appropriate conversions, and prepares the uploads.

process_single_video() {
    local MP4_FILE="$1"
    local BASE_NAME="${MP4_FILE%.mp4}"
    local LOCAL_MP4="${PROCESSING_DIR}/${MP4_FILE}"
    local OUTPUT_GAME_DIR="${OUTPUT_DIR}/${BASE_NAME}"
    local DESTINATION_FOLDER="$2"  # Passed as parameter
    
    echo -e "\n=== Processing video: $BASE_NAME ==="

    # 1. Find the base game name (everything before the third hyphen)
    local BASE_GAME_NAME
    BASE_GAME_NAME=$(find_base_game_name "$MP4_FILE") || {
        echo "[ERROR] Failed to determine base game name for $MP4_FILE" >&2
        return 1
    }
    
    echo "Base game name: $BASE_GAME_NAME" >&2

    # 2. Get the perspective if this is a perspective video
    local PERSPECTIVE_NAME=""
    PERSPECTIVE_NAME=$(get_perspective_name "$MP4_FILE") || {
        echo "Not a perspective video or perspective not recognized" >&2
    }

    # 3. Determine the S3 destination path
    local S3_DEST_PATH="${S3_DEST}${DESTINATION_FOLDER}/"
    
    # For perspective videos, append the perspective subfolder
    if [ -n "$PERSPECTIVE_NAME" ]; then
        S3_DEST_PATH="${S3_DEST_PATH}${PERSPECTIVE_PATTERNS[$PERSPECTIVE_NAME]}/"
    fi

    # [Remaining processing steps...]
}

Step-by-Step Breakdown

1. Extract the base game name

BASE_GAME_NAME=$(find_base_game_name "$MP4_FILE")

find_base_game_name() {
    local filename="$1"
    echo "Finding base game name for: $filename" >&2
    
    # Split the filename into parts on the hyphen delimiter
    IFS='-' read -ra parts <<< "$filename"
    
    # Take the first 3 parts (before the third hyphen) and join them
    base_name=$(IFS='-'; echo "${parts[*]:0:3}" | xargs)
    base_name="${base_name%.mp4}"  # Strip .mp4 if present
    
    echo "$base_name"
}

2. Identify the camera perspective

PERSPECTIVE_NAME=$(get_perspective_name "$MP4_FILE")

get_perspective_name() {
    local filename="$1"
    echo "Getting perspective name for: $filename" >&2
    
    # Split the filename into parts on the hyphen delimiter
    IFS='-' read -ra parts <<< "$filename"
    
    # With more than 3 parts, the perspective sits after the third hyphen
    if [ ${#parts[@]} -ge 4 ]; then
        perspective=$(IFS='-'; echo "${parts[*]:3}" | xargs)
        perspective="${perspective%.mp4}"  # Remove .mp4 if present
        
        # Look for a matching entry in the perspective patterns
        for pattern in "${!PERSPECTIVE_PATTERNS[@]}"; do
            if [[ "$perspective" == *"$pattern"* ]]; then
                echo "$pattern"
                return 0
            fi
        done
        
        # No exact match; return the raw perspective text
        echo "$perspective"
        return 0
    fi
    
    # No perspective found
    echo ""
    return 1
}
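
Putting the two helpers together on a hypothetical filename (stderr logging omitted for brevity):

```shell
#!/bin/bash
declare -A PERSPECTIVE_PATTERNS=( ["Drone"]="Drone" ["PGM"]="PGM" )

# Everything before the third hyphen, trimmed
find_base_game_name() {
    local filename="$1"
    IFS='-' read -ra parts <<< "$filename"
    local base_name
    base_name=$(IFS='-'; echo "${parts[*]:0:3}" | xargs)
    echo "${base_name%.mp4}"
}

# Everything after the third hyphen, mapped through PERSPECTIVE_PATTERNS
get_perspective_name() {
    local filename="$1"
    IFS='-' read -ra parts <<< "$filename"
    if [ ${#parts[@]} -ge 4 ]; then
        local perspective pattern
        perspective=$(IFS='-'; echo "${parts[*]:3}" | xargs)
        perspective="${perspective%.mp4}"
        for pattern in "${!PERSPECTIVE_PATTERNS[@]}"; do
            [[ "$perspective" == *"$pattern"* ]] && { echo "$pattern"; return 0; }
        done
        echo "$perspective"
        return 0
    fi
    echo ""
    return 1
}

file="Tournament - Team1 vs Team2 - May 5. 2024 - Drone.mp4"
echo "base:        $(find_base_game_name "$file")"
echo "perspective: $(get_perspective_name "$file")"
```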

3. Set the S3 destination path

S3_DEST_PATH="${S3_DEST}${DESTINATION_FOLDER}/"
  • Combines the global destination root (S3_DEST) with the extracted DESTINATION_FOLDER.
  • If the video is a perspective, a subfolder is appended.

4. Orchestrate the pipeline

After the initial metadata extraction, the function (in its full form) goes on to:

Validate against the database using

  • query_tournament_id
  • verify_game

Move files: from input into PROCESSING_DIR

Run the conversions

  • process_full_video for HLS conversion
  • process_clip for highlights (where applicable)

Upload to S3: push the processed files to the correct S3 destination

Record metadata: write the duration and identifiers to metadata.json

Clean up: remove temporary files and directories to free space

10. Initialization and Main Execution

This section defines the script's entry point. It prepares the environment, validates requirements, and then processes each video source from S3.

Initialization: init()

init() {
    # Create the directories
    mkdir -p "$BASE_DIR" "$INPUT_DIR" "$PROCESSING_DIR" "$OUTPUT_DIR" "$PROCESSED_CLIPS_DIR"

    # Initialize logging
    exec > >(tee -a "$LOG_FILE") 2>&1
    echo -e "\n=== Script started at $(date) ==="

    # Check dependencies
    check_dependencies

    # Validate AWS credentials
    aws sts get-caller-identity >/dev/null || {
        error_exit "AWS credentials not configured properly"
    }
}
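
init() mirrors all subsequent output into the log with exec > >(tee -a "$LOG_FILE") 2>&1. A self-contained variant that tees a single block is easier to reason about (the log path here is hypothetical):

```shell
#!/bin/bash
LOG_FILE="./processing_demo.log"   # hypothetical log path for this sketch
: > "$LOG_FILE"

# Everything inside the group goes to both the terminal and the log
{
    echo "=== Script started at $(date) ==="
    echo "Downloading inputs..."
    echo "Processing complete."
} 2>&1 | tee -a "$LOG_FILE"
```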

The main function: main()

# ----- Main execution -----
main() {
    init
    
    # Process each S3 source in order
    for S3_SOURCE in "${S3_SOURCES[@]}"; do
        process_source "$S3_SOURCE" || {
            echo "[WARNING] Failed to process source: $S3_SOURCE"
            continue
        }
    done

    echo -e "\n=== Script completed at $(date) ==="
    echo "Log saved to $LOG_FILE"
}

What happens here:

  • Initializes everything (init)
  • Iterates over the S3_SOURCES array, which may include multiple tournaments or games
  • Calls process_source for each path, which:
    • Downloads the files
    • Parses names and metadata
    • Processes into HLS and clips
    • Uploads to S3
  • Handles failures gracefully: logs a warning, then moves on to the next source
  • Records the end time for auditing and diagnostics

Kicking Off Execution

main

This simple but powerful final line triggers the entire workflow.

Summary

The init and main functions tie the script together into a clean, modular, fault-tolerant pipeline. This structure:

  • Makes debugging easier
  • Supports future extensions (such as parallel processing)
  • Is production-ready for day-to-day automation tasks

Author: Syed Muhammad Ali

This article was contributed by the author, and copyright remains with the original author. To reprint, please credit the source: https://www.nxrte.com/jishu/60403.html
