Processing sports and event video footage at scale involves a series of complex, repetitive tasks: splitting videos into segments, generating thumbnails, and uploading everything to cloud storage. In this article, we walk through a robust Bash-based automation script that streamlines this pipeline using FFmpeg, AWS S3, and PostgreSQL.
1. Script Initialization and Configuration
We start with strict shell settings to ensure robust execution:
#!/bin/bash
set -eo pipefail # Exit immediately if a command fails or a pipe breaks
The script processes multiple video folders from S3. Here is how the S3 sources and destination are defined:
declare -a S3_SOURCES=(
"s3://source/path/"
"s3://source/path/"
)
S3_DEST="s3://destination/path/"
Local directories are configured for staging downloads, processing, and output:
BASE_DIR="./processing_root"
INPUT_DIR="${BASE_DIR}/input"
PROCESSING_DIR="${BASE_DIR}/processing"
OUTPUT_DIR="${BASE_DIR}/output"
PROCESSED_CLIPS_DIR="${BASE_DIR}/processed_clips"
LOG_FILE="${BASE_DIR}/processing.log"
download_files() {
local s3_path="$1"
local local_path="$2"
echo "Downloading $s3_path to $local_path" >&2
aws s3 cp "$s3_path" "$local_path" || {
echo "[ERROR] Failed to download $s3_path" >&2
return 1
}
}
Tip: using a separate folder for each stage keeps concerns cleanly separated and makes debugging easier.
2. Video Processing Parameters
To support HLS streaming and multi-quality delivery, FFmpeg transcoding profiles are defined:
SEGMENT_DURATION=2
THUMBNAIL_TIME="00:00:05"
Three encoding profiles are declared with an associative array:
declare -A QUALITIES=(
["low"]="scale=-2:360 -crf 28 -preset fast -b:a 64k"
["med"]="scale=-2:720 -crf 24 -preset medium -b:a 96k"
["high"]="scale=-2:1080 -crf 20 -preset slow -b:a 128k"
)
- low: lightweight, suited to mobile streaming
- med: a balanced default for general use
- high: full-HD playback quality
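Each profile packs several FFmpeg options into a single string; later in the script, individual options are pulled back out with grep. A minimal, self-contained sketch of that parsing, using the low profile:

```shell
#!/bin/bash
# The "low" profile string, re-declared here so the demo runs standalone
params="scale=-2:360 -crf 28 -preset fast -b:a 64k"

# Extract the scale filter and the CRF value the same way the script does
scale=$(echo "$params" | grep -o 'scale=[^ ]*')
crf=$(echo "$params" | grep -o '\-crf [^ ]*' | cut -d' ' -f2)

echo "$scale"   # scale=-2:360
echo "$crf"     # 28
```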
3. Database Integration
The script interacts with a PostgreSQL database to fetch tournament and game metadata:
DB_HOST="localhost"
DB_USER="postgres"
DB_PASS="****"
DB_NAME="dbname"
This makes it possible to automatically look up and validate game folders before processing begins.
Functions such as query_tournament_id, verify_game, and update_game_s3_path_with_verification implement this workflow:
update_game_s3_path_with_verification() {
# Check that BASE_GAME_NAME is set
if [ -z "$BASE_GAME_NAME" ]; then
echo "[ERROR] BASE_GAME_NAME environment variable is not set." >&2
echo "Expected format: 'Tournament - Team1 vs Team2 - Month Day. Year'" >&2
return 1
fi
# Check that S3_DEST is set
if [ -z "$S3_DEST" ]; then
echo "[ERROR] S3_DEST environment variable is not set." >&2
return 1
fi
# Parse BASE_GAME_NAME into its components
if ! [[ "$BASE_GAME_NAME" =~ ^(.+)\ -\ (.+)\ vs\ (.+)\ -\ (.+)$ ]]; then
echo "[ERROR] Invalid BASE_GAME_NAME format. Expected: 'Tournament - Team1 vs Team2 - Month Day. Year'" >&2
return 1
fi
tournament_name="${BASH_REMATCH[1]}"
team1="${BASH_REMATCH[2]}"
team2="${BASH_REMATCH[3]}"
game_date="${BASH_REMATCH[4]}"
echo "Starting S3 path update..."
echo "Parsed values from BASE_GAME_NAME:"
echo " Tournament: '$tournament_name'"
echo " Team 1: '$team1'"
echo " Team 2: '$team2'"
echo " Date: '$game_date'"
echo " S3 Base: '$S3_DEST'"
echo " Perspective: '${PERSPECTIVE_NAME:-Not set}'"
# Look up the tournament ID
tournament_id=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -t -c \
"SELECT id FROM tbltrmnts WHERE trmnt_name = '$tournament_name';" | tr -d '[:space:]') || {
echo "[DB ERROR] Failed to query tournament ID" >&2
return 1
}
[ -z "$tournament_id" ] && { echo "[ERROR] Tournament not found: '$tournament_name'" >&2; return 1; }
echo "Tournament ID: $tournament_id"
# Format the date (macOS/Linux compatible)
if [[ "$OSTYPE" == "darwin"* ]]; then
formatted_date=$(date -j -f "%B %d. %Y" "$(echo "$game_date" | sed 's/\.//')" "+%Y-%m-%d" 2>/dev/null) || {
echo "[ERROR] Invalid date format: '$game_date'" >&2
return 1
}
else
formatted_date=$(date -d "$(echo "$game_date" | sed 's/\.//')" +%Y-%m-%d 2>/dev/null) || {
echo "[ERROR] Invalid date format: '$game_date'" >&2
return 1
}
fi
echo "Formatted date: $formatted_date"
# Look up the game ID
game_id=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -t -c \
"SELECT g.game_id FROM tblgames AS g
LEFT JOIN tblteams AS t1 ON g.team1_id = t1.id
LEFT JOIN tblteams AS t2 ON g.team2_id = t2.id
WHERE g.trmnt_id = $tournament_id
AND t1.team_name = '$team1' AND t2.team_name = '$team2'
AND DATE(g.match_date) = '$formatted_date';" | tr -d '[:space:]') || {
echo "[DB ERROR] Failed to query game ID" >&2
return 1
}
[ -z "$game_id" ] && { echo "[ERROR] Game not found (Teams: '$team1' vs '$team2', Date: '$formatted_date')" >&2; return 1; }
echo "Game ID: $game_id"
# Build the base folder name and S3 path
folder_name="$BASE_GAME_NAME"
full_s3_path="${S3_DEST%/}/${folder_name}"
# Handle perspective-specific paths
if [ -n "$PERSPECTIVE_NAME" ]; then
# Strip spaces and lowercase the column name
perspective_column_name=$(echo "$PERSPECTIVE_NAME" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
perspective_s3_path="${full_s3_path}/${PERSPECTIVE_NAME}"
echo "Perspective-specific S3 Path: $perspective_s3_path"
echo "Column name for perspective: ${perspective_column_name}_s3_path"
else
perspective_s3_path="$full_s3_path"
fi
echo "Base S3 Path: $full_s3_path"
echo "Final S3 Path: $perspective_s3_path"
# Verify the database connection
if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT 1" >/dev/null 2>&1; then
echo "[DB ERROR] Failed to connect to database" >&2
return 1
fi
# Make sure the required columns exist
echo "Ensuring required columns exist in database..."
if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
SET search_path = ipapolo, public;
DO \$\$
BEGIN
-- Ensure offset column exists as JSONB
IF NOT EXISTS (
SELECT 1 FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'tblgames'
AND column_name = 'offset'
) THEN
ALTER TABLE tblgames ADD COLUMN "offset" JSONB;
RAISE NOTICE 'Added offset column to tblgames';
ELSE
BEGIN
ALTER TABLE tblgames ALTER COLUMN "offset" TYPE JSONB USING "offset"::jsonb;
EXCEPTION WHEN others THEN
RAISE NOTICE 'Offset column exists but cannot be converted to JSONB';
END;
END IF;
-- Ensure duration column exists if this is PGM perspective
IF '${PERSPECTIVE_NAME:-}' = 'PGM' AND NOT EXISTS (
SELECT 1 FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'tblgames'
AND column_name = 'duration'
) THEN
ALTER TABLE tblgames ADD COLUMN duration TEXT;
RAISE NOTICE 'Added duration column to tblgames';
END IF;
-- Create perspective-specific column if needed (with spaces removed and lowercase)
IF '${PERSPECTIVE_NAME:-}' != '' AND '${PERSPECTIVE_NAME:-}' != 'PGM' THEN
DECLARE
col_name TEXT := '${perspective_column_name}_s3_path';
BEGIN
IF NOT EXISTS (
SELECT 1 FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'tblgames'
AND column_name = col_name
) THEN
EXECUTE format('ALTER TABLE tblgames ADD COLUMN %I TEXT', col_name);
RAISE NOTICE 'Added % column to tblgames', col_name;
END IF;
END;
END IF;
END
\$\$;
EOF
then
echo "[DB ERROR] Failed to ensure required columns exist" >&2
return 1
fi
# Process Offset/JSON files if S3_SOURCE is set
if [ -n "$S3_SOURCE" ]; then
echo "Searching for Offset/JSON files in: $S3_SOURCE"
# Search pattern 1: any *Offset.json
offset_file=$(aws s3 ls "${S3_SOURCE%/}/" | awk '{print $4}' | grep -E 'Offset\.json$' | head -n 1)
if [ -z "$offset_file" ]; then
# Search pattern 2: BASE_GAME_NAME - Offset.json
specific_offset_file="${BASE_GAME_NAME} - Offset.json"
if aws s3 ls "${S3_SOURCE%/}/$specific_offset_file" >/dev/null 2>&1; then
offset_file="$specific_offset_file"
else
# Search pattern 3: any .json file
offset_file=$(aws s3 ls "${S3_SOURCE%/}/" | awk '{print $4}' | grep -E '\.json$' | head -n 1)
fi
fi
if [ -n "$offset_file" ]; then
echo "Found JSON file to process: $offset_file"
temp_file=$(mktemp)
echo "Downloading ${S3_SOURCE%/}/$offset_file..."
if ! aws s3 cp "${S3_SOURCE%/}/$offset_file" "$temp_file"; then
echo "[S3 ERROR] Failed to download JSON file" >&2
rm -f "$temp_file"
return 1
fi
# Validate and process the JSON
if ! jq -e . "$temp_file" >/dev/null 2>&1; then
echo "[ERROR] Invalid JSON in file $offset_file" >&2
rm -f "$temp_file"
return 1
fi
json_content=$(jq -c . "$temp_file")
echo "Updating database with JSON content from $offset_file..."
if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
SET search_path = public;
UPDATE tblgames
SET "offset" = '${json_content//\'/\'\'}'::jsonb
WHERE game_id = $game_id;
EOF
then
echo "[DB ERROR] Failed to update offset column" >&2
rm -f "$temp_file"
return 1
fi
rm -f "$temp_file"
echo "Successfully updated offset column with content from $offset_file"
else
echo "No suitable JSON files found in $S3_SOURCE"
fi
else
echo "S3_SOURCE not set, skipping JSON file processing"
fi
# Handle the PGM perspective case
if [ -n "$PERSPECTIVE_NAME" ] && [ "$PERSPECTIVE_NAME" = "PGM" ]; then
echo "Processing PGM perspective metadata..."
# Look for metadata.json in the PGM folder
metadata_path="${full_s3_path}/PGM/metadata.json"
echo "Checking for metadata.json at: $metadata_path"
temp_metadata_file=$(mktemp)
if aws s3 cp "$metadata_path" "$temp_metadata_file" >/dev/null 2>&1; then
echo "Found metadata.json, processing..."
# Validate the JSON and extract the duration value
if ! jq -e . "$temp_metadata_file" >/dev/null 2>&1; then
echo "[ERROR] Invalid JSON in metadata.json" >&2
rm -f "$temp_metadata_file"
return 1
fi
# Extract the duration if present
if jq -e '.duration' "$temp_metadata_file" >/dev/null 2>&1; then
duration_value=$(jq -r '.duration' "$temp_metadata_file")
echo "Extracted duration value: $duration_value"
# Update the database with the duration value
echo "Updating database with duration value..."
if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
SET search_path = public;
UPDATE tblgames
SET duration = '${duration_value//\'/\'\'}'
WHERE game_id = $game_id;
EOF
then
echo "[DB ERROR] Failed to update duration column" >&2
rm -f "$temp_metadata_file"
return 1
fi
else
echo "No duration field found in metadata.json"
fi
# Check metadata.json for an offset if one was not already set from S3_SOURCE
if [ -z "$json_content" ] && jq -e '.offset' "$temp_metadata_file" >/dev/null 2>&1; then
offset_content=$(jq -c '.offset' "$temp_metadata_file")
echo "Found offset in metadata.json, updating database..."
if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
SET search_path = public;
UPDATE tblgames
SET "offset" = '${offset_content//\'/\'\'}'::jsonb
WHERE game_id = $game_id;
EOF
then
echo "[DB ERROR] Failed to update offset from metadata.json" >&2
rm -f "$temp_metadata_file"
return 1
fi
fi
rm -f "$temp_metadata_file"
else
echo "No metadata.json found in PGM folder"
fi
fi
# Update the path in the database according to the perspective
if [ -n "$PERSPECTIVE_NAME" ]; then
if [ "$PERSPECTIVE_NAME" = "PGM" ]; then
# For PGM, update the main s3_path
echo "Updating main s3_path to: $perspective_s3_path"
update_query="UPDATE tblgames SET s3_path = '${perspective_s3_path//\'/\'\'}' WHERE game_id = $game_id;"
else
# For other perspectives, update the perspective-specific column (lowercase)
column_name="${perspective_column_name}_s3_path"
echo "Updating $column_name to: $perspective_s3_path"
update_query="UPDATE tblgames SET \"$column_name\" = '${perspective_s3_path//\'/\'\'}' WHERE game_id = $game_id;"
fi
if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
SET search_path = public;
$update_query
EOF
then
echo "[DB ERROR] Failed to update path for perspective $PERSPECTIVE_NAME" >&2
return 1
fi
else
# No perspective specified; update the main s3_path
echo "Updating main s3_path to: $full_s3_path"
if ! PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 <<EOF
SET search_path = public;
UPDATE tblgames
SET s3_path = '${full_s3_path//\'/\'\'}'
WHERE game_id = $game_id;
EOF
then
echo "[DB ERROR] Failed to update s3_path" >&2
return 1
fi
fi
# Verify the update
echo "Verifying database update for game ID $game_id:"
if [ -n "$PERSPECTIVE_NAME" ]; then
if [ "$PERSPECTIVE_NAME" = "PGM" ]; then
# Show the PGM-related columns
PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<EOF
SET search_path = public;
SELECT
game_id,
s3_path,
jsonb_typeof("offset") as offset_type,
"offset" as offset_content,
duration
FROM tblgames
WHERE game_id = $game_id;
EOF
else
# Show the perspective-specific column (lowercase)
column_name="${perspective_column_name}_s3_path"
PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<EOF
SET search_path = public;
SELECT
game_id,
"$column_name",
jsonb_typeof("offset") as offset_type
FROM tblgames
WHERE game_id = $game_id;
EOF
fi
else
# Show basic info when no perspective is specified
PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" <<EOF
SET search_path = public;
SELECT
game_id,
s3_path,
jsonb_typeof("offset") as offset_type
FROM tblgames
WHERE game_id = $game_id;
EOF
fi
echo "Update completed successfully"
}
- Match file names to tournament/game records
- Write the S3 path back to the database after successful processing
Security note: avoid hard-coding credentials in production. Use environment variables or AWS Parameter Store instead.
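As a minimal sketch of that advice (the resolve_db_pass helper is a hypothetical addition, not part of the original script), the password can be read from the environment and the script can fail fast when it is missing:

```shell
#!/bin/bash
# Hypothetical helper: resolve the database password from the environment
# instead of hard-coding it in the script
resolve_db_pass() {
    if [ -n "${DB_PASS:-}" ]; then
        echo "$DB_PASS"
    else
        echo "[ERROR] DB_PASS is not set; export it or fetch it from AWS Parameter Store" >&2
        return 1
    fi
}

DB_PASS="example-secret"   # stand-in value for this demo only
resolve_db_pass
```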
4. Perspective Patterns for Organizing Footage
Video clips can come from multiple camera perspectives. To organize this media correctly during processing, the script uses a mapping:
declare -A PERSPECTIVE_PATTERNS=(
["Drone"]="Drone"
["Endzone 1"]="Endzone_1"
["Endzone 2"]="Endzone_2"
["PGM"]="PGM"
["High"]="high"
["Low"]="low"
["Medium"]="medium"
["Sideline"]="Sideline"
)
This associative array maps known camera perspectives found in file names (such as Drone or Endzone 1) to standardized internal folder names used for structured output and downstream processing.
For example:
If a file name contains "Drone", the processed output is organized under processed_clips/Drone.
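A quick standalone sketch of that lookup (a subset of the array is re-declared here so the snippet runs on its own):

```shell
#!/bin/bash
# Subset of the mapping from the article, re-declared for a standalone demo
declare -A PERSPECTIVE_PATTERNS=(
    ["Drone"]="Drone"
    ["Endzone 1"]="Endzone_1"
    ["Endzone 2"]="Endzone_2"
)

# Resolve the output folder for footage shot from "Endzone 1"
perspective="Endzone 1"
folder="processed_clips/${PERSPECTIVE_PATTERNS[$perspective]}"
echo "$folder"   # processed_clips/Endzone_1
```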
5. Centralized Error Handling
A small utility ensures the script terminates gracefully on errors:
error_exit() {
echo "[ERROR] $1" >&2
exit 1
}
This makes it easy to handle failures consistently across every part of the script:
ffmpeg ... || error_exit "FFmpeg failed during encoding"
Why it matters: without centralized handling, debugging failures in a long-running script becomes a nightmare. This simple function improves traceability and resilience.
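One possible extension in the same spirit (not part of the original script): an ERR trap that reports the line number of whichever command failed, so even unguarded commands leave a trace:

```shell
#!/bin/bash
set -eE   # exit on error; -E propagates the ERR trap into functions

# Hypothetical extension: log the failing line number before the script exits
on_error() {
    echo "[ERROR] Command failed at line $1" >&2
}
trap 'on_error $LINENO' ERR

echo "step 1 ok"
# Any failing command below this point would trigger on_error and exit
```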
6. Dependency Check and Auto-Installer
Before the script does any heavy lifting, it verifies that every critical tool is available:
# ----- Dependency check -----
check_dependencies() {
local missing=()
for cmd in aws ffmpeg ffprobe psql; do
if ! command -v "$cmd" >/dev/null 2>&1; then
missing+=("$cmd")
fi
done
if [ ${#missing[@]} -gt 0 ]; then
echo "Installing missing dependencies: ${missing[*]}"
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS
if ! command -v brew >/dev/null; then
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc
fi
brew install "${missing[@]}" || error_exit "Failed to install dependencies"
else
# Linux
sudo apt-get update && sudo apt-get install -y "${missing[@]}" || {
error_exit "Failed to install dependencies"
}
fi
fi
}
If anything is missing, the script installs it automatically:
- On macOS: Homebrew is installed first if needed, then used to install the missing packages.
- On Linux: apt-get is used.
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS: install with Homebrew
else
# Linux: install with apt-get
fi
Self-healing setup: this is especially useful in fresh environments (CI/CD runners, EC2 instances, developer laptops), making the script plug-and-play.
7. Database Functions: Tying Metadata to the Real World
To guarantee that every processed clip corresponds to a valid tournament and game, the script queries the PostgreSQL database directly before proceeding. This is essential for associating the correct game ID with generated assets such as thumbnails, clips, and metadata.
query_tournament_id: look up a tournament by name
This function retrieves a tournament's unique ID from its name; the ID serves as a foreign key in all subsequent operations:
query_tournament_id() {
local tournament_name="$1"
echo "Querying tournament ID for: '$tournament_name'" >&2
result=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -t -c \
"SELECT id FROM tbltrmnts WHERE trmnt_name = '$tournament_name';" | tr -d '[:space:]') || {
echo "[DB ERROR] Failed to query tournament ID" >&2
return 1
}
if [ -z "$result" ]; then
echo "[WARNING] Tournament not found: '$tournament_name'" >&2
return 1
fi
echo "$result"
}
Purpose: ensures every folder and file name is correctly associated with a verified tournament.
verify_game: confirm the game exists before processing
Once the tournament ID is available, the script verifies that the game (identified by team names and date) exists in the database:
verify_game() {
local tournament_id="$1"
local team1="$2"
local team2="$3"
local game_date="$4"
echo "Verifying game: $team1 vs $team2 on $game_date (Tournament ID: $tournament_id)" >&2
# Convert the date format
if [[ "$OSTYPE" == "darwin"* ]]; then
# macOS date parsing
formatted_date=$(date -j -f "%B %d. %Y" "$(echo "$game_date" | sed 's/\.//')" "+%Y-%m-%d" 2>/dev/null) || {
echo "[ERROR] Invalid date format: $game_date" >&2
return 1
}
else
# Linux date parsing
formatted_date=$(date -d "$(echo "$game_date" | sed 's/\.//')" +%Y-%m-%d 2>/dev/null) || {
echo "[ERROR] Invalid date format: $game_date" >&2
return 1
}
fi
query="SELECT g.* FROM tblgames AS g
LEFT JOIN tblteams AS t1 ON g.team1_id = t1.id
LEFT JOIN tblteams AS t2 ON g.team2_id = t2.id
WHERE g.trmnt_id = $tournament_id
AND t1.team_name = '$team1' AND t2.team_name = '$team2'
AND DATE(g.match_date) = '$formatted_date';"
game_id=$(PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -t -c "$query" | tr -d '[:space:]') || {
echo "[DB ERROR] Failed to query game ID" >&2
return 1
}
if [ -n "$game_id" ]; then
echo "Found game ID: $game_id" >&2
echo "$game_id"
return 0
else
echo "[WARNING] Game not found in database" >&2
return 1
fi
}
Cross-platform date conversion
Because macOS and Linux handle date formats differently, the script supports both:
if [[ "$OSTYPE" == "darwin"* ]]; then
formatted_date=$(date -j -f "%B %d. %Y" "$(echo "$game_date" | sed 's/\.//')" "+%Y-%m-%d")
else
formatted_date=$(date -d "$(echo "$game_date" | sed 's/\.//')" +%Y-%m-%d)
fi
SQL JOIN logic
SELECT g.* FROM tblgames AS g
LEFT JOIN tblteams AS t1 ON g.team1_id = t1.id
LEFT JOIN tblteams AS t2 ON g.team2_id = t2.id
WHERE g.trmnt_id = $tournament_id
AND t1.team_name = '$team1' AND t2.team_name = '$team2'
AND DATE(g.match_date) = '$formatted_date';
This query structure ensures that:
- the correct tournament ID is matched
- both teams exist and their names match
- the game date is accurate (even when formats differ)
If a matching game_id is found, the function returns it:
if [ -n "$game_id" ]; then
echo "Found game ID: $game_id"
echo "$game_id"
else
echo "[WARNING] Game not found in database"
fi
Note: if no match is found, the script logs a warning and skips further processing for that folder, preventing accidental data pollution.
8. File Processing Functions: the Core of Video Conversion
Once the source video has been validated, the script enters the heavy-lifting phase: converting the raw input into HLS segments, MP4 clips, thumbnails, and metadata. This conversion pipeline is driven by two functions: process_full_video and process_clip.
process_full_video() {
local input_file="$1"
local output_dir="$2"
echo "Processing full video: $input_file" >&2
# Create the output directory if it does not exist
mkdir -p "$output_dir" || {
echo "[ERROR] Failed to create output directory: $output_dir" >&2
return 1
}
# Create the segments directory
local segments_dir="${output_dir}/segments"
mkdir -p "$segments_dir" || {
echo "[ERROR] Failed to create segments directory: $segments_dir" >&2
return 1
}
# Get the duration
local duration
duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$input_file" | awk '{print int($1)}') || {
echo "[ERROR] Failed to get duration for $input_file" >&2
return 1
}
# Generate thumbnails for each quality
generate_thumbnails "$input_file" "$output_dir/thumbnail"
# Process each quality level
for quality in "${!QUALITIES[@]}"; do
local params="${QUALITIES[$quality]}"
echo "Processing $quality quality..." >&2
ffmpeg -i "$input_file" \
-vf "$(echo "$params" | grep -o 'scale=[^ ]*')" \
-c:v libx264 $(echo "$params" | grep -o '\-crf [^ ]*') \
-preset $(echo "$params" | grep -o '\-preset [^ ]*' | cut -d' ' -f2) \
-c:a aac $(echo "$params" | grep -o '\-b:a [^ ]*') \
-f hls -hls_time $SEGMENT_DURATION -hls_playlist_type vod \
-hls_segment_filename "$segments_dir/${quality}_%03d.ts" \
-hls_base_url "./segments/" \
"$output_dir/${quality}.m3u8" </dev/null || {
echo "[ERROR] Failed to process $quality version" >&2
continue
}
done
# Write the metadata
echo "{\"duration\":$duration}" > "$output_dir/metadata.json"
}
Key operations
Create the output directory structure
- output_dir: the target HLS directory
- segments_dir: a subdirectory holding the .ts segments
Extract the video duration
duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$input_file" | awk '{print int($1)}') || {
echo "[ERROR] Failed to get duration for $input_file" >&2
return 1
}
- Uses ffprobe to calculate the length in seconds
- The duration is stored for later use in the metadata file
Generate the multi-quality HLS streams
The loop iterates over the low, med, and high profiles:
ffmpeg -i "$input_file" \
-vf "$(echo "$params" | grep -o 'scale=[^ ]*')" \
-c:v libx264 $(echo "$params" | grep -o '\-crf [^ ]*') \
-preset $(echo "$params" | grep -o '\-preset [^ ]*' | cut -d' ' -f2) \
-c:a aac $(echo "$params" | grep -o '\-b:a [^ ]*') \
-f hls -hls_time $SEGMENT_DURATION -hls_playlist_type vod \
-hls_segment_filename "$segments_dir/${quality}_%03d.ts" \
-hls_base_url "./segments/" \
"$output_dir/${quality}.m3u8" </dev/null || {
echo "[ERROR] Failed to process $quality version" >&2
continue
}
Output:
- .m3u8 playlist files
- .ts segment files for streaming
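Note that the loop writes one playlist per quality, but a player that switches qualities automatically also needs a master playlist pointing at all three. The script itself does not emit one; a sketch of generating it could look like this (the BANDWIDTH and RESOLUTION figures are illustrative assumptions):

```shell
#!/bin/bash
output_dir="./demo_output"
mkdir -p "$output_dir"

# Hypothetical addition: a master playlist referencing the per-quality playlists
cat > "$output_dir/master.m3u8" <<'EOF'
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720
med.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
high.m3u8
EOF

head -n 1 "$output_dir/master.m3u8"   # #EXTM3U
```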
Generate thumbnails
# Generate a thumbnail for each quality
generate_thumbnails "$input_file" "$output_dir/thumbnail"
generate_thumbnails() {
local input_file="$1"
local output_prefix="$2"
echo "Generating thumbnails for $input_file" >&2
for quality in "${!QUALITIES[@]}"; do
local scale=$(echo "${QUALITIES[$quality]}" | grep -o 'scale=[^ ]*')
local output_file="${output_prefix}_${quality}.jpg"
ffmpeg -ss "$THUMBNAIL_TIME" -i "$input_file" \
-vf "$scale" \
-vframes 1 \
-q:v 2 \
"$output_file" </dev/null || {
echo "[WARNING] Failed to generate $quality thumbnail" >&2
continue
}
done
}
Write the metadata
# Write the metadata
echo "{\"duration\":$duration}" > "$output_dir/metadata.json"
Purpose: enables adaptive video playback by providing multiple resolutions and bitrates, precise segmentation, and a clean folder structure.
process_clip: cutting highlights with dynamic quality
This function creates individual highlight clips from the full game.
process_clip() {
local input_file="$1"
local clip_prefix="$2"
local start_sec="$3"
local duration_sec="$4"
echo "Processing clip $clip_prefix (Start: ${start_sec}s, Duration: ${duration_sec}s)" >&2
# Output file paths
local output_mp4="${PROCESSED_CLIPS_DIR}/${clip_prefix}.mp4"
local output_thumbnail="${PROCESSED_CLIPS_DIR}/${clip_prefix}.jpg"
# Clean up any existing files
rm -f "$output_mp4" "$output_thumbnail"
# Compute a variable-quality CRF value from the start time
local crf=$(( 22 + $(printf "%.0f" "$start_sec") % 6 ))
local resolution="-vf scale=trunc(iw/2)*2:trunc(ih/2)*2" # Ensure even dimensions
# Encode the clip to MP4 with proper encoding and variable quality
ffmpeg -hide_banner -loglevel error -nostdin \
-ss "$start_sec" \
-i "$input_file" \
-t "$duration_sec" \
-c:v libx264 -crf "$crf" -preset fast \
-profile:v main -pix_fmt yuv420p \
-movflags +faststart \
-c:a aac -b:a 128k \
$resolution \
-y "$output_mp4" </dev/null || {
echo "[ERROR] FFmpeg failed to process clip $clip_prefix" >&2
return 1
}
# Validate the output file
if [ ! -s "$output_mp4" ]; then
echo "[ERROR] Output file is empty: $output_mp4" >&2
return 1
fi
# Generate a thumbnail a few seconds into the clip
local thumbnail_time=$(awk -v start="$start_sec" 'BEGIN { print (start + 5) }')
ffmpeg -hide_banner -loglevel error -nostdin \
-ss "$thumbnail_time" \
-i "$input_file" \
-t 1 \
-vf "scale=320:-1" \
-vframes 1 \
-q:v 2 \
-y "$output_thumbnail" </dev/null || {
echo "[WARNING] Failed to generate thumbnail for clip $clip_prefix at offset +5s, trying fallback..." >&2
# Fallback: generate from the start time
ffmpeg -hide_banner -loglevel error -nostdin \
-ss "$start_sec" \
-i "$input_file" \
-t 1 \
-vf "scale=320:-1" \
-vframes 1 \
-q:v 2 \
-y "$output_thumbnail" </dev/null || true
}
echo "✅ Finished processing clip $clip_prefix (CRF=$crf)" >&2
return 0
}
Features:
Unique output paths
output_mp4="${PROCESSED_CLIPS_DIR}/${clip_prefix}.mp4"
output_thumbnail="${PROCESSED_CLIPS_DIR}/${clip_prefix}.jpg"
Dynamic CRF for adaptive compression
local crf=$(( 22 + $(printf "%.0f" "$start_sec") % 6 ))
local resolution="-vf scale=trunc(iw/2)*2:trunc(ih/2)*2" # Ensure even dimensions
- The CRF (Constant Rate Factor) is derived from the clip's start time; the modulo keeps it cycling through values 22-27
- Compression level (and file size) therefore varies slightly from clip to clip
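Evaluating the expression for a few start times makes the cycling visible:

```shell
#!/bin/bash
# Evaluate the article's CRF expression for several clip start times
for start_sec in 0 5 7 11; do
    crf=$(( 22 + $(printf "%.0f" "$start_sec") % 6 ))
    echo "start=${start_sec}s -> crf=$crf"
done
# Prints crf values 22, 27, 23, 27: the modulo keeps the result within 22-27
```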
Clip encoding with fallback logic
# Encode the clip to MP4 with proper encoding and variable quality
ffmpeg -hide_banner -loglevel error -nostdin \
-ss "$start_sec" \
-i "$input_file" \
-t "$duration_sec" \
-c:v libx264 -crf "$crf" -preset fast \
-profile:v main -pix_fmt yuv420p \
-movflags +faststart \
-c:a aac -b:a 128k \
$resolution \
-y "$output_mp4" </dev/null || {
echo "[ERROR] FFmpeg failed to process clip $clip_prefix" >&2
return 1
}
- Cuts the segment out of the source
- Applies even-dimension scaling (trunc(iw/2)*2) to avoid codec errors
- Uses AAC audio and +faststart for web compatibility
Thumbnail generation
Tries to capture a thumbnail at start + 5s, falling back to the start time if necessary.
local thumbnail_time=$(awk -v start="$start_sec" 'BEGIN { print (start + 5) }')
ffmpeg -hide_banner -loglevel error -nostdin \
-ss "$thumbnail_time" \
-i "$input_file" \
-t 1 \
-vf "scale=320:-1" \
-vframes 1 \
-q:v 2 \
-y "$output_thumbnail" </dev/null || {
echo "[WARNING] Failed to generate thumbnail for clip $clip_prefix at offset +5s, trying fallback..." >&2
# Fallback: generate from the start time
ffmpeg -hide_banner -loglevel error -nostdin \
-ss "$start_sec" \
-i "$input_file" \
-t 1 \
-vf "scale=320:-1" \
-vframes 1 \
-q:v 2 \
-y "$output_thumbnail" </dev/null || true
}
Error logging and validation
- Verifies the output file is not empty
- Logs a warning and continues gracefully on thumbnail errors
Result: each clip becomes a standalone, web-optimized .mp4 file with an accompanying .jpg thumbnail, ideal for highlights, reels, and event recaps.
9. The Main Processing Function: process_single_video
This function is the central coordinator for every video file. It extracts metadata, applies the appropriate conversions, and prepares the files for upload.
process_single_video() {
local MP4_FILE="$1"
local BASE_NAME="${MP4_FILE%.mp4}"
local LOCAL_MP4="${PROCESSING_DIR}/${MP4_FILE}"
local OUTPUT_GAME_DIR="${OUTPUT_DIR}/${BASE_NAME}"
local DESTINATION_FOLDER="$2" # Passed as parameter
echo -e "\n=== Processing video: $BASE_NAME ==="
# 1. Find the base game name (everything before the third hyphen)
local BASE_GAME_NAME
BASE_GAME_NAME=$(find_base_game_name "$MP4_FILE") || {
echo "[ERROR] Failed to determine base game name for $MP4_FILE" >&2
return 1
}
echo "Base game name: $BASE_GAME_NAME" >&2
# 2. Get the perspective if this is a perspective video
local PERSPECTIVE_NAME=""
PERSPECTIVE_NAME=$(get_perspective_name "$MP4_FILE") || {
echo "Not a perspective video or perspective not recognized" >&2
}
# 3. Determine the S3 destination path
local S3_DEST_PATH="${S3_DEST}${DESTINATION_FOLDER}/"
# For perspective videos, add a perspective subfolder
if [ -n "$PERSPECTIVE_NAME" ]; then
S3_DEST_PATH="${S3_DEST_PATH}${PERSPECTIVE_PATTERNS[$PERSPECTIVE_NAME]}/"
fi
# [remaining processing steps...]
}
Step-by-step breakdown
1. Extract the base game name
BASE_GAME_NAME=$(find_base_game_name "$MP4_FILE")
find_base_game_name() {
local filename="$1"
echo "Finding base game name for: $filename" >&2
# Split the file name into parts on the hyphen separator
IFS='-' read -ra parts <<< "$filename"
# Take the first 3 parts (before the third hyphen) and join them
base_name=$(IFS='-'; echo "${parts[*]:0:3}" | xargs)
base_name="${base_name%.mp4}" # Remove .mp4 if present
echo "$base_name"
}
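Running the helper against a sample file name (the name is illustrative; the function is reproduced from the article with its logging trimmed):

```shell
#!/bin/bash
# find_base_game_name as in the article, minus the stderr logging
find_base_game_name() {
    local filename="$1"
    # Split the file name on hyphens, keep the first three parts, rejoin and trim
    IFS='-' read -ra parts <<< "$filename"
    base_name=$(IFS='-'; echo "${parts[*]:0:3}" | xargs)
    base_name="${base_name%.mp4}"   # drop .mp4 if present
    echo "$base_name"
}

find_base_game_name "Spring Cup - Sharks vs Dolphins - May 5. 2024 - Drone.mp4"
# -> Spring Cup - Sharks vs Dolphins - May 5. 2024
```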
2. Identify the camera perspective
PERSPECTIVE_NAME=$(get_perspective_name "$MP4_FILE")
get_perspective_name() {
local filename="$1"
echo "Getting perspective name for: $filename" >&2
# Split the file name into parts on the hyphen separator
IFS='-' read -ra parts <<< "$filename"
# With 4 or more parts, the perspective sits after the third hyphen
if [ ${#parts[@]} -ge 4 ]; then
perspective=$(IFS='-'; echo "${parts[*]:3}" | xargs)
perspective="${perspective%.mp4}" # Remove .mp4 if present
# Look for a matching pattern in the perspective patterns
for pattern in "${!PERSPECTIVE_PATTERNS[@]}"; do
if [[ "$perspective" == *"$pattern"* ]]; then
echo "$pattern"
return 0
fi
done
# If there is no exact match, return the raw perspective text
echo "$perspective"
return 0
fi
# No perspective found
echo ""
return 1
}
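And the same kind of standalone run for the perspective helper (the mapping subset and sample name are illustrative; the function is reproduced from the article with its logging trimmed):

```shell
#!/bin/bash
# Mapping subset and helper reproduced from the article for a standalone demo
declare -A PERSPECTIVE_PATTERNS=( ["Drone"]="Drone" ["PGM"]="PGM" ["Sideline"]="Sideline" )

get_perspective_name() {
    local filename="$1"
    IFS='-' read -ra parts <<< "$filename"
    if [ ${#parts[@]} -ge 4 ]; then
        # Everything after the third hyphen is the perspective text
        perspective=$(IFS='-'; echo "${parts[*]:3}" | xargs)
        perspective="${perspective%.mp4}"
        for pattern in "${!PERSPECTIVE_PATTERNS[@]}"; do
            if [[ "$perspective" == *"$pattern"* ]]; then
                echo "$pattern"
                return 0
            fi
        done
        echo "$perspective"   # no known pattern: return the raw text
        return 0
    fi
    echo ""                   # fewer than four parts: no perspective
    return 1
}

get_perspective_name "Spring Cup - Sharks vs Dolphins - May 5. 2024 - Drone.mp4"
# -> Drone
```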
3. Set the S3 destination path
S3_DEST_PATH="${S3_DEST}${DESTINATION_FOLDER}/"
- Combines the global destination root (S3_DEST) with the extracted DESTINATION_FOLDER.
- If the video is a perspective video, a perspective subfolder is appended.
4. Orchestrating the pipeline
After the initial metadata extraction, the full function continues with:
- Database validation using query_tournament_id and verify_game
- File transfer: moving inputs into PROCESSING_DIR
- Conversions: process_full_video for HLS, process_clip for highlights (where applicable)
- Upload to S3: pushing the processed files to the correct S3 destination
- Metadata recording: writing durations and identifiers to metadata.json
- Cleanup: removing temporary files and directories to free space
10. Initialization and Main Execution
This section defines the script's entry point. It prepares the environment, validates requirements, and then processes each video source from S3.
Initialization: init()
init() {
# Create directories
mkdir -p "$BASE_DIR" "$INPUT_DIR" "$PROCESSING_DIR" "$OUTPUT_DIR" "$PROCESSED_CLIPS_DIR"
# Initialize logging
exec > >(tee -a "$LOG_FILE") 2>&1
echo -e "\n=== Script started at $(date) ==="
# Check dependencies
check_dependencies
# Validate AWS credentials
aws sts get-caller-identity >/dev/null || {
error_exit "AWS credentials not configured properly"
}
}
The main function: main()
# ----- Main execution -----
main() {
init
# Process each S3 source in turn
for S3_SOURCE in "${S3_SOURCES[@]}"; do
process_source "$S3_SOURCE" || {
echo "[WARNING] Failed to process source: $S3_SOURCE"
continue
}
done
echo -e "\n=== Script completed at $(date) ==="
echo "Log saved to $LOG_FILE"
}
What happens here:
- Everything is initialized (init)
- The S3_SOURCES array is iterated, which may cover multiple tournaments or games
- process_source is called for each path:
- downloads the files
- parses names and metadata
- processes them into HLS streams and clips
- uploads to S3
- Failures are handled gracefully: a warning is logged, then the next source is processed
- The end time is logged for auditing and diagnostics
Kicking off execution
main
This one simple line at the end triggers the entire workflow.
Summary
The init and main functions tie the script together into a clean, modular, fault-tolerant pipeline. This structure:
- makes debugging easy
- supports future extensions (such as parallel processing)
- is ready for production use in day-to-day automation
Author: Syed Muhammad Ali
This article was contributed by its author, who retains the copyright. For reprints, please credit the source: https://www.nxrte.com/jishu/60403.html