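feat: add Docker deployment support, Swoole/Octane integration, and related optimizations

- Add a Dockerfile and several docker-compose configurations (development/production)
- Integrate Laravel Octane (Swoole) to improve performance
- Add health checks, monitoring scripts, and deployment documentation
- Add offline import packages for the Docker images (MySQL/Redis/Meilisearch)
- Optimize document conversion, the preview service, and queue jobs
- Add the CreateAdminUser command and a health check route
- Add a Swoole queue compatibility test suite

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>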
364
docker/HEALTH_MONITORING.md
Normal file
@@ -0,0 +1,364 @@
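# Docker Health Check and Monitoring Guide

This document describes how to configure and use the Docker health checks and automatic restart mechanism of the Laravel knowledge base system.

## Overview

The system implements a complete health check and automatic restart mechanism, including:

- **Web application HTTP health check** - checks the application and the services it depends on
- **Database connection health check** - verifies the MySQL database connection
- **Redis connection health check** - verifies the connection to the Redis cache service
- **Meilisearch API health check** - verifies the status of the search engine service
- **Container automatic restart policy** - recovers automatically when a service fails
- **Continuous monitoring system** - actively monitors services and handles failures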

## Health Check Configuration

### 1. Web Application Health Check

**Endpoint**: `GET /health`

**Checks**:
- Database connection status
- Redis cache connection status
- Meilisearch search engine connection status
- Storage directory writability

**Response format**:
```json
{
  "status": "ok|degraded",
  "timestamp": "2024-12-24T10:30:00.000000Z",
  "services": {
    "database": "connected|disconnected",
    "redis": "connected|disconnected|not_configured",
    "meilisearch": "connected|disconnected|not_configured",
    "storage": "writable|not_writable"
  },
  "version": "1.0.0"
}
```
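
A script that consumes this response needs to pull `status` and the per-service values out of the body. The following is a minimal sketch, assuming `jq` is not available and the body keeps the shape shown above; `json_field` and `health_is_ok` are hypothetical helper names, not part of the project.

```bash
#!/usr/bin/env bash
# Sketch only: classify a /health response body without requiring jq.

# Print the value of a string field from the response body.
# Sufficient for the response shape shown above; not a general JSON parser.
json_field() {
    local json="$1" field="$2"
    printf '%s' "$json" | tr -d '\n' |
        sed -n "s/.*\"$field\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Succeed only when the overall status is "ok".
health_is_ok() {
    [ "$(json_field "$1" status)" = "ok" ]
}
```

It could be used as `health_is_ok "$(curl -fsS http://localhost/health)" || echo "degraded"`, and `json_field "$body" redis` would report which dependency is down.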

**Docker configuration**:
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/health"]
  interval: 30s
  timeout: 10s
  retries: 5
  start_period: 60s
```
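
With these settings a container is marked `unhealthy` only after five consecutive failed probes, and failures during the 60-second `start_period` do not count toward that limit. As a back-of-the-envelope estimate (an approximation, not Docker's exact scheduling), the worst-case time before a hung service is reported can be computed as below; `max_detection_seconds` is a hypothetical helper.

```bash
# Rough upper bound, in seconds, on how long a failure can go unreported:
# each of the `retries` failed probes may wait `interval` and then run for up to `timeout`.
max_detection_seconds() {
    local interval="$1" timeout="$2" retries="$3"
    echo $(( retries * (interval + timeout) ))
}
```

For the values above, `max_detection_seconds 30 10 5` gives 200 seconds, which is worth keeping in mind when choosing the monitoring interval.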

### 2. MySQL Database Health Check

**Check method**: `mysqladmin ping`

**Docker configuration**:
```yaml
healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-u", "root", "-p${DB_PASSWORD}"]
  interval: 30s
  timeout: 10s
  retries: 5
  start_period: 30s
```

### 3. Redis Cache Health Check

**Check method**: `redis-cli ping`

**Docker configuration**:
```yaml
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 30s
  timeout: 10s
  retries: 5
  start_period: 10s
```

### 4. Meilisearch Search Engine Health Check

**Check method**: `curl -f http://localhost:7700/health`

**Docker configuration**:
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:7700/health"]
  interval: 30s
  timeout: 10s
  retries: 5
  start_period: 30s
```

### 5. Queue Worker Health Check

**Check method**: a custom script that checks the queue process and the services it depends on

**Script location**: `/usr/local/bin/queue-health-check.sh`

**Docker configuration**:
```yaml
healthcheck:
  test: ["CMD", "/usr/local/bin/queue-health-check.sh"]
  interval: 60s
  timeout: 30s
  retries: 3
  start_period: 30s
```
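
The script itself is not reproduced in this guide. As an illustration of the kind of checks it describes, here is a hypothetical sketch; the real `queue-health-check.sh` may differ. It assumes the worker runs as `artisan queue:work` (or `queue:listen`), that `ps` and bash's `/dev/tcp` are available in the image, and that the dependency hosts default to `mysql` and `redis`; all function names are made up for this example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a queue health check; not the project's actual script.

# Succeed if the process list on stdin contains a Laravel queue worker.
has_queue_worker() {
    grep -Eq 'artisan[[:space:]]+queue:(work|listen)'
}

# Succeed if host:port accepts a TCP connection within 3 seconds.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Run all checks; print the first failure and return non-zero so Docker records it.
queue_health_check() {
    ps -eo args | has_queue_worker || { echo "queue worker not running"; return 1; }
    port_open "${DB_HOST:-mysql}" "${DB_PORT:-3306}" || { echo "database unreachable"; return 1; }
    port_open "${REDIS_HOST:-redis}" "${REDIS_PORT:-6379}" || { echo "redis unreachable"; return 1; }
    echo "ok"
}
```

A script built this way would end with `queue_health_check`, so that its exit status becomes the health check result.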
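
## Automatic Restart Policy

All services are configured with the `restart: unless-stopped` policy:

- **Automatic restart**: a container is restarted automatically when it exits, whether it crashed or exited cleanly
- **Manual stop**: a container that was stopped manually is not restarted
- **System restart**: containers start automatically after the host or the Docker daemon restarts (unless they were stopped manually)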
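
Note that a restart policy reacts to a container exiting; on a plain Docker Engine it does not restart a container that is running but reports `unhealthy`, which is the gap the monitoring script is meant to fill. To confirm which policy a container actually has, something like the following sketch could be used; the helper names are hypothetical, while `.HostConfig.RestartPolicy.Name` is the field `docker inspect` exposes.

```bash
# Print the restart policy Docker has recorded for a container.
restart_policy_of() {
    docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$1"
}

# Succeed only if the container uses the policy this guide expects.
uses_unless_stopped() {
    [ "$(restart_policy_of "$1")" = "unless-stopped" ]
}
```

For example, `uses_unless_stopped knowledge_base_app || echo "unexpected restart policy"`.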

## Monitoring System

### Continuous Monitoring Script

**Script**: `docker/monitor-services.sh`

**Features**:
- Continuously monitors the health of all services
- Automatically restarts unhealthy containers
- Caps the number of restarts to prevent restart loops
- Writes detailed monitoring logs
- Sends alert notifications

**Configuration parameters**:
- `MONITOR_INTERVAL`: monitoring interval (default: 60 seconds)
- `MAX_RESTART_ATTEMPTS`: maximum number of restart attempts (default: 3)
- `RESTART_COOLDOWN`: restart cooldown (default: 300 seconds)
- `LOG_FILE`: path to the log file
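
How the attempt limit and the cooldown interact is not spelled out here. One plausible reading, sketched below as an assumption rather than a description of `monitor-services.sh`, is that an unhealthy container is restarted unless the limit has been reached or the previous restart was too recent. `restart_decision` is a hypothetical function.

```bash
MAX_RESTART_ATTEMPTS="${MAX_RESTART_ATTEMPTS:-3}"
RESTART_COOLDOWN="${RESTART_COOLDOWN:-300}"

# Decide what to do with an unhealthy container.
#   $1 = restarts attempted so far
#   $2 = epoch seconds of the last restart (0 if never restarted)
#   $3 = current epoch seconds
# Prints one of: restart | wait | give_up
restart_decision() {
    local attempts="$1" last_restart="$2" now="$3"
    if [ "$attempts" -ge "$MAX_RESTART_ATTEMPTS" ]; then
        echo "give_up"
    elif [ "$last_restart" -gt 0 ] && [ $(( now - last_restart )) -lt "$RESTART_COOLDOWN" ]; then
        echo "wait"
    else
        echo "restart"
    fi
}
```

A monitoring loop could call `restart_decision "$count" "$last" "$(date +%s)"` once per unhealthy container and run `docker restart` only when the answer is `restart`, raising an alert on `give_up`.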

### Service Status Check Script

**Script**: `docker/check-services.sh`

**Features**:
- One-off check of every service's status
- Detailed health status report
- Connection tests and fault diagnosis

## Usage

### 1. Start Services and Monitoring

```bash
# Full startup (including monitoring)
./docker/start-with-monitoring.sh

# Start the services without starting monitoring
./docker/start-with-monitoring.sh --no-monitor

# Skip the image build
./docker/start-with-monitoring.sh --skip-build

# Skip waiting for services to become ready
./docker/start-with-monitoring.sh --skip-wait
```
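
The readiness wait that `--skip-wait` bypasses is not shown in this guide. A minimal sketch of such a wait, assuming it polls Docker's health status, might look like this; `wait_for_healthy` and its default timeout and poll interval are assumptions.

```bash
# Poll a container's health status until it is healthy or the timeout expires.
#   $1 = container name, $2 = timeout in seconds (default 120), $3 = poll interval (default 5)
wait_for_healthy() {
    local name="$1" timeout="${2:-120}" step="${3:-5}" waited=0 status
    while [ "$waited" -lt "$timeout" ]; do
        # Treat an inspect failure (container not created yet) as "not ready".
        status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || status=""
        if [ "$status" = "healthy" ]; then
            return 0
        fi
        sleep "$step"
        waited=$(( waited + step ))
    done
    return 1
}
```

For example, `wait_for_healthy knowledge_base_app 180 || echo "app did not become healthy in time"`.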

### 2. Check Service Status

```bash
# Run the full health check
./docker/check-services.sh

# Show container status
docker-compose ps

# Show a container's health status
docker inspect --format='{{.State.Health.Status}}' knowledge_base_app
```

### 3. View Monitoring Logs

```bash
# View the monitoring log
tail -f ./storage/logs/monitor.log

# View the monitor's output
tail -f ./storage/logs/monitor-output.log

# View container logs
docker-compose logs -f app
docker-compose logs -f queue
```

### 4. Stop Monitoring

```bash
# Stop only the monitoring process
./docker/stop-monitoring.sh

# Stop monitoring and the services
./docker/stop-monitoring.sh --stop-services

# Stop monitoring and the services, and clean up the logs
./docker/stop-monitoring.sh --all
```

### 5. Restart Services Manually

```bash
# Restart a single service
docker-compose restart app

# Restart all services
docker-compose restart

# Rebuild and start
docker-compose up -d --build
```

## Monitoring Metrics

### Health Check Status

- `healthy`: the service is running normally
- `unhealthy`: the service's health check is failing
- `starting`: the service is starting up
- `no-healthcheck`: the service has no health check configured
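
Docker itself reports only `starting`, `healthy`, and `unhealthy`; a label such as `no-healthcheck` has to be supplied by the caller when a container defines no health check. One way to produce all four values with a single `docker inspect` call is sketched below, using hypothetical function names.

```bash
# Print healthy | unhealthy | starting, or no-healthcheck when the
# container defines no health check (in which case .State.Health is absent).
container_health() {
    docker inspect \
        --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}no-healthcheck{{end}}' \
        "$1"
}

# Print one "name: status" line per container name given.
summarize_health() {
    local name
    for name in "$@"; do
        printf '%s: %s\n' "$name" "$(container_health "$name")"
    done
}
```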

### Monitoring Log Format

```
2024-12-24 10:30:00 [INFO] Starting monitoring check (5 services in total)
2024-12-24 10:30:01 [SUCCESS] MySQL database container is healthy
2024-12-24 10:30:02 [SUCCESS] Redis cache container is healthy
2024-12-24 10:30:03 [SUCCESS] Meilisearch search container is healthy
2024-12-24 10:30:04 [SUCCESS] Web application container is healthy
2024-12-24 10:30:05 [SUCCESS] Queue worker container is healthy
2024-12-24 10:30:06 [SUCCESS] All services are running normally
```
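
Lines in this format can be produced by a small logging helper. The sketch below is an assumption about how such a helper might be written, not the script's actual code; the function name `log` is hypothetical, and the default path matches the `monitor.log` file used elsewhere in this guide.

```bash
LOG_FILE="${LOG_FILE:-./storage/logs/monitor.log}"

# Append one "YYYY-MM-DD HH:MM:SS [LEVEL] message" line to LOG_FILE and echo it to stdout.
log() {
    local level="$1"; shift
    local line
    line="$(date '+%Y-%m-%d %H:%M:%S') [$level] $*"
    mkdir -p "$(dirname "$LOG_FILE")"
    printf '%s\n' "$line" | tee -a "$LOG_FILE"
}
```

Because the level is always bracketed, failures can later be filtered with a plain `grep -F '[ERROR]'` on the log file.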

### Restart Counters

The monitoring system keeps a restart counter for each container:

- Location: `./storage/logs/restart_counters/`
- Format: `{container_name}.count`
- Reset: reset automatically once the container is healthy
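
A file-per-container counter like this can be maintained with a few lines of shell. The following sketch assumes each `.count` file holds a single integer; the function names and the `COUNTER_DIR` variable are hypothetical.

```bash
COUNTER_DIR="${COUNTER_DIR:-./storage/logs/restart_counters}"

# Print the stored count for a container, or 0 if no counter file exists yet.
read_restart_count() {
    local file="$COUNTER_DIR/$1.count"
    if [ -f "$file" ]; then cat "$file"; else echo 0; fi
}

# Add one to the container's counter, creating the file if needed.
increment_restart_count() {
    local count
    count=$(read_restart_count "$1")
    mkdir -p "$COUNTER_DIR"
    echo $(( count + 1 )) > "$COUNTER_DIR/$1.count"
}

# Set the counter back to 0, for example once the container is healthy again.
reset_restart_count() {
    mkdir -p "$COUNTER_DIR"
    echo 0 > "$COUNTER_DIR/$1.count"
}
```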

## Troubleshooting

### Common Issues

1. **Health check fails**
   ```bash
   # Check the container logs
   docker-compose logs app

   # Test the health check endpoint manually
   curl -v http://localhost/health
   ```

2. **Monitoring process will not start**
   ```bash
   # Check permissions
   ls -la docker/monitor-services.sh

   # Run the monitoring script manually
   ./docker/monitor-services.sh
   ```

3. **Container restart loop**
   ```bash
   # Show the restart counter
   cat ./storage/logs/restart_counters/knowledge_base_app.count

   # Reset the restart counter
   echo "0" > ./storage/logs/restart_counters/knowledge_base_app.count
   ```

4. **Storage permission problems**
   ```bash
   # Fix ownership and permissions of the storage directory
   sudo chown -R "$(id -u):$(id -g)" ./storage
   chmod -R 755 ./storage
   ```

### Debug Mode

Enable verbose logging:

```bash
# Set the environment variable
export LOG_LEVEL=debug

# Run the monitoring script
./docker/monitor-services.sh --interval 30
```
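
How the script handles `--interval` and `LOG_LEVEL` internally is not shown in this guide. A minimal sketch of one way to do it, assuming the flag simply overrides `MONITOR_INTERVAL` and that debug output is gated on `LOG_LEVEL=debug`, is given below; `parse_args` and `debug` are hypothetical names.

```bash
MONITOR_INTERVAL="${MONITOR_INTERVAL:-60}"
LOG_LEVEL="${LOG_LEVEL:-info}"

# Parse command-line flags; --interval overrides the MONITOR_INTERVAL default.
parse_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --interval)
                [ "$#" -ge 2 ] || { echo "--interval requires a value" >&2; return 1; }
                MONITOR_INTERVAL="$2"
                shift 2
                ;;
            *)
                echo "unknown option: $1" >&2
                return 1
                ;;
        esac
    done
}

# Print a message only when LOG_LEVEL=debug.
debug() {
    if [ "$LOG_LEVEL" = "debug" ]; then
        echo "[DEBUG] $*"
    fi
}
```

A script using these would call `parse_args "$@"` once at startup and then sleep `MONITOR_INTERVAL` seconds between checks.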
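
## Production Recommendations

1. **Monitoring configuration**
   - Set an appropriate monitoring interval (60-120 seconds recommended)
   - Configure alert notifications (email, Slack, etc.)
   - Review the monitoring logs regularly

2. **Resource limits**
   - Set memory and CPU limits for containers
   - Monitor system resource usage
   - Configure log rotation

3. **Backup strategy**
   - Back up the database and search indexes regularly
   - Back up application configuration and uploaded files
   - Test the restore procedure

4. **Security considerations**
   - Restrict access to the health check endpoint
   - Use strong passwords and keys
   - Update container images regularly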

## Extensions

### Custom Alerts

Add custom alerting logic to the `send_alert` function in `monitor-services.sh`:

```bash
send_alert() {
    local message="$1"
    local severity="$2"

    # Send an email alert
    echo "$message" | mail -s "Docker monitoring alert" admin@example.com

    # Post to Slack; escape backslashes and double quotes so the JSON stays valid
    local escaped
    escaped=$(printf '%s' "$message" | sed 's/\\/\\\\/g; s/"/\\"/g')
    curl -X POST -H 'Content-type: application/json' \
        --data "{\"text\":\"$escaped\"}" \
        "$SLACK_WEBHOOK_URL"
}
```

### Integrating External Monitoring

Health check data can be sent to an external monitoring system:

- Prometheus + Grafana
- Zabbix
- Nagios
- DataDog
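
For Prometheus, one low-effort route is to write the health results in the text exposition format and let node_exporter's textfile collector pick them up. The sketch below only formats the samples; the metric name `service_healthy` and the function `health_metrics` are assumptions made for this example.

```bash
# Emit one gauge sample per service in the Prometheus text exposition format.
# Arguments are name=status pairs, e.g. app=healthy redis=unhealthy.
health_metrics() {
    echo '# HELP service_healthy 1 if the container reports healthy, 0 otherwise.'
    echo '# TYPE service_healthy gauge'
    local pair name status value
    for pair in "$@"; do
        name="${pair%%=*}"      # text before the first "="
        status="${pair#*=}"     # text after the first "="
        if [ "$status" = "healthy" ]; then value=1; else value=0; fi
        printf 'service_healthy{service="%s"} %s\n' "$name" "$value"
    done
}
```

Redirecting the output to a `.prom` file in the directory given to node_exporter's `--collector.textfile.directory` option would make the samples scrapeable.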
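
### Automatic Scaling

Scale automatically based on health check results:

```bash
# Check the load and adjust the replica count.
# cpu_usage must already hold an integer percentage; it is not set in this snippet.
# Scaling only works if the service sets no fixed container_name or host port.
if [ "${cpu_usage:-0}" -gt 80 ]; then
    docker-compose up -d --scale app=3
fi
```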
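
The snippet above leaves open where `cpu_usage` comes from. One option, sketched here as an assumption, is to read it from `docker stats`, which reports values such as `85.20%`; both helper names are hypothetical.

```bash
# Convert a docker stats CPU figure such as "85.20%" to an integer percentage ("85").
cpu_percent_to_int() {
    local value="${1%\%}"   # drop the trailing percent sign
    echo "${value%%.*}"     # drop the fractional part
}

# Read the current CPU percentage of one container as an integer.
container_cpu_usage() {
    cpu_percent_to_int "$(docker stats --no-stream --format '{{.CPUPerc}}' "$1")"
}
```

With these, `cpu_usage=$(container_cpu_usage knowledge_base_app)` yields a value the integer comparison can use. The figure can exceed 100 on multi-core hosts, so the threshold may need adjusting.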
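
## Summary

The system provides a complete health check and automatic restart mechanism to keep services highly available. Configuring and using these tools sensibly can substantially improve the system's stability and reliability.

Reviewing the monitoring logs regularly, handling alerts promptly, and tuning the configuration parameters to actual conditions are key to keeping the system running healthily.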