8b2590c5b5
- 完成Phase 11: predictV3算法优化研究文档,涵盖6个优化方向的技术分析 - 实现置信度评估功能,提供历史命中率、得分分布、多维度一致性置信度指标 - 扩展回测指标体系,新增NDCG@K、MRR、命中率分布等排名质量评估指标 - 优化转移概率算法,引入二阶马尔可夫链和多属性联合转移增强预测准确性 - 设计权重训练机制,支持网格搜索和遗传算法进行数据驱动的参数优化 - 集成组合特征挖掘功能,采用关联规则和序列模式发现号码间潜在关联 - 实现完整的前端交互界面,支持预测结果显示、置信度展示和回测验证功能 - 建立性能优化策略,包括预计算缓存、批量计算和降级策略保障响应速度
455 lines
16 KiB
Markdown
455 lines
16 KiB
Markdown
---
|
||
phase: 11-predictv3
|
||
plan: 05
|
||
type: execute
|
||
wave: 1
|
||
depends_on: []
|
||
files_modified:
|
||
- application/admin/model/History.php
|
||
autonomous: true
|
||
requirements:
|
||
- PRED-03
|
||
must_haves:
|
||
truths:
|
||
- "转移概率计算考虑前两期状态联合决定"
|
||
- "系统在数据充足时使用二阶马尔可夫,数据不足时回退一阶"
|
||
- "预测结果中显示使用的转移概率阶数"
|
||
- "二阶马尔可夫有状态对观察次数检查,不足时回退一阶"
|
||
artifacts:
|
||
- path: "application/admin/model/History.php"
|
||
provides: "二阶马尔可夫转移矩阵构建方法"
|
||
contains: "_getTransitionMatrix2ndOrder|_calcTransitionScore2ndOrder"
|
||
key_links:
|
||
- from: "getPredictionV3"
|
||
to: "_getTransitionMatrix2ndOrder"
|
||
via: "conditional call based on data availability and state pair count"
|
||
---
|
||
|
||
# Phase 11 - Plan 05: 二阶马尔可夫转移概率增强
|
||
|
||
## Objective
|
||
|
||
改进现有一阶马尔可夫链转移概率计算,新增二阶马尔可夫链实现。考虑前两期状态联合决定当前转移概率,提升转移概率预测准确性。
|
||
|
||
**Purpose:** 现有 `_getTransitionMatrix` 仅考虑上一期状态,预测信息有限。二阶马尔可夫链利用更长历史序列,理论上预测更精准。
|
||
|
||
**Important:** 本计划是独立功能增强,不依赖其他计划。可独立执行。
|
||
|
||
**Output:** `History.php` 新增 `_getTransitionMatrix2ndOrder` 和 `_calcTransitionScore2ndOrder` 方法,`getPredictionV3` 中根据数据量和状态对观察次数选择使用一阶或二阶转移概率。
|
||
|
||
## Tasks
|
||
|
||
### Task 1: 实现二阶马尔可夫转移矩阵构建方法(含状态对观察次数检查)
|
||
|
||
<read_first>
|
||
- D:\code\php\amlhc\application\admin\model\History.php (line 2468-2493, _getTransitionMatrix 方法)
|
||
- D:\code\php\amlhc\application\admin\model\History.php (line 2452-2460, _getHeadIdx 方法)
|
||
</read_first>
|
||
|
||
<action>
|
||
在 `History.php` 类末尾新增二阶马尔可夫转移矩阵构建方法:
|
||
|
||
```php
|
||
/**
|
||
* 构建二阶马尔可夫转移矩阵
|
||
* 考虑前两期状态联合决定当前转移概率
|
||
*
|
||
* 状态空间说明:
|
||
* - 一阶马尔可夫: N个状态 (zone:5, tail:10, head:5)
|
||
* - 二阶马尔可夫: N^2个状态对 (zone:25, tail:100, head:25)
|
||
* - 状态键格式: "prev1-prev2",如 "2-3" 表示前一期区域2、前两期区域3
|
||
*
|
||
* 数据量阈值说明:
|
||
* - 建议历史数据 >= 200期以获得稳定的二阶概率估计
|
||
* - 状态对观察次数 >= 5 才使用该状态对的二阶概率
|
||
* - 观察次数不足时返回 state_pair_insufficient 标志,供调用者回退一阶
|
||
*
|
||
* @param array $history 历史数据(降序,最新在前)
|
||
* @param string $type 类型:zone/tail/head
|
||
* @param int $minStatePairCount 状态对最小观察次数,默认5
|
||
* @return array {matrix: [], prob_matrix: [], state_totals: [], num_categories: int, sufficient_pairs: int, total_pairs: int, min_threshold: int}
|
||
*/
|
||
private function _getTransitionMatrix2ndOrder($history, $type, $minStatePairCount = 5)
|
||
{
|
||
// 升序排列(从旧到新)
|
||
$historyAsc = array_reverse($history);
|
||
|
||
// 确定类别数量和索引函数
|
||
switch ($type) {
|
||
case 'zone':
|
||
$numCategories = 5;
|
||
$getIdx = function ($num) {
|
||
if ($num <= 10) return 0;
|
||
if ($num <= 20) return 1;
|
||
if ($num <= 30) return 2;
|
||
if ($num <= 40) return 3;
|
||
return 4;
|
||
};
|
||
break;
|
||
case 'tail':
|
||
$numCategories = 10;
|
||
$getIdx = function ($num) { return $num % 10; };
|
||
break;
|
||
case 'head':
|
||
$numCategories = 5;
|
||
$getIdx = function ($num) {
|
||
if ($num <= 9) return 0;
|
||
if ($num <= 19) return 1;
|
||
if ($num <= 29) return 2;
|
||
if ($num <= 39) return 3;
|
||
return 4;
|
||
};
|
||
break;
|
||
default:
|
||
return [
|
||
'matrix' => [],
|
||
'prob_matrix' => [],
|
||
'state_totals' => [],
|
||
'num_categories' => 0,
|
||
'sufficient_pairs' => 0,
|
||
'total_pairs' => 0,
|
||
'min_threshold' => $minStatePairCount
|
||
];
|
||
}
|
||
|
||
// 状态空间: (prev1, prev2) -> current,共 numCategories^2 个前置状态
|
||
$matrix = [];
|
||
$stateTotals = [];
|
||
|
||
// 初始化矩阵结构
|
||
for ($i = 0; $i < $numCategories; $i++) {
|
||
for ($j = 0; $j < $numCategories; $j++) {
|
||
$stateKey = $i . '-' . $j;
|
||
$matrix[$stateKey] = array_fill(0, $numCategories, 0);
|
||
$stateTotals[$stateKey] = 0;
|
||
}
|
||
}
|
||
|
||
// 统计二阶转移
|
||
for ($i = 0; $i < count($historyAsc) - 2; $i++) {
|
||
$prev1 = $getIdx((int)$historyAsc[$i]['num7']);
|
||
$prev2 = $getIdx((int)$historyAsc[$i + 1]['num7']);
|
||
$current = $getIdx((int)$historyAsc[$i + 2]['num7']);
|
||
|
||
if ($prev1 < 0 || $prev2 < 0 || $current < 0) continue;
|
||
|
||
$stateKey = $prev1 . '-' . $prev2;
|
||
$matrix[$stateKey][$current]++;
|
||
$stateTotals[$stateKey]++;
|
||
}
|
||
|
||
// 统计充分观察的状态对数量(观察次数 >= minStatePairCount)
|
||
$sufficientPairs = 0;
|
||
$totalPairs = $numCategories * $numCategories;
|
||
foreach ($stateTotals as $stateKey => $count) {
|
||
if ($count >= $minStatePairCount) {
|
||
$sufficientPairs++;
|
||
}
|
||
}
|
||
|
||
// 拉普拉斯平滑处理
|
||
$probMatrix = [];
|
||
foreach ($matrix as $stateKey => $counts) {
|
||
$smoothTotal = $stateTotals[$stateKey] + $numCategories;
|
||
$probMatrix[$stateKey] = [];
|
||
for ($j = 0; $j < $numCategories; $j++) {
|
||
$probMatrix[$stateKey][$j] = ($counts[$j] + 1) / $smoothTotal;
|
||
}
|
||
}
|
||
|
||
return [
|
||
'matrix' => $matrix,
|
||
'prob_matrix' => $probMatrix,
|
||
'state_totals' => $stateTotals,
|
||
'num_categories' => $numCategories,
|
||
'sufficient_pairs' => $sufficientPairs,
|
||
'total_pairs' => $totalPairs,
|
||
'min_threshold' => $minStatePairCount
|
||
];
|
||
}
|
||
```
|
||
|
||
实现要点:
|
||
- 状态空间从 N 扩展到 N^2(zone: 25状态,tail: 100状态,head: 25状态)
|
||
- 使用拉普拉斯平滑处理避免零概率问题
|
||
- 状态键格式为 "prev1-prev2"
|
||
- **新增状态对观察次数检查**:统计 sufficient_pairs(观察>=5次的状态对数量)
|
||
- 返回 sufficient_pairs、total_pairs、min_threshold 供调用者判断是否足够稳定
|
||
</action>
|
||
|
||
<acceptance_criteria>
|
||
- grep 正则匹配: `_getTransitionMatrix2ndOrder\s*\(` 在 History.php 中存在
|
||
- grep 匹配: `$minStatePairCount` 参数在方法签名中存在
|
||
- grep 匹配: `sufficient_pairs` 在返回结构中存在
|
||
- grep 匹配: `total_pairs` 在返回结构中存在
|
||
- 方法包含 stateKey 变量(格式为 prev1-prev2)
|
||
- 方法包含函数级注释,说明状态空间和数据量阈值
|
||
</acceptance_criteria>
|
||
|
||
### Task 2: 实现二阶转移概率得分计算方法
|
||
|
||
<read_first>
|
||
- D:\code\php\amlhc\application\admin\model\History.php (新增的 _getTransitionMatrix2ndOrder 方法)
|
||
- D:\code\php\amlhc\application\admin\model\History.php (查找 _calcTransitionScore 方法位置)
|
||
</read_first>
|
||
|
||
<action>
|
||
使用 Grep 找到 `_calcTransitionScore` 方法位置后,在其附近新增二阶转移概率得分计算方法:
|
||
|
||
```bash
|
||
grep -n "_calcTransitionScore" application/admin/model/History.php
|
||
```
|
||
|
||
新增 `_calcTransitionScore2ndOrder` 方法:
|
||
|
||
```php
|
||
/**
|
||
* 计算二阶转移概率得分
|
||
*
|
||
* 计算方法:
|
||
* - 综合区域、尾号、首号三个维度的二阶转移概率
|
||
* - 各维度权重: 区域40%、尾号35%、首号25%
|
||
* - 得分范围: 0-100
|
||
*
|
||
* @param int $num 当前号码
|
||
* @param int $prev1Zone 前一期区域索引
|
||
* @param int $prev2Zone 前两期区域索引
|
||
* @param int $prev1Tail 前一期尾号索引
|
||
* @param int $prev2Tail 前两期尾号索引
|
||
* @param int $prev1Head 前一期首号索引
|
||
* @param int $prev2Head 前两期首号索引
|
||
* @param array $zoneTrans2nd 二阶区域转移矩阵
|
||
* @param array $tailTrans2nd 二阶尾号转移矩阵
|
||
* @param array $headTrans2nd 二阶首号转移矩阵
|
||
* @param array $zoneMap 号码区域映射
|
||
* @param array $tailMap 号码尾号映射
|
||
* @param array $headMap 号码首号映射
|
||
* @return float 综合转移得分 (0-100)
|
||
*/
|
||
private function _calcTransitionScore2ndOrder(
|
||
$num,
|
||
$prev1Zone, $prev2Zone,
|
||
$prev1Tail, $prev2Tail,
|
||
$prev1Head, $prev2Head,
|
||
$zoneTrans2nd, $tailTrans2nd, $headTrans2nd,
|
||
$zoneMap, $tailMap, $headMap
|
||
)
|
||
{
|
||
$zone = $zoneMap[$num];
|
||
$tail = $tailMap[$num];
|
||
$head = $headMap[$num];
|
||
|
||
$score = 0;
|
||
|
||
// 区域二阶转移得分(权重40%)
|
||
$zoneStateKey = $prev1Zone . '-' . $prev2Zone;
|
||
if (isset($zoneTrans2nd['prob_matrix'][$zoneStateKey][$zone])) {
|
||
$prob = $zoneTrans2nd['prob_matrix'][$zoneStateKey][$zone];
|
||
$score += $prob * 40;
|
||
}
|
||
|
||
// 尾号二阶转移得分(权重35%)
|
||
$tailStateKey = $prev1Tail . '-' . $prev2Tail;
|
||
if (isset($tailTrans2nd['prob_matrix'][$tailStateKey][$tail])) {
|
||
$prob = $tailTrans2nd['prob_matrix'][$tailStateKey][$tail];
|
||
$score += $prob * 35;
|
||
}
|
||
|
||
// 首号二阶转移得分(权重25%)
|
||
$headStateKey = $prev1Head . '-' . $prev2Head;
|
||
if (isset($headTrans2nd['prob_matrix'][$headStateKey][$head])) {
|
||
$prob = $headTrans2nd['prob_matrix'][$headStateKey][$head];
|
||
$score += $prob * 25;
|
||
}
|
||
|
||
return round($score, 2);
|
||
}
|
||
```
|
||
|
||
实现要点:
|
||
- 综合区域、尾号、首号三个维度
|
||
- 各维度权重:区域40%、尾号35%、首号25%
|
||
- 使用 prob_matrix 中对应状态键的概率值
|
||
</action>
|
||
|
||
<acceptance_criteria>
|
||
- grep 正则匹配: `_calcTransitionScore2ndOrder\s*\(` 在 History.php 中存在
|
||
- 方法参数包含 prev1Zone、prev2Zone 等二阶状态参数
|
||
- 方法包含 zoneStateKey、tailStateKey、headStateKey 变量
|
||
- 方法包含函数级注释说明权重分配
|
||
</acceptance_criteria>
|
||
|
||
### Task 3: 在 getPredictionV3 中集成二阶马尔可夫(含200期阈值和状态对检查)
|
||
|
||
<read_first>
|
||
- D:\code\php\amlhc\application\admin\model\History.php (line 2230-2239, 转移概率分析部分)
|
||
- D:\code\php\amlhc\application\admin\model\History.php (line 2159-2161, 历史数据量检查)
|
||
</read_first>
|
||
|
||
<action>
|
||
在 `getPredictionV3` 方法中修改转移概率分析部分(约 line 2230-2239):
|
||
|
||
1. 找到以下代码段:
|
||
```php
|
||
// ====== 3. 转移概率分析(新增)======
|
||
// 获取转移概率矩阵数据
|
||
$zoneTransition = $this->_getTransitionMatrix($allHistory, 'zone');
|
||
$tailTransition = $this->_getTransitionMatrix($allHistory, 'tail');
|
||
$headTransition = $this->_getTransitionMatrix($allHistory, 'head');
|
||
|
||
// 上期号码的各类属性
|
||
$lastZone = $this->_getZoneIdx($lastSpecial);
|
||
$lastTail = $lastSpecial % 10;
|
||
$lastHead = $this->_getHeadIdx($lastSpecial);
|
||
```
|
||
|
||
替换为:
|
||
```php
|
||
// ====== 3. 转移概率分析 ======
|
||
// 根据历史数据量决定使用一阶或二阶马尔可夫
|
||
// 阈值条件:总期数 >= 200 且 状态对观察次数充足(>=5次的比例>=30%)
|
||
$minPeriodsThreshold = 200; // 二阶马尔可夫最小历史期数阈值(从100提升到200)
|
||
$minStatePairCount = 5; // 状态对最小观察次数
|
||
$use2ndOrder = false;
|
||
$secondOrderAvailable = false;
|
||
|
||
// 获取一阶转移概率矩阵(始终计算,作为fallback)
|
||
$zoneTransition = $this->_getTransitionMatrix($allHistory, 'zone');
|
||
$tailTransition = $this->_getTransitionMatrix($allHistory, 'tail');
|
||
$headTransition = $this->_getTransitionMatrix($allHistory, 'head');
|
||
|
||
// 获取二阶转移概率矩阵(数据充足时)
|
||
$zoneTransition2nd = null;
|
||
$tailTransition2nd = null;
|
||
$headTransition2nd = null;
|
||
$prev2Zone = 0;
|
||
$prev2Tail = 0;
|
||
$prev2Head = 0;
|
||
|
||
if (count($allHistory) >= $minPeriodsThreshold && count($allHistory) >= 2) {
|
||
// 获取前两期号码属性
|
||
$prev2Special = (int)$allHistory[1]['num7'];
|
||
$prev2Zone = $this->_getZoneIdx($prev2Special);
|
||
$prev2Tail = $prev2Special % 10;
|
||
$prev2Head = $this->_getHeadIdx($prev2Special);
|
||
|
||
// 构建二阶转移矩阵
|
||
$zoneTransition2nd = $this->_getTransitionMatrix2ndOrder($allHistory, 'zone', $minStatePairCount);
|
||
$tailTransition2nd = $this->_getTransitionMatrix2ndOrder($allHistory, 'tail', $minStatePairCount);
|
||
$headTransition2nd = $this->_getTransitionMatrix2ndOrder($allHistory, 'head', $minStatePairCount);
|
||
|
||
// 检查状态对观察次数是否充足(至少30%的状态对有足够观察)
|
||
// tail类型状态空间最大(100),以tail为基准判断
|
||
if ($tailTransition2nd['total_pairs'] > 0) {
|
||
$sufficientRatio = $tailTransition2nd['sufficient_pairs'] / $tailTransition2nd['total_pairs'];
|
||
$secondOrderAvailable = $sufficientRatio >= 0.3; // 至少30%状态对观察>=5次
|
||
}
|
||
|
||
$use2ndOrder = $secondOrderAvailable;
|
||
}
|
||
|
||
// 上期号码的各类属性
|
||
$lastZone = $this->_getZoneIdx($lastSpecial);
|
||
$lastTail = $lastSpecial % 10;
|
||
$lastHead = $this->_getHeadIdx($lastSpecial);
|
||
```
|
||
|
||
2. 在 analysis 数组中添加转移阶数信息(约 line 2297-2317):
|
||
|
||
找到:
|
||
```php
|
||
$analysis = [
|
||
'last_special' => $lastSpecial,
|
||
'last_expect' => $lastExpect,
|
||
'weights' => $weights,
|
||
...
|
||
];
|
||
```
|
||
|
||
在 `trend_direction` 后添加:
|
||
```php
|
||
$analysis = [
|
||
...
|
||
'trend_direction' => $trendDirection,
|
||
'transition_order' => $use2ndOrder ? 2 : 1, // 新增:转移概率阶数
|
||
'transition_available' => $secondOrderAvailable, // 二阶是否可用
|
||
'history_count' => count($allHistory), // 历史期数
|
||
'min_periods_threshold' => $minPeriodsThreshold, // 阈值
|
||
'last_zone' => $zoneLabels[$lastZone] ?? '',
|
||
...
|
||
];
|
||
```
|
||
|
||
3. 在得分计算循环中(约 line 2342-2349)修改转移概率得分计算:
|
||
|
||
找到:
|
||
```php
|
||
// === 转移概率得分 ===
|
||
$transScore = $this->_calcTransitionScore(
|
||
$num, $lastZone, $lastTail, $lastHead,
|
||
$zoneTransition, $tailTransition, $headTransition,
|
||
$zoneMap, $tailMap, $headMap
|
||
);
|
||
```
|
||
|
||
替换为:
|
||
```php
|
||
// === 转移概率得分(根据阶数选择计算方法)===
|
||
if ($use2ndOrder && $zoneTransition2nd && $tailTransition2nd && $headTransition2nd) {
|
||
$transScore = $this->_calcTransitionScore2ndOrder(
|
||
$num, $lastZone, $prev2Zone, $lastTail, $prev2Tail, $lastHead, $prev2Head,
|
||
$zoneTransition2nd, $tailTransition2nd, $headTransition2nd,
|
||
$zoneMap, $tailMap, $headMap
|
||
);
|
||
$detail['trans_order'] = 2;
|
||
} else {
|
||
$transScore = $this->_calcTransitionScore(
|
||
$num, $lastZone, $lastTail, $lastHead,
|
||
$zoneTransition, $tailTransition, $headTransition,
|
||
$zoneMap, $tailMap, $headMap
|
||
);
|
||
$detail['trans_order'] = 1;
|
||
}
|
||
```
|
||
</action>
|
||
|
||
<acceptance_criteria>
|
||
- grep 匹配: `minPeriodsThreshold` 变量在 getPredictionV3 中存在(值为200)
|
||
- grep 匹配: `minStatePairCount` 变量在 getPredictionV3 中存在(值为5)
|
||
- grep 匹配: `$secondOrderAvailable` 变量在 getPredictionV3 中存在
|
||
- grep 匹配: `sufficientRatio` 在 getPredictionV3 中存在(状态对观察比例)
|
||
- grep 匹配: `_getTransitionMatrix2ndOrder` 在 getPredictionV3 中被调用
|
||
- grep 匹配: `transition_order` 在 analysis 数组中存在
|
||
- grep 匹配: `transition_available` 在 analysis 数组中存在
|
||
- grep 匹配: `_calcTransitionScore2ndOrder` 在得分计算中被调用
|
||
- 数据量阈值设置为 200 期(而非原100期)
|
||
- 状态对观察次数检查 >= 5,比例 >= 30%
|
||
</acceptance_criteria>
|
||
|
||
## Verification
|
||
|
||
执行预测接口验证二阶马尔可夫使用情况:
|
||
|
||
```bash
|
||
curl -s "http://127.0.0.1:8000/admin/history/predictV3?periods=300&backtest=10" | grep -E "transition_order|transition_available|history_count"
|
||
```
|
||
|
||
预期结果:
|
||
- periods >= 200 且状态对观察充足时,返回 transition_order: 2
|
||
- periods < 200 或状态对观察不足时,返回 transition_order: 1
|
||
- transition_available 显示二阶是否可用
|
||
|
||
## Success Criteria
|
||
|
||
1. `_getTransitionMatrix2ndOrder` 方法已实现,包含二阶状态空间构建
|
||
2. `_calcTransitionScore2ndOrder` 方法已实现
|
||
3. `getPredictionV3` 根据数据量和状态对观察次数自动选择一阶或二阶马尔可夫
|
||
4. 数据量阈值提升到 200 期(而非原100期)
|
||
5. 状态对观察次数检查 >= 5,比例 >= 30% 才使用二阶
|
||
6. analysis 返回中包含 transition_order、transition_available 字段
|
||
7. 所有新增方法包含函数级注释
|
||
8. depends_on 已修正为空数组(独立功能)
|
||
|
||
## Output
|
||
|
||
完成后创建 `.planning/phases/11-predictv3/11-05-SUMMARY.md` |