Preface

This write-up covers my study of, and hands-on practice with, knowledge engineering. Personally I found it quite difficult; it took about three weeks in total. The first week went mostly into fixing all kinds of dependency errors, the second into trimming the dataset and migrating between platforms (mainly because the GPU memory was not enough), and the third into exploring and trying out tuning ideas. Part of the reason it took so long is the course schedule, and with May approaching, anyone preparing for graduate-school recommendation will understand: the various summer-camp applications and course projects left me fairly swamped, so I could only use my time as well as possible to finish this course practice. In the end there are still many regrets, but this is where it has to stop for now...

Process

Using the ModelArts platform

Model repository:

https://github.com/OpenDriveLab/ViDAR

Prerequisites:

Configure the ModelArts image

(Screenshots of the ModelArts image-configuration steps omitted.)

The image is configured by following the steps in the screenshots above. I won't go into more detail here, since the explanatory markdown file I wrote at the time has since been deleted...

Problems encountered and how they were solved

  1. Dependency errors: mostly around the numpy version. ModelArts itself requires a fairly new numpy, while ViDAR needs an older one, so the two conflict. I eventually put together a script that resolves most of the issues:

    vim install_deps.sh

    and put the following into it:

    #!/bin/bash

    # Use the Tsinghua PyPI mirror
    PIP_SOURCE="-i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn"

    # Temporarily remove the ModelArts SDK from PYTHONPATH to avoid interference
    export ORIGINAL_PYTHONPATH=$PYTHONPATH
    export PYTHONPATH=$(echo $PYTHONPATH | sed 's|/home/ma-user/modelarts-dev/modelarts-sdk||g')

    # Print environment information
    echo "Checking Python and PyTorch versions..."
    python -c "import sys, torch; print('Python:', sys.version); print('PyTorch:', torch.__version__, 'CUDA:', torch.cuda.is_available())"

    # Step 1: fix dependency conflicts
    echo "Fixing dependency conflicts..."
    pip install numpy==1.23.5 --force-reinstall $PIP_SOURCE
    pip install networkx==2.2 --force-reinstall $PIP_SOURCE
    pip install pyasn1==0.6.1 --force-reinstall $PIP_SOURCE
    pip install pandas==1.2.5 --force-reinstall $PIP_SOURCE # pin the pandas version

    # Step 2: handle "platform not supported" packages by downgrading to compatible versions
    echo "Fixing platform compatibility issues..."
    pip install mmengine
    pip install PyYAML==6.0 charset-normalizer==3.3.2 fonttools==4.38.0 kiwisolver==1.4.5 \
    lxml==4.9.3 matplotlib==3.5.2 simplejson==3.19.2 MarkupSafe==2.1.5 \
    cffi==1.16.0 greenlet==3.0.3 ijson==3.2.4 SQLAlchemy==2.0.30 \
    --force-reinstall $PIP_SOURCE

    # Step 3: install mmcv-full==1.4.0
    echo "Installing mmcv-full==1.4.0..."
    pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html $PIP_SOURCE

    # Step 4: install the remaining mmdet3d dependencies
    echo "Installing mmdet3d dependencies..."
    pip install lyft_dataset_sdk nuscenes-devkit plyfile tensorboard numba==0.48.0 scikit-image==0.19.3 $PIP_SOURCE
    pip install numpy==1.23.5 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html -i https://pypi.tuna.tsinghua.edu.cn/simple

    # Step 5: install mmdet==2.14.0 and mmsegmentation==0.14.1
    echo "Installing mmdet==2.14.0 and mmsegmentation==0.14.1..."
    pip install mmdet==2.14.0 $PIP_SOURCE
    pip install mmsegmentation==0.14.1 $PIP_SOURCE

    git clone https://github.com/open-mmlab/mmdetection3d.git
    cd mmdetection3d
    git checkout v0.17.1 # Other versions may not be compatible.
    python setup.py install
    cd ..

    # Step 6: install detectron2 and other dependencies
    echo "Installing detectron2 and other dependencies..."
    pip install einops fvcore seaborn iopath==0.1.9 timm==0.6.13 typing-extensions==4.5.0 \
    pylint ipython==8.12 matplotlib==3.5.2 numba==0.48.0 setuptools==59.5.0 $PIP_SOURCE
    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git' $PIP_SOURCE


    # Step 7: install ViDAR and chamferdist
    echo "Installing ViDAR and chamferdistance..."
    if [ ! -d "ViDAR" ]; then
    git clone https://github.com/OpenDriveLab/ViDAR
    fi
    cd ViDAR
    mkdir -p pretrained
    cd pretrained
    wget https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth || echo "Pretrained model download failed, continuing..."
    cd ../third_lib/chamfer_dist/chamferdist/
    pip install . $PIP_SOURCE
    cd ../../..
    pip install matplotlib==3.5.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install pyparsing==2.4.7 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install kiwisolver==1.3.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install --user prettytable==3.7.0

    # Done
    echo "Installation complete. If errors occurred, check logs above."

    # Optional: isolate the environment (commented out, enable manually if needed)
    # echo "If conflicts persist, consider creating a clean environment:"
    # echo "conda create -n vidar_clean python=3.8"
    # echo "conda activate vidar_clean"
    # echo "Then rerun this script."

    Then comes the pleasant one-command fix:

    chmod +x install_deps.sh
    ./install_deps.sh

    After that, two places in the code need to be modified:

    (Screenshots of the two modified spots omitted.)

    Then, from the ViDAR directory, point CONFIG at the desired config (the first line is the full OpenScene config, the second is the mini config I actually used) and launch training:

    CONFIG=projects/configs/vidar_pretrain/OpenScene/vidar_OpenScene_train_1_8_3future.py
    CONFIG=projects/configs/vidar_pretrain/OpenScene/vidar_OpenScene_mini_1_8_3future.py
    GPU_NUM=1

    ./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
  2. Dataset

    Download it with the openxlab package. Note: the latest openxlab requires Python ≥ 3.8, so create a separate virtual environment for it:

    conda create -n openxlab python=3.9
    pip install openxlab
    openxlab login # requires an openxlab account; create an access key first, then log in here
    # my AK/SK are listed outside the code block
    openxlab dataset download --dataset-repo OpenDriveLab/OpenScene --source-path /openscene-v1.1/openscene_sensor_mini_camera.tgz --target-path .
    openxlab dataset download --dataset-repo OpenDriveLab/OpenScene --source-path /openscene-v1.1/openscene_sensor_mini_lidar.tgz --target-path .

    Access Key: wgakjbrzyyxljprb1b2z Secret Key: rnyq568lwdpayblrb744qdmxyg4xz19vo3b0azog

    Then extract the downloaded archives (a small extraction sketch follows below). The extracted data takes roughly 170 GB of disk space, and because the archives also need room during extraction, plan for about 250-300 GB of disk.
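
    For extraction, here is a minimal Python sketch (assuming the two .tgz archives were downloaded into the current directory with the names used in the openxlab commands above):

    import os
    import shutil
    import tarfile

    # Archives downloaded by the openxlab commands above.
    archives = [
        "openscene_sensor_mini_camera.tgz",
        "openscene_sensor_mini_lidar.tgz",
    ]

    # Rough free-space check: extraction needs on the order of 170 GB.
    free_gb = shutil.disk_usage(".").free / 1024 ** 3
    print(f"Free disk space: {free_gb:.0f} GB")

    for name in archives:
        if not os.path.exists(name):
            print(f"Missing archive: {name}")
            continue
        print(f"Extracting {name} ...")
        with tarfile.open(name, "r:gz") as tar:
            tar.extractall(path=".")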

    After extraction, the MergedPointCloud folders need to be moved to the expected target path.

    This can be done automatically with a script (vim fix_mergedpointcloud.py):

    import os
    import shutil

    # Root paths (adjust to your actual directory layout)
    bad_root = "ViDAR/data/openscene_v1.1/OpenDriveLab___OpenScene/openscene-v1.1/openscene_v1.1/sensor_blobs/mini"
    correct_root = "ViDAR/data/openscene_v1.1/sensor_blobs/mini"

    # Walk over every sub-directory
    for subdir in os.listdir(bad_root):
        full_bad_path = os.path.join(bad_root, subdir, "MergedPointCloud")
        full_target_dir = os.path.join(correct_root, subdir)

        if os.path.exists(full_bad_path):
            target_path = os.path.join(full_target_dir, "MergedPointCloud")

            print(f"Moving {full_bad_path} --> {target_path}")

            os.makedirs(full_target_dir, exist_ok=True)
            if os.path.exists(target_path):
                print(f" - Skipping {target_path} (already exists)")
            else:
                shutil.move(full_bad_path, target_path)

    Run it with: python fix_mergedpointcloud.py

    Alternatively, create symlinks instead of moving the data:

    import os

    bad_root = "ViDAR/data/openscene_v1.1/OpenDriveLab___OpenScene/openscene-v1.1/openscene_v1.1/sensor_blobs/mini"
    correct_root = "ViDAR/data/openscene_v1.1/sensor_blobs/mini"

    for sequence in os.listdir(bad_root):
        bad_mp = os.path.join(bad_root, sequence, "MergedPointCloud")
        correct_target_dir = os.path.join(correct_root, sequence)
        correct_link_path = os.path.join(correct_target_dir, "MergedPointCloud")

        if os.path.exists(bad_mp):
            if not os.path.exists(correct_target_dir):
                print(f"Path does not exist, creating: {correct_target_dir}")
                os.makedirs(correct_target_dir)

            if not os.path.exists(correct_link_path):
                print(f"Creating symlink: {correct_link_path} -> {bad_mp}")
                os.symlink(os.path.abspath(bad_mp), correct_link_path)
            else:
                print(f"Already exists: {correct_link_path}")

    openscene_metadata_mini.tgz can be downloaded to a local machine and then uploaded to the server directly.

    Finally, run: python tools/collect_nuplan_data.py mini

  3. Training errors

    Ninja error

    RuntimeError: Ninja is required to load C++

    Fix: build Ninja from source and install it locally.

    1. Download and build Ninja:

      • Clone the official Ninja repository and bootstrap it:

        git clone https://github.com/ninja-build/ninja.git
        cd ninja
        python configure.py --bootstrap
      • This produces a ninja binary.

    2. Create a local directory and move the binary there:

      • Create ~/bin (if it does not exist yet):

        mkdir -p ~/bin
      • Move the ninja binary into ~/bin:

        mv ninja ~/bin/
    3. Update the PATH environment variable:

      • Add it to the current session temporarily:

        export PATH=~/bin:$PATH
    4. Verify the installation:

      • Run ninja --version to check that it worked.

    crypt.h error

    Fix:

    • Step 1: get the glibc-2.27 sources

      • Make sure you have downloaded and extracted glibc-2.27.tar.xz from the GNU FTP server. If not:

        wget https://ftp.gnu.org/gnu/glibc/glibc-2.27.tar.xz
        tar -xJf glibc-2.27.tar.xz
      • Confirm that extraction produced a glibc-2.27 directory containing an include sub-directory.

      Step 2: copy all header files

      • Copy everything under glibc-2.27/include into a local ~/include directory, so that all dependent headers (such as features.h, stdint.h, etc.) are available:

        mkdir -p ~/include
        cp -r glibc-2.27/include/* ~/include/
      • This copies crypt.h along with all the other headers into ~/include, so the compiler can find every required dependency.

      Step 3: set the include path

      • Set the CPLUS_INCLUDE_PATH environment variable so the compiler searches ~/include first:

        export CPLUS_INCLUDE_PATH=~/include:$CPLUS_INCLUDE_PATH
      • Verify it:

        echo $CPLUS_INCLUDE_PATH

        The output should contain ~/include.

      Step 4: check the system headers

      First, check whether the necessary header already exists in your environment:

      • Run

        ls /usr/include/crypt.h

        to see whether crypt.h is already there. If it is, try the system compiler instead:

        • Run export CC=/usr/bin/gcc and export CXX=/usr/bin/g++
        • Then unset the include-path override: unset CPLUS_INCLUDE_PATH
        • Re-run the training script: ./tools/dist_train.sh ${CONFIG} ${GPU_NUM}

    GLIBCXX_3.4.29 error

    Reference:

    [How to fix "version `GLIBCXX_3.4.29' not found"](https://blog.csdn.net/weixin_39379635/article/details/129159713)

    Fix:

    ImportError: /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6: version `GLIBCXX_3.4.29' not found

    1. First check which GLIBCXX versions the current library provides:

    strings /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6 | grep GLIBCXX

    In my case it only went up to 3.4.22.

    2. Look for other copies of the library on the system:

    sudo find / -name "libstdc++.so.6*"

    This turned up a newer one: /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6.0.29

    Check it:

    strings /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6.0.29 | grep GLIBCXX

    It does contain 3.4.29.

    3. Copy it to the target directory and recreate the symlink:

    # copy
    sudo cp /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6.0.29 /home/ma-user/anaconda3/envs/vidar/lib/

    # remove the old link
    sudo rm /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6

    # create the new link
    sudo ln -s /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6.0.29 /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6

    Verify:

    strings /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6 | grep GLIBCXX

    3.4.29 now shows up.

    Note: if the error instead points at /usr/lib/x86_64-linux-gnu/libstdc++.so.6, use:

    export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
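
    For quick diagnosis, here is a small Python sketch (my own helper, not part of ViDAR) that lists the GLIBCXX versions exported by a given libstdc++.so.6, mimicking the strings | grep command above:

    import re
    import sys

    def glibcxx_versions(path):
        """Scan a shared library for GLIBCXX_x.y.z version strings."""
        with open(path, "rb") as f:
            data = f.read()
        found = set(re.findall(rb"GLIBCXX_[0-9]+(?:\.[0-9]+)*", data))
        return sorted(v.decode() for v in found)

    if __name__ == "__main__":
        lib = sys.argv[1] if len(sys.argv) > 1 else \
            "/home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6"
        print("\n".join(glibcxx_versions(lib)))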

    GPU out of memory

    CUDA out of memory

    Fix: train on only part of the mini dataset. Make sure the folder names you keep under meta_datas match the folder names under sensor_blobs (a small consistency-check sketch follows below).
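
    A minimal sketch of that consistency check (the two paths are assumptions based on the directory layout used above; adjust them to your setup):

    import os

    # Hypothetical layout: only a subset of the mini split is kept to save GPU memory.
    meta_root = "ViDAR/data/openscene_v1.1/meta_datas/mini"
    blob_root = "ViDAR/data/openscene_v1.1/sensor_blobs/mini"

    # Compare entry names (extensions stripped, in case meta_datas stores per-log files).
    meta_logs = {os.path.splitext(name)[0] for name in os.listdir(meta_root)}
    blob_logs = {os.path.splitext(name)[0] for name in os.listdir(blob_root)}

    print("In meta_datas but missing from sensor_blobs:", sorted(meta_logs - blob_logs))
    print("In sensor_blobs but missing from meta_datas:", sorted(blob_logs - meta_logs))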

    fsspec incompatibility with Python 3.8

    TypeError: 'type' object is not subscriptable

    Fix: pin fsspec to a release that still works with Python 3.8:

    pip install fsspec==2025.3.0

ViDAR model implementation analysis

Model architecture overview

ViDAR (Visual Point Cloud Forecasting enables Scalable Autonomous Driving) is a model built on the BEVFormer architecture, focused on visual point cloud forecasting for autonomous-driving scenes. Judging from vidar_transformer.py, its central piece is a PredictionTransformer.

Core components

  1. PredictionTransformer:
    • the core ViDAR component, predicting the next frame's BEV features from multi-frame BEV features
    • it uses a custom decoder to handle temporal information
  2. Attention mechanisms (a simplified sketch of how they compose follows this list):
    • temporal self-attention (TemporalSelfAttention): aggregates information along the time dimension
    • spatial cross-attention (MSDeformableAttention3D): aggregates information across 3D space
    • custom deformable attention (CustomMSDeformableAttention): used for feature alignment
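
To make the roles of these attention blocks concrete, below is a heavily simplified, hypothetical sketch of one prediction-decoder layer: temporal self-attention over the BEV queries, followed by cross-attention into historical BEV features. The real ViDAR layers use deformable attention and much more bookkeeping; this only illustrates the data flow.

import torch
import torch.nn as nn

class SimplifiedPredictionLayer(nn.Module):
    """Toy stand-in for one ViDAR decoder layer (not the actual implementation)."""

    def __init__(self, dims=256, heads=8):
        super().__init__()
        self.temporal_self_attn = nn.MultiheadAttention(dims, heads, batch_first=True)
        self.spatial_cross_attn = nn.MultiheadAttention(dims, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dims, dims * 4), nn.ReLU(), nn.Linear(dims * 4, dims))
        self.norm1 = nn.LayerNorm(dims)
        self.norm2 = nn.LayerNorm(dims)
        self.norm3 = nn.LayerNorm(dims)

    def forward(self, bev_query, history_bev):
        # bev_query:   [B, H*W, C]   queries for the frame being predicted
        # history_bev: [B, T*H*W, C] flattened BEV features of past frames
        x, _ = self.temporal_self_attn(bev_query, bev_query, bev_query)
        bev_query = self.norm1(bev_query + x)
        x, _ = self.spatial_cross_attn(bev_query, history_bev, history_bev)
        bev_query = self.norm2(bev_query + x)
        return self.norm3(bev_query + self.ffn(bev_query))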

Project structure

The project layout shows that ViDAR is an extension of BEVFormer:

projects/
├── configs/
│   ├── base/
│   ├── bevformer/
│   ├── vidar_finetune/   # ViDAR fine-tuning configs
│   └── vidar_pretrain/   # ViDAR pre-training configs
└── mmdet3d_plugin/
    ├── bevformer/        # BEVFormer-related modules
    ├── core/             # core evaluation and utility modules
    ├── datasets/         # dataset handling
    ├── dd3d/             # 3D-detection-related modules
    └── models/           # model definitions

Relationship to BEVFormer

ViDAR appears to be an extension built on top of BEVFormer, focused on future-frame prediction:

  • BEVFormer mainly handles the transformation from multi-view images to a BEV representation
  • ViDAR additionally performs temporal prediction on that BEV representation, forecasting future scenes (a rough autoregressive sketch of this idea follows below)
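
Conceptually, the future prediction can be viewed as rolling a predictor forward autoregressively over a sliding window of BEV features. A hypothetical sketch (function and argument names are illustrative, not ViDAR's actual API):

import torch

def rollout_future_bev(predictor, history_bev, num_future=3):
    """Autoregressively predict future BEV features from a history queue.

    predictor:   a module mapping a stack of past BEV maps to the next one
    history_bev: [T, B, H*W, C] BEV features of the past T frames
    """
    history = list(history_bev.unbind(0))
    futures = []
    for _ in range(num_future):
        next_bev = predictor(torch.stack(history, 0))  # [B, H*W, C]
        futures.append(next_bev)
        history = history[1:] + [next_bev]  # slide the temporal window
    return torch.stack(futures, 0)          # [num_future, B, H*W, C]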

Tuning ideas

Integrating nSKG

Up front:

This project is actually less complicated than it looks. Roughly:

The ViDAR model appears to be a BEVFormer-based 3D detection/segmentation model built on the MMDetection3D framework. It supports the conversion from multi-view images to a BEV representation, and comes with a flexible config system, plugin extensibility, and a complete training pipeline.

With that understood, the path is clear: first learn how to use MMDetection3D. ViDAR is just one extra layer wrapped on top, so that layer should also be extensible through plugins (a minimal registration sketch is shown below).
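
The plugin mechanism itself is straightforward. Below is a minimal, hypothetical sketch of how a custom module is registered with an MMDetection registry and then referenced from a config; the class and file names are made up for illustration:

# my_plugin/my_head.py -- hypothetical example of MMDetection-style registration
import torch.nn as nn
from mmdet.models import HEADS

@HEADS.register_module()
class MyToyHead(nn.Module):
    """Illustrative head; once registered it can be built by name from a config."""

    def __init__(self, in_channels=256, out_channels=64):
        super().__init__()
        self.proj = nn.Linear(in_channels, out_channels)

    def forward(self, x):
        return self.proj(x)

# In a config file the module is then referenced by type name:
#   model = dict(pts_bbox_head=dict(type='MyToyHead', in_channels=256, out_channels=64))
# and the plugin directory is loaded via: plugin = True; plugin_dir = 'my_plugin/'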

Key places to look at:

  1. Model definition files (probably under projects/mmdet3d_plugin/bevformer/)
  2. Config files (specified through the config command-line argument)
  3. The implementation of the custom training function custom_train_model
  • Paper 1: nuScenes Knowledge Graph (nSKG)

    The content of the nuScenes Knowledge Graph (nSKG) paper fits naturally into understanding and tuning ViDAR: Visual Point Cloud Forecasting, especially in the following respects (a data-format sketch follows the list):

    1. Richer scene representation: nSKG provides a comprehensive semantic representation of the nuScenes dataset, covering the entities in a traffic scene (vehicles, pedestrians, lanes, traffic lights) and the semantic and spatial relations between them. This can enrich ViDAR's input data and improve its point cloud forecasting as well as downstream tasks (perception, planning).
    2. Easier data handling: nSKG's structured data (provided both as a knowledge graph and in PyTorch Geometric format) can be plugged into ViDAR's data pipeline directly, reducing the preprocessing effort.
    3. Architecture enhancement: nSKG's heterogeneous graph representation can be combined with ViDAR's Transformer / BEVFormer modules by introducing a graph neural network (GNN) to process semantic relations, improving the robustness and interpretability of the predictions.
    4. Tuning directions: nSKG's rich context (lane topology, agent relations) can guide the tuning of ViDAR's hyperparameters, data-augmentation strategy and loss functions, especially in complex traffic scenes.
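
    To ground points 2 and 3, here is a hypothetical sketch of how nSKG-style entities and relations could be packed into a PyTorch Geometric HeteroData object before being fed to a GNN (the node and edge types are illustrative, not the official nSKG schema):

    import torch
    from torch_geometric.data import HeteroData

    def build_scene_graph(agent_feats, lane_feats, agent_on_lane):
        """agent_feats: [num_agents, F_a], lane_feats: [num_lanes, F_l],
        agent_on_lane: [2, num_edges] (agent index -> lane index)."""
        data = HeteroData()
        data['agent'].x = agent_feats
        data['lane'].x = lane_feats
        data['agent', 'on', 'lane'].edge_index = agent_on_lane
        return data

    # Tiny usage example with random features.
    g = build_scene_graph(torch.randn(4, 8), torch.randn(6, 8),
                          torch.tensor([[0, 1, 2, 3], [0, 0, 2, 5]]))
    print(g)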

Actual modifications:

(Screenshot of the modified code omitted.)

Summary of the nSTP integration work in the ViDAR project

Based on the current code base, this section summarizes the work done so far, from the initial state to the present, to integrate nSTP (Neural Scene-Time Priors).

1. New core files

1.1 nSTP encoder module

File path: ViDAR\projects\mmdet3d_plugin\bevformer\modules\nstp_encoder.py

Main functionality:

  • NSTPEncoder: encodes nSTP graph data with a graph neural network (GraphSAGE or GAT)
  • NSTPEnhancer: fuses nSTP features with BEV features, enhancing the BEV features through an attention mechanism
import torch
import torch.nn as nn
from torch_geometric.nn import GraphSAGE, GATConv


class NSTPEncoder(nn.Module):
    """Encoder for nSTP graph data."""

    def __init__(self, in_channels, hidden_channels, out_channels, num_layers=3,
                 gnn_type='graphsage', dropout=0.1, aggr='mean'):
        super().__init__()
        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.out_channels = out_channels
        self.num_layers = num_layers
        self.gnn_type = gnn_type

        # Input feature projection
        self.input_proj = nn.Linear(in_channels, hidden_channels)

        # Graph neural network
        if gnn_type == 'graphsage':
            self.gnn = GraphSAGE(
                in_channels=hidden_channels,
                hidden_channels=hidden_channels,
                num_layers=num_layers,
                out_channels=out_channels,
                dropout=dropout,
                aggr=aggr
            )
        elif gnn_type == 'gat':
            # Simplified GAT implementation
            self.gnn_layers = nn.ModuleList()
            self.gnn_layers.append(GATConv(hidden_channels, hidden_channels))
            for _ in range(num_layers - 2):
                self.gnn_layers.append(GATConv(hidden_channels, hidden_channels))
            self.gnn_layers.append(GATConv(hidden_channels, out_channels))
            self.dropout = nn.Dropout(dropout)
        else:
            raise ValueError(f"Unsupported GNN type: {gnn_type}")

    def forward(self, data):
        """Forward pass.

        Args:
            data: a PyG Data object, or a dict containing x and edge_index

        Returns:
            torch.Tensor: node features
        """
        # Handle None input
        if data is None:
            # Return an empty feature tensor
            return torch.zeros((1, self.out_channels), device=self.input_proj.weight.device)

        if hasattr(data, 'x') and hasattr(data, 'edge_index'):
            x, edge_index = data.x, data.edge_index
        elif isinstance(data, dict) and 'x' in data and 'edge_index' in data:
            x, edge_index = data['x'], data['edge_index']
        else:
            print(f"Warning: unexpected input data format: {type(data)}")
            # Return an empty feature tensor
            return torch.zeros((1, self.out_channels), device=self.input_proj.weight.device)

        # Make sure x and edge_index are tensors
        if not isinstance(x, torch.Tensor):
            x = torch.tensor(x, dtype=torch.float, device=self.input_proj.weight.device)
        if not isinstance(edge_index, torch.Tensor):
            edge_index = torch.tensor(edge_index, dtype=torch.long, device=self.input_proj.weight.device)

        # Feature projection
        x = self.input_proj(x)

        # GNN processing
        if self.gnn_type == 'graphsage':
            x = self.gnn(x, edge_index)
        else:  # gat
            for i, layer in enumerate(self.gnn_layers):
                if i < len(self.gnn_layers) - 1:
                    x = layer(x, edge_index)
                    x = torch.relu(x)
                    x = self.dropout(x)
                else:
                    x = layer(x, edge_index)

        return x


class NSTPEnhancer(nn.Module):
    """nSTP feature enhancer used to strengthen BEV features."""

    def __init__(self, bev_channels, nstp_channels, hidden_channels, bev_h, bev_w, use_attention=True):
        super().__init__()
        self.bev_channels = bev_channels
        self.nstp_channels = nstp_channels
        self.hidden_channels = hidden_channels
        self.bev_h = bev_h
        self.bev_w = bev_w
        self.use_attention = use_attention

        # Feature fusion layers
        self.nstp_proj = nn.Linear(nstp_channels, hidden_channels)
        self.bev_proj = nn.Linear(bev_channels, hidden_channels)

        if use_attention:
            # Attention mechanism
            self.query_proj = nn.Linear(hidden_channels, hidden_channels)
            self.key_proj = nn.Linear(hidden_channels, hidden_channels)
            self.value_proj = nn.Linear(hidden_channels, hidden_channels)
            self.attention_scale = hidden_channels ** -0.5

        # Output projection
        self.output_proj = nn.Linear(hidden_channels, bev_channels)

    def forward(self, bev_feat, nstp_feat, nstp_pos=None):
        """Forward pass.

        Args:
            bev_feat (torch.Tensor): BEV features [B, C, H, W]
            nstp_feat (torch.Tensor): nSTP node features [B, N, C]
            nstp_pos (torch.Tensor, optional): nSTP node positions [B, N, 2]

        Returns:
            torch.Tensor: enhanced BEV features [B, C, H, W]
        """
        B, C, H, W = bev_feat.shape
        bev_feat_flat = bev_feat.flatten(2).permute(0, 2, 1)  # [B, H*W, C]

        # Feature projection
        bev_feat_proj = self.bev_proj(bev_feat_flat)  # [B, H*W, hidden]
        nstp_feat_proj = self.nstp_proj(nstp_feat)    # [B, N, hidden]

        if self.use_attention:
            # Compute attention
            query = self.query_proj(bev_feat_proj)   # [B, H*W, hidden]
            key = self.key_proj(nstp_feat_proj)      # [B, N, hidden]
            value = self.value_proj(nstp_feat_proj)  # [B, N, hidden]

            # Attention scores
            attn = torch.bmm(query, key.transpose(1, 2)) * self.attention_scale  # [B, H*W, N]
            attn = torch.softmax(attn, dim=-1)

            # Weighted features
            context = torch.bmm(attn, value)  # [B, H*W, hidden]

            # Fuse features
            enhanced_feat = context + bev_feat_proj
        else:
            # Simple averaging
            nstp_feat_expanded = nstp_feat_proj.mean(dim=1, keepdim=True).expand(-1, H*W, -1)
            enhanced_feat = bev_feat_proj + nstp_feat_expanded

        # Output projection
        enhanced_feat = self.output_proj(enhanced_feat)  # [B, H*W, C]

        # Reshape back to a BEV map
        enhanced_feat = enhanced_feat.permute(0, 2, 1).reshape(B, C, H, W)

        return enhanced_feat
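
For reference, a small smoke test of the two modules above with random inputs (the shapes are assumptions, roughly consistent with the default arguments used later in the head; it assumes the ViDAR environment is set up and the repo root is on PYTHONPATH):

import torch
from torch_geometric.data import Data

# Run from the ViDAR repo root so the plugin package is importable.
from projects.mmdet3d_plugin.bevformer.modules.nstp_encoder import NSTPEncoder, NSTPEnhancer

encoder = NSTPEncoder(in_channels=64, hidden_channels=128, out_channels=256)
enhancer = NSTPEnhancer(bev_channels=256, nstp_channels=256, hidden_channels=128,
                        bev_h=200, bev_w=200)

# 10 graph nodes with 64-dim features and 30 random edges.
graph = Data(x=torch.randn(10, 64), edge_index=torch.randint(0, 10, (2, 30)))
node_feat = encoder(graph)                        # [10, 256]

bev = torch.randn(1, 256, 200, 200)               # [B, C, H, W]
enhanced = enhancer(bev, node_feat.unsqueeze(0))  # nstp_feat passed as [B, N, C]
print(enhanced.shape)                             # torch.Size([1, 256, 200, 200])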

1.2 nSTP data-processing component

File path: ViDAR\projects\mmdet3d_plugin\datasets\pipelines\nstp_transform.py

Main functionality:

  • ProcessNSTPGraph: processes nSTP graph data, making sure the format is correct and converting it to PyTorch tensors
import torch
import numpy as np
from mmdet.datasets.builder import PIPELINES


@PIPELINES.register_module()
class ProcessNSTPGraph:
    """Pipeline transform that processes nSTP graph data."""

    def __init__(self, graph_feat_dim=64, with_agent_type=True):
        self.graph_feat_dim = graph_feat_dim
        self.with_agent_type = with_agent_type

    def __call__(self, results):
        """Process the nSTP graph data."""
        if 'nstp_graph' not in results:
            return results

        graph_data = results['nstp_graph']

        # Handle a PyG Data object
        if hasattr(graph_data, 'x') and hasattr(graph_data, 'edge_index'):
            # Already a PyG Data object; make sure the tensor types are correct
            if not isinstance(graph_data.x, torch.Tensor):
                graph_data.x = torch.tensor(graph_data.x, dtype=torch.float)
            if not isinstance(graph_data.edge_index, torch.Tensor):
                graph_data.edge_index = torch.tensor(graph_data.edge_index, dtype=torch.long)
            if hasattr(graph_data, 'edge_attr') and not isinstance(graph_data.edge_attr, torch.Tensor):
                graph_data.edge_attr = torch.tensor(graph_data.edge_attr, dtype=torch.float)

        # Handle graph data given as a dict
        elif isinstance(graph_data, dict):
            # Node features
            if 'x' in graph_data:
                x = graph_data['x']
                if isinstance(x, np.ndarray):
                    x = torch.from_numpy(x).float()
                elif isinstance(x, list):
                    x = torch.tensor(x).float()
                elif not isinstance(x, torch.Tensor):
                    x = torch.tensor(x, dtype=torch.float)
                graph_data['x'] = x

            # Edge index
            if 'edge_index' in graph_data:
                edge_index = graph_data['edge_index']
                if isinstance(edge_index, np.ndarray):
                    edge_index = torch.from_numpy(edge_index).long()
                elif isinstance(edge_index, list):
                    edge_index = torch.tensor(edge_index).long()
                elif not isinstance(edge_index, torch.Tensor):
                    edge_index = torch.tensor(edge_index, dtype=torch.long)
                graph_data['edge_index'] = edge_index

            # Edge attributes
            if 'edge_attr' in graph_data:
                edge_attr = graph_data['edge_attr']
                if isinstance(edge_attr, np.ndarray):
                    edge_attr = torch.from_numpy(edge_attr).float()
                elif isinstance(edge_attr, list):
                    edge_attr = torch.tensor(edge_attr).float()
                elif not isinstance(edge_attr, torch.Tensor):
                    edge_attr = torch.tensor(edge_attr, dtype=torch.float)
                graph_data['edge_attr'] = edge_attr

        # Update the results
        results['nstp_graph'] = graph_data
        return results
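
A quick, hypothetical check of the transform on a dict-style sample (assuming the pipeline file above is importable from the ViDAR repo root):

import numpy as np

from projects.mmdet3d_plugin.datasets.pipelines.nstp_transform import ProcessNSTPGraph

transform = ProcessNSTPGraph()
results = {
    'nstp_graph': {
        'x': np.random.rand(5, 64).astype(np.float32),
        'edge_index': np.array([[0, 1, 2], [1, 2, 3]]),
    }
}
results = transform(results)
print(type(results['nstp_graph']['x']))           # <class 'torch.Tensor'>
print(results['nstp_graph']['edge_index'].dtype)  # torch.int64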

2. Modifications to existing files

2.1 Dataset classes

2.1.1 NuScenes dataset

File path: ViDAR\projects\mmdet3d_plugin\datasets\nuscenes_vidar_dataset_v1.py

Main changes:

  • Added nSTP-related parameters: use_nstp, nstp_path
  • Implemented the _load_nstp_data method to load the nSTP graph data files
  • Modified get_data_info to attach the nSTP data to each sample's info
#---------------------------------------------------------------------------------#
# Visual Point Cloud Forecasting enables Scalable Autonomous Driving #
# Copyright (c) OpenDriveLab. All rights reserved. #
#---------------------------------------------------------------------------------#

import copy
import torch
import numpy as np
import os

# 导入rdflib
try:
from rdflib import Graph
except ImportError:
print("警告: 未安装rdflib库,无法加载.ttl格式的nSKG数据")

from mmdet.datasets import DATASETS
from nuscenes.eval.common.utils import quaternion_yaw, Quaternion
from nuscenes.utils.geometry_utils import transform_matrix
from mmcv.parallel import DataContainer as DC

from .nuscenes_vidar_dataset_template import NuScenesViDARDatasetTemplate


@DATASETS.register_module()
class NuScenesViDARDatasetV1(NuScenesViDARDatasetTemplate): # 确保类名为NuScenesViDARDatasetV1
"""NuScenes visual point cloud forecasting dataset.
"""
def __init__(self,
ann_file,
pipeline=None,
data_root=None,
classes=None,
load_interval=1,
modality=None,
box_type_3d='LiDAR',
filter_empty_gt=True,
test_mode=False,
use_valid_flag=False,
history_queue_length=None,
pred_history_frame_num=0,
pred_future_frame_num=0,
per_frame_loss_weight=(1.0,),
use_nskg=False,
nskg_path=None,
nskg_ontology_path=None,
use_nstp=False, # 添加nSTP支持参数
nstp_path=None, # 添加nSTP数据路径参数
**kwargs):
# 保存history_queue_length参数,但不传递给父类
self.history_queue_length = history_queue_length

# 调用父类初始化方法,移除history_queue_length参数
super().__init__(
ann_file=ann_file,
pipeline=pipeline,
data_root=data_root,
classes=classes,
load_interval=load_interval,
modality=modality,
box_type_3d=box_type_3d,
filter_empty_gt=filter_empty_gt,
test_mode=test_mode,
use_valid_flag=use_valid_flag,
**kwargs)

# 保存nSKG相关参数
self.use_nskg = use_nskg
self.nskg_path = nskg_path
self.nskg_ontology_path = nskg_ontology_path

# 保存nSTP相关参数
self.use_nstp = use_nstp
self.nstp_path = nstp_path

# 保存预测帧数相关参数
self.pred_history_frame_num = pred_history_frame_num
self.pred_future_frame_num = pred_future_frame_num
self.per_frame_loss_weight = per_frame_loss_weight

# 如果启用nSKG,加载相关数据
if self.use_nskg and self.nskg_path is not None:
self._load_nskg_data()

def _load_nskg_data(self):
"""加载nSKG数据"""
self.nskg_data = {}
if not os.path.exists(self.nskg_path):
print(f"警告: nSKG数据路径 {self.nskg_path} 不存在")
self.use_nskg = False
return

try:
if self.nskg_path.endswith('.ttl') and self.nskg_ontology_path is not None:
g = Graph()
try:
g.parse(self.nskg_path, format='turtle')
except Exception as e:
print(f"警告: TTL文件解析失败: {str(e)}")
self.use_nskg = False
return

# 加载本体文件
if os.path.exists(self.nskg_ontology_path):
for onto_file in os.listdir(self.nskg_ontology_path):
if onto_file.endswith('.ttl'):
onto_path = os.path.join(self.nskg_ontology_path, onto_file)
try:
g.parse(onto_path, format='turtle')
except Exception as e:
print(f"警告: 本体文件 {onto_file} 解析失败: {str(e)}")

print(f"成功加载nSKG数据,共 {len(g)} 个三元组")
self.nskg_data = self._convert_rdf_to_pyg(g)
else:
import pickle
with open(self.nskg_path, 'rb') as f:
self.nskg_data = pickle.load(f)
print(f"成功加载nSKG数据,共 {len(self.nskg_data)} 条记录")
except Exception as e:
print(f"加载nSKG数据失败: {str(e)}")
print("继续训练,但不使用nSKG数据")
self.use_nskg = False

def _convert_rdf_to_pyg(self, graph):
"""将RDF图转换为PyG格式

Args:
graph: RDF图对象

Returns:
转换后的数据字典,键为sample_token
"""
result = {}
try:
import torch_geometric as pyg

# 查询所有场景
scenes = {}
for s, p, o in graph.triples((None, None, None)):
# 假设每个场景都有一个token属性
if str(p).endswith('hasToken'):
scene_uri = str(s)
token = str(o)
scenes[scene_uri] = token

# 为每个场景构建图
for scene_uri, token in scenes.items():
# 收集节点
nodes = {}
node_types = {}
node_features = {}

# 收集边
edges = {}

# 查询与场景相关的所有三元组
for s, p, o in graph.triples((None, None, None)):
# 处理节点和边的逻辑...
pass

# 构建PyG数据对象
data = {
'x': node_features,
'edge_index': edges,
'node_type': node_types
}

result[token] = data

return result
except ImportError:
print("警告: 未安装PyTorch Geometric库,无法转换RDF数据为图格式")
return {}

def get_data_info(self, index):
"""获取数据信息,添加nSKG或nSTP数据"""
info = super().get_data_info(index)

# 获取当前样本的标识符
sample_token = info.get('sample_token', None)

# 如果启用nSKG,添加nSKG数据到info中
if self.use_nskg and hasattr(self, 'nskg_data') and self.nskg_data and sample_token in self.nskg_data:
info['nskg_graph'] = self.nskg_data[sample_token]

# 如果启用nSTP,添加nSTP数据到info中(优先使用nSTP)
if self.use_nstp and hasattr(self, 'nstp_data') and self.nstp_data and sample_token in self.nstp_data:
info['nstp_graph'] = self.nstp_data[sample_token]
# 如果同时存在nSKG和nSTP,使用nSTP替代nSKG
if 'nskg_graph' in info:
del info['nskg_graph']

return info

def _mask_points(self, pts_list):
assert self.ego_mask is not None
# remove points belonging to ego vehicle.
masked_pts_list = []
for pts in pts_list:
ego_mask = np.logical_and(
np.logical_and(self.ego_mask[0] <= pts[:, 0],
self.ego_mask[2] >= pts[:, 0]),
np.logical_and(self.ego_mask[1] <= pts[:, 1],
self.ego_mask[3] >= pts[:, 1]),
)
pts = pts[np.logical_not(ego_mask)]
masked_pts_list.append(pts)
pts_list = masked_pts_list
return pts_list

def union2one(self, previous_queue, future_queue):
# 1. get transformation from all frames to current (reference) frame
ref_meta = previous_queue[-1]['img_metas'].data
valid_scene_token = ref_meta['scene_token']
# compute reference e2g_transform and g2e_transform.
ref_e2g_translation = ref_meta['ego2global_translation']
ref_e2g_rotation = ref_meta['ego2global_rotation']
ref_e2g_transform = transform_matrix(
ref_e2g_translation, Quaternion(ref_e2g_rotation), inverse=False)
ref_g2e_transform = transform_matrix(
ref_e2g_translation, Quaternion(ref_e2g_rotation), inverse=True)
ref_l2e_translation = ref_meta['lidar2ego_translation']
ref_l2e_rotation = ref_meta['lidar2ego_rotation']
ref_l2e_transform = transform_matrix(
ref_l2e_translation, Quaternion(ref_l2e_rotation), inverse=False)
ref_e2l_transform = transform_matrix(
ref_l2e_translation, Quaternion(ref_l2e_rotation), inverse=True)

queue = previous_queue[:-1] + future_queue
pts_list = [each['points'].data for each in queue]
if self.ego_mask is not None:
pts_list = self._mask_points(pts_list)
total_cur2ref_lidar_transform = []
total_ref2cur_lidar_transform = []
total_pts_list = []
for i, each in enumerate(queue):
meta = each['img_metas'].data

# store points in the current frame.
cur_pts = pts_list[i].cpu().numpy().copy()
cur_pts[:, -1] = i
total_pts_list.append(cur_pts)

# store the transformation from current frame to reference frame.
curr_e2g_translation = meta['ego2global_translation']
curr_e2g_rotation = meta['ego2global_rotation']
curr_e2g_transform = transform_matrix(
curr_e2g_translation, Quaternion(curr_e2g_rotation), inverse=False)
curr_g2e_transform = transform_matrix(
curr_e2g_translation, Quaternion(curr_e2g_rotation), inverse=True)

curr_l2e_translation = meta['lidar2ego_translation']
curr_l2e_rotation = meta['lidar2ego_rotation']
curr_l2e_transform = transform_matrix(
curr_l2e_translation, Quaternion(curr_l2e_rotation), inverse=False)
curr_e2l_transform = transform_matrix(
curr_l2e_translation, Quaternion(curr_l2e_rotation), inverse=True)

# compute future to reference matrix.
cur_lidar_to_ref_lidar = (curr_l2e_transform.T @
curr_e2g_transform.T @
ref_g2e_transform.T @
ref_e2l_transform.T)
total_cur2ref_lidar_transform.append(cur_lidar_to_ref_lidar)

# compute reference to future matrix.
ref_lidar_to_cur_lidar = (ref_l2e_transform.T @
ref_e2g_transform.T @
curr_g2e_transform.T @
curr_e2l_transform.T)
total_ref2cur_lidar_transform.append(ref_lidar_to_cur_lidar)

# 2. Parse previous and future can_bus information.
imgs_list = [each['img'].data for each in previous_queue]
metas_map = {}
prev_scene_token = None
prev_pos = None
prev_angle = None
ref_meta = previous_queue[-1]['img_metas'].data

# 2.2. Previous
for i, each in enumerate(previous_queue):
metas_map[i] = each['img_metas'].data

if 'aug_param' in each:
metas_map[i]['aug_param'] = each['aug_param']

if metas_map[i]['scene_token'] != prev_scene_token:
metas_map[i]['prev_bev_exists'] = False
prev_scene_token = metas_map[i]['scene_token']
prev_pos = copy.deepcopy(metas_map[i]['can_bus'][:3])
prev_angle = copy.deepcopy(metas_map[i]['can_bus'][-1])
# Set the original point of this motion.
new_can_bus = copy.deepcopy(metas_map[i]['can_bus'])
new_can_bus[:3] = 0
new_can_bus[-1] = 0
metas_map[i]['can_bus'] = new_can_bus
else:
metas_map[i]['prev_bev_exists'] = True
tmp_pos = copy.deepcopy(metas_map[i]['can_bus'][:3])
tmp_angle = copy.deepcopy(metas_map[i]['can_bus'][-1])
# Compute the later waypoint.
# To align the shift and rotate difference due to the BEV.
new_can_bus = copy.deepcopy(metas_map[i]['can_bus'])
new_can_bus[:3] = tmp_pos - prev_pos
new_can_bus[-1] = tmp_angle - prev_angle
metas_map[i]['can_bus'] = new_can_bus
prev_pos = copy.deepcopy(tmp_pos)
prev_angle = copy.deepcopy(tmp_angle)

# compute cur_lidar_to_ref_lidar transformation matrix for quickly align generated
# bev features to the reference frame.
metas_map[i]['ref_lidar_to_cur_lidar'] = total_ref2cur_lidar_transform[i]

# 2.3. Future
current_scene_token = ref_meta['scene_token']
ref_can_bus = None
future_can_bus = []
future2ref_lidar_transform = []
ref2future_lidar_transform = []
for i, each in enumerate(future_queue):
future_meta = each['img_metas'].data
if future_meta['scene_token'] != current_scene_token:
break

# store the transformation:
future2ref_lidar_transform.append(
total_cur2ref_lidar_transform[i + len(previous_queue) - 1]
) # current -> reference.
ref2future_lidar_transform.append(
total_ref2cur_lidar_transform[i + len(previous_queue) - 1]
) # reference -> current.

# can_bus information.
if i == 0:
new_can_bus = copy.deepcopy(future_meta['can_bus'])
new_can_bus[:3] = 0
new_can_bus[-1] = 0
future_can_bus.append(new_can_bus)
ref_can_bus = copy.deepcopy(future_meta['can_bus'])
else:
new_can_bus = copy.deepcopy(future_meta['can_bus'])

new_can_bus_pos = np.array([0, 0, 0, 1]).reshape(1, 4)
ref2prev_lidar_transform = ref2future_lidar_transform[-2]
cur2ref_lidar_transform = future2ref_lidar_transform[-1]
new_can_bus_pos = new_can_bus_pos @ cur2ref_lidar_transform @ ref2prev_lidar_transform

new_can_bus_angle = new_can_bus[-1] - ref_can_bus[-1]
new_can_bus[:3] = new_can_bus_pos[:, :3]
new_can_bus[-1] = new_can_bus_angle
future_can_bus.append(new_can_bus)
ref_can_bus = copy.deepcopy(future_meta['can_bus'])

ret_queue = previous_queue[-1]
ret_queue['img'] = DC(torch.stack(imgs_list), cpu_only=False, stack=True)
ret_queue.pop('aug_param', None)

metas_map[len(previous_queue) - 1]['future_can_bus'] = np.array(future_can_bus)
metas_map[len(previous_queue) - 1]['future2ref_lidar_transform'] = (
np.array(future2ref_lidar_transform))
metas_map[len(previous_queue) - 1]['ref2future_lidar_transform'] = (
np.array(ref2future_lidar_transform))
metas_map[len(previous_queue) - 1]['total_cur2ref_lidar_transform'] = (
np.array(total_cur2ref_lidar_transform))
metas_map[len(previous_queue) - 1]['total_ref2cur_lidar_transform'] = (
np.array(total_ref2cur_lidar_transform))

ret_queue['img_metas'] = DC(metas_map, cpu_only=True)
ret_queue.pop('points')
ret_queue['gt_points'] = DC(
torch.from_numpy(np.concatenate(total_pts_list, 0)), cpu_only=False)
if len(future_can_bus) < 1 + self.future_length:
return None
return ret_queue

def _load_nstp_data(self):
"""加载nSTP数据"""
self.nstp_data = {}
if not os.path.exists(self.nstp_path):
print(f"警告: nSTP数据路径 {self.nstp_path} 不存在")
self.use_nstp = False
return

try:
import torch
import glob
import os.path as osp

# 获取目录中所有的.pt文件
pt_files = glob.glob(osp.join(self.nstp_path, "*.pt"))
if not pt_files:
print(f"警告: 在 {self.nstp_path} 中未找到.pt文件")
self.use_nstp = False
return

print(f"找到 {len(pt_files)} 个nSTP数据文件")

# 加载每个.pt文件
for pt_file in pt_files:
try:
# 从文件名获取样本ID
sample_id = osp.splitext(osp.basename(pt_file))[0]

# 加载PyTorch张量
graph_data = torch.load(pt_file)

# 将数据添加到字典中
self.nstp_data[sample_id] = graph_data

except Exception as e:
print(f"加载文件 {pt_file} 失败: {str(e)}")

print(f"成功加载 {len(self.nstp_data)} 个nSTP样本")

except Exception as e:
print(f"加载nSTP数据失败: {str(e)}")
print("继续训练,但不使用nSTP数据")
self.use_nstp = False

2.1.2 NuPlan dataset

File path: d:\git_clone\ViDAR\projects\mmdet3d_plugin\datasets\nuplan_vidar_dataset_v1.py

Main changes:

  • Added nSTP support in the same way as for the NuScenes dataset
  • Implemented nSTP data loading and handling logic specific to the NuPlan dataset (a hedged outline follows below)
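
The NuPlan-side change is not reproduced in full here; the following is only a hedged outline of the same pattern, mirroring the NuScenes version above (the template base class, module name, and exact signature are assumptions and may differ from the real file):

from mmdet.datasets import DATASETS

# Assumed module/class name for the NuPlan template, mirroring the NuScenes naming.
from .nuplan_vidar_dataset_template import NuPlanViDARDatasetTemplate


@DATASETS.register_module()
class NuPlanViDARDatasetV1(NuPlanViDARDatasetTemplate):
    """Hypothetical outline of the nSTP-enabled NuPlan dataset."""

    def __init__(self, *args, use_nstp=False, nstp_path=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.use_nstp = use_nstp
        self.nstp_path = nstp_path
        if self.use_nstp and self.nstp_path is not None:
            self._load_nstp_data()  # same .pt-loading logic as the NuScenes version

    def get_data_info(self, index):
        info = super().get_data_info(index)
        token = info.get('sample_token', None)
        if self.use_nstp and getattr(self, 'nstp_data', None) and token in self.nstp_data:
            info['nstp_graph'] = self.nstp_data[token]
        return info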

2.2 Model head

File path: ViDAR\projects\mmdet3d_plugin\bevformer\dense_heads\vidar_head_v1.py

Main changes:

  • Added nSTP-related parameters: use_nstp, nstp_encoder_cfg, nstp_enhancer_cfg
  • Integrated the nSTP encoder and enhancer into the head
  • Modified the forward logic to process nSTP features
#---------------------------------------------------------------------------------#
# Visual Point Cloud Forecasting enables Scalable Autonomous Driving #
# Copyright (c) OpenDriveLab. All rights reserved. #
#---------------------------------------------------------------------------------#

"""
<V1.multiframe> of ViDAR future prediction head:
* Predict future & history frames simultaneously.
"""

import copy
import torch
import torch.nn as nn
import numpy as np

from mmdet.models import HEADS, build_loss

from mmcv.runner import force_fp32, auto_fp16
from .vidar_head_base import ViDARHeadBase


@HEADS.register_module()
class ViDARHeadV1(ViDARHeadBase):
def __init__(self,
history_queue_length,
pred_history_frame_num=0,
pred_future_frame_num=0,
per_frame_loss_weight=(1.0,),
use_nskg=False,
nskg_encoder_cfg=None,
nskg_enhancer_cfg=None,
use_nstp=False, # 添加nSTP支持参数
nstp_encoder_cfg=None, # 添加nSTP编码器配置
nstp_enhancer_cfg=None, # 添加nSTP增强器配置
*args,
**kwargs):
super().__init__(*args, **kwargs)

self.history_queue_length = history_queue_length
self.pred_history_frame_num = pred_history_frame_num
self.pred_future_frame_num = pred_future_frame_num

self.pred_frame_num = 1 + self.pred_history_frame_num + self.pred_future_frame_num
self.per_frame_loss_weight = per_frame_loss_weight
assert len(self.per_frame_loss_weight) == self.pred_frame_num

self._init_bev_pred_layers()

# nSKG支持
self.use_nskg = use_nskg
# nSTP支持
self.use_nstp = use_nstp

if self.use_nskg:
from ..modules.nskg_gnn import NSKGEncoder
from ..modules.nskg_bev_enhancer import NSKGBEVEnhancer

# 创建nSKG编码器
if nskg_encoder_cfg is not None:
self.nskg_encoder = NSKGEncoder(**nskg_encoder_cfg)
else:
self.nskg_encoder = NSKGEncoder(
in_channels=8,
hidden_channels=64,
out_channels=256,
num_layers=2,
gnn_type='gat',
use_hetero=True
)

# 创建BEV特征增强器
if nskg_enhancer_cfg is not None:
self.nskg_enhancer = NSKGBEVEnhancer(**nskg_enhancer_cfg)
else:
self.nskg_enhancer = NSKGBEVEnhancer(
bev_channels=self.embed_dims,
nskg_channels=256,
hidden_channels=128,
bev_h=self.bev_h,
bev_w=self.bev_w,
use_attention=True
)
else:
self.nskg_encoder = None
self.nskg_enhancer = None

# 添加nSTP支持
if self.use_nstp:
from ..modules.nstp_encoder import NSTPEncoder, NSTPEnhancer

# 创建nSTP编码器
if nstp_encoder_cfg is not None:
self.nstp_encoder = NSTPEncoder(**nstp_encoder_cfg)
else:
self.nstp_encoder = NSTPEncoder(
in_channels=64,
hidden_channels=128,
out_channels=256,
num_layers=3,
gnn_type='graphsage',
dropout=0.1,
aggr='mean'
)

# 创建nSTP增强器
if nstp_enhancer_cfg is not None:
self.nstp_enhancer = NSTPEnhancer(**nstp_enhancer_cfg)
else:
self.nstp_enhancer = NSTPEnhancer(
bev_channels=self.embed_dims,
nstp_channels=256,
hidden_channels=128,
bev_h=self.bev_h,
bev_w=self.bev_w,
use_attention=True
)
else:
self.nstp_encoder = None
self.nstp_enhancer = None

def forward(self, mlvl_feats, img_metas, prev_bev=None, **kwargs):
"""Forward function.
Args:
mlvl_feats (list(Tensor)): 多尺度特征,每个元素形状为 [B, num_cam, C, H, W]
img_metas (list(dict)): 图像元信息
prev_bev: 历史BEV特征
Returns:
tuple: bev_embed, history_states, future_states
"""
# 调用父类的forward方法获取原始结果
bev_embed, history_states, future_states = super().forward(
mlvl_feats, img_metas, prev_bev, **kwargs)

# 如果启用nSKG,处理图数据增强BEV特征
if self.use_nskg and self.nskg_encoder is not None and self.nskg_enhancer is not None:
bs = bev_embed.shape[0]
bev_h, bev_w = self.bev_h, self.bev_w

nskg_graphs = []
for img_meta in img_metas:
nskg_graph = img_meta.get('nskg_graph', None)
nskg_graphs.append(nskg_graph)

# 处理每个样本的nSKG数据
enhanced_bevs = []
for i in range(bs):
# 获取当前样本的BEV特征
curr_bev = bev_embed[i:i+1].view(1, bev_h, bev_w, -1).permute(0, 3, 1, 2)

# 获取当前样本的nSKG图
curr_graph = nskg_graphs[i] if i < len(nskg_graphs) and nskg_graphs[i] is not None else None

if curr_graph is not None:
# 使用GNN编码器处理图数据
node_features, global_features = self.nskg_encoder(curr_graph)

# 获取节点位置信息
if hasattr(curr_graph, 'pos'):
node_pos = curr_graph.pos
elif isinstance(curr_graph, dict) and 'pos' in curr_graph:
node_pos = curr_graph['pos']
else:
node_pos = None

# 增强BEV特征
enhanced_bev = self.nskg_enhancer(
curr_bev, node_features, global_features, node_pos)

enhanced_bevs.append(enhanced_bev)
else:
# 如果没有nSKG数据,保持原始BEV特征不变
enhanced_bevs.append(curr_bev)

# 合并增强后的BEV特征
if enhanced_bevs:
enhanced_bev = torch.cat(enhanced_bevs, dim=0)
# 转回原始格式
bev_embed = enhanced_bev.permute(0, 2, 3, 1).reshape(bs, bev_h * bev_w, -1)

# 添加nSTP支持
if self.use_nstp and self.nstp_encoder is not None and self.nstp_enhancer is not None:
bs = bev_embed.shape[0]
bev_h, bev_w = self.bev_h, self.bev_w

nstp_graphs = []
for img_meta in img_metas:
nstp_graph = img_meta.get('nstp_graph', None)
nstp_graphs.append(nstp_graph)

# 处理每个样本的nSTP数据
enhanced_bevs = []
for i in range(bs):
# 获取当前样本的BEV特征
curr_bev = bev_embed[i:i+1].view(1, bev_h, bev_w, -1).permute(0, 3, 1, 2)

# 获取当前样本的nSTP图
curr_graph = nstp_graphs[i] if i < len(nstp_graphs) and nstp_graphs[i] is not None else None

if curr_graph is not None:
# 使用GNN编码器处理图数据
node_features = self.nstp_encoder(curr_graph)

# 获取节点位置信息(如果有)
node_pos = None
if hasattr(curr_graph, 'pos'):
node_pos = curr_graph.pos
elif isinstance(curr_graph, dict) and 'pos' in curr_graph:
node_pos = curr_graph['pos']

# 增强BEV特征
enhanced_bev = self.nstp_enhancer(curr_bev, node_features, node_pos)
enhanced_bevs.append(enhanced_bev)
else:
# 如果没有nSTP数据,保持原始BEV特征不变
enhanced_bevs.append(curr_bev)

# 合并增强后的BEV特征
if enhanced_bevs:
enhanced_bev = torch.cat(enhanced_bevs, dim=0)
# 转回原始格式
bev_embed = enhanced_bev.permute(0, 2, 3, 1).reshape(bs, bev_h * bev_w, -1)

return bev_embed, history_states, future_states

def _init_bev_pred_layers(self):
"""Overwrite the {self.bev_pred_head} of super()._init_layers()
"""
bev_pred_branch = []
for _ in range(self.num_pred_fcs):
bev_pred_branch.append(nn.Linear(self.embed_dims, self.embed_dims))
bev_pred_branch.append(nn.LayerNorm(self.embed_dims))
bev_pred_branch.append(nn.ReLU(inplace=True))
bev_pred_branch.append(nn.Linear(
self.embed_dims, self.pred_frame_num * self.num_pred_height))
bev_pred_head = nn.Sequential(*bev_pred_branch)

def _get_clones(module, N):
return nn.ModuleList([copy.deepcopy(module) for i in range(N)])

# Auxiliary supervision for all intermediate results.
num_pred = self.transformer.decoder.num_layers
self.bev_pred_head = _get_clones(bev_pred_head, num_pred)

def forward_head(self, next_bev_feats):
"""Get freespace estimation from multi-frame BEV feature maps.

Args:
next_bev_feats (torch.Tensor): with shape as
[pred_frame_num, inter_num, bs, bev_h * bev_w, dims]
pred_frame_num: history frames + current frame + future frames.
"""
next_bev_preds = []
for lvl in range(next_bev_feats.shape[1]):
# pred_frame_num, bs, bev_h * bev_w, num_height_pred * num_frame
# ===> pred_frame_num, bs, bev_h * bev_w, num_height_pred, num_frame
# ===> pred_frame_num, num_frame, bs, bev_h * bev_w, num_height_pred.
next_bev_pred = self.bev_pred_head[lvl](next_bev_feats[:, lvl])
next_bev_pred = next_bev_pred.view(
*next_bev_pred.shape[:-1], self.num_pred_height, self.pred_frame_num)

base_bev_pred = next_bev_pred[..., self.pred_history_frame_num][..., None]
next_bev_pred = torch.cat([
next_bev_pred[..., :self.pred_history_frame_num] + base_bev_pred,
base_bev_pred,
next_bev_pred[..., self.pred_history_frame_num + 1:] + base_bev_pred
], -1)

next_bev_pred = next_bev_pred.permute(0, 4, 1, 2, 3).contiguous()
next_bev_preds.append(next_bev_pred)
# pred_frame_num, inter_num, num_frame, bs, bev_h*bev_w, num_height_pred
next_bev_preds = torch.stack(next_bev_preds, 1)
return next_bev_preds

def _get_reference_gt_points(self,
gt_points,
src_frame_idx_list,
tgt_frame_idx_list,
img_metas):
"""Transform gt_points at src_frame_idx in {src_frame_idx_list} to the coordinate space
of each tgt_frame_idx in {tgt_frame_idx_list}.
"""
bs = len(gt_points)
aligned_gt_points = []
batched_origin_points = []
for frame_idx, src_frame_idx, tgt_frame_idx in zip(
range(len(src_frame_idx_list)), src_frame_idx_list, tgt_frame_idx_list):
# 1. get gt_points belongs to src_frame_idx.
src_frame_gt_points = [p[p[:, -1] == src_frame_idx] for p in gt_points]

# 2. get transformation matrix..
src_to_ref = [img_meta['total_cur2ref_lidar_transform'][src_frame_idx] for img_meta in img_metas]
src_to_ref = gt_points[0].new_tensor(np.array(src_to_ref)) # bs, 4, 4
ref_to_tgt = [img_meta['total_ref2cur_lidar_transform'][tgt_frame_idx] for img_meta in img_metas]
ref_to_tgt = gt_points[0].new_tensor(np.array(ref_to_tgt)) # bs, 4, 4
src_to_tgt = torch.matmul(src_to_ref, ref_to_tgt)

# 3. transfer src_frame_gt_points to src_to_tgt.
aligned_gt_points_per_frame = []
for batch_idx, points in enumerate(src_frame_gt_points):
new_points = points.clone() # -1, 4
new_points = torch.cat([
new_points[:, :3], new_points.new_ones(new_points.shape[0], 1)
], 1)
new_points = torch.matmul(new_points, src_to_tgt[batch_idx])
new_points[..., -1] = frame_idx
aligned_gt_points_per_frame.append(new_points)
aligned_gt_points.append(aligned_gt_points_per_frame)

# 4. obtain the aligned origin points.
aligned_origin_points = torch.from_numpy(
np.zeros((bs, 1, 3))).to(src_to_tgt.dtype).to(src_to_tgt.device)
aligned_origin_points = torch.cat([
aligned_origin_points[..., :3], torch.ones_like(aligned_origin_points)[..., 0:1]
], -1)
aligned_origin_points = torch.matmul(aligned_origin_points, src_to_tgt)
batched_origin_points.append(aligned_origin_points[..., :3].contiguous())

# stack points from different timestamps, and transfer to occupancy representation.
batched_gt_points = []
for b in range(bs):
cur_gt_points = [
aligned_gt_points[frame_idx][b]
for frame_idx in range(len(src_frame_idx_list))]
cur_gt_points = torch.cat(cur_gt_points, 0)
batched_gt_points.append(cur_gt_points)

batched_origin_points = torch.cat(batched_origin_points, 1)
return batched_gt_points, batched_origin_points

@force_fp32(apply_to=('pred_dict'))
def loss(self,
pred_dict,
gt_points,
start_idx,
tgt_bev_h,
tgt_bev_w,
tgt_pc_range,
pred_frame_num,
img_metas=None,
batched_origin_points=None):
""""Compute loss for all history according to gt_points.

gt_points: ground-truth point cloud in each frame.
list of tensor with shape [-1, 5], indicating ground-truth point cloud in
each frame.
"""
bev_preds = pred_dict['next_bev_preds']
valid_frames = np.array(pred_dict['valid_frames'])
start_frames = (valid_frames + self.history_queue_length - self.pred_history_frame_num)
tgt_frames = valid_frames + self.history_queue_length

full_prev_bev_exists = pred_dict.get('full_prev_bev_exists', True)
if not full_prev_bev_exists:
frame_idx_for_loss = [self.pred_history_frame_num] * self.pred_frame_num
else:
frame_idx_for_loss = np.arange(0, self.pred_frame_num)

loss_dict = dict()
for idx, i in enumerate(frame_idx_for_loss):
# 1. get the predicted occupancy of frame-i.
cur_bev_preds = bev_preds[:, :, i, ...].contiguous()

# 2. get the frame index of current frame.
src_frames = start_frames + i

# 3. get gt_points belonging to cur_valid_frames.
cur_gt_points, cur_origin_points = self._get_reference_gt_points(
gt_points,
src_frame_idx_list=src_frames,
tgt_frame_idx_list=tgt_frames,
img_metas=img_metas)

# 4. compute loss.
if i != self.pred_history_frame_num:
# For aux history-future supervision:
# only compute loss for cur_frame prediction.
loss_weight = np.array([[1]] + [[0]] * (len(self.loss_weight) - 1))
else:
loss_weight = self.loss_weight

cur_loss_dict = super().loss(
dict(next_bev_preds=cur_bev_preds,
valid_frames=np.arange(0, len(src_frames))),
cur_gt_points,
start_idx=start_idx,
tgt_bev_h=tgt_bev_h,
tgt_bev_w=tgt_bev_w,
tgt_pc_range=tgt_pc_range,
pred_frame_num=len(self.loss_weight)-1,
img_metas=img_metas,
batched_origin_points=cur_origin_points,
loss_weight=loss_weight)

# 5. merge dict.
cur_frame_loss_weight = self.per_frame_loss_weight[i]
cur_frame_loss_weight = cur_frame_loss_weight * (idx == i)
for k, v in cur_loss_dict.items():
loss_dict.update({f'frame.{idx}.{k}.loss': v * cur_frame_loss_weight})
return loss_dict

@force_fp32(apply_to=('pred_dict'))
def get_point_cloud_prediction(self,
pred_dict,
gt_points,
start_idx,
tgt_bev_h,
tgt_bev_w,
tgt_pc_range,
img_metas=None,
batched_origin_points=None):
""""Generate point cloud prediction.
"""
# pred_frame_num, inter_num, num_frame, bs, bev_h * bev_w, num_height_pred
pred_dict['next_bev_preds'] = pred_dict['next_bev_preds'][:, :, self.pred_history_frame_num, ...].contiguous()

valid_frames = np.array(pred_dict['valid_frames'])
valid_gt_points, cur_origin_points = self._get_reference_gt_points(
gt_points,
src_frame_idx_list=valid_frames + self.history_queue_length,
tgt_frame_idx_list=valid_frames + self.history_queue_length,
img_metas=img_metas)
return super().get_point_cloud_prediction(
pred_dict=pred_dict,
gt_points=valid_gt_points,
start_idx=start_idx,
tgt_bev_h=tgt_bev_h,
tgt_bev_w=tgt_bev_w,
tgt_pc_range=tgt_pc_range,
img_metas=img_metas,
batched_origin_points=cur_origin_points)

2.3 ViDAR detector

File path: ViDAR\projects\mmdet3d_plugin\bevformer\detectors\vidar.py

Main changes:

  • Modified forward_train: handle nSTP features and fix a tuple-type issue
  • Modified forward_test: support using nSTP features at test time

Key part of the change:

def forward_train(self, **kwargs):
    # ...existing code...

    # Modified part: next_bev_feats may contain tuples
    processed_next_bev_feats = []
    for feat in next_bev_feats:
        if isinstance(feat, tuple):
            # If it is a tuple, take the first element (the main feature)
            processed_next_bev_feats.append(feat[0])
        else:
            processed_next_bev_feats.append(feat)

    next_bev_feats = torch.stack(processed_next_bev_feats, 0)

    # ...existing code continues...

3. Config changes

3.1 OpenScene config

File path: ViDAR\projects\configs\vidar_pretrain\OpenScene\vidar_OpenScene_mini_1_8_3future_nstp.py

Main changes:

  • Added nSTP-related settings: enable nSTP and set the data path
  • Added the nSTP processing component to the data pipeline
  • Configured the nSTP encoder and enhancer parameters
# nSTP settings
use_nskg = False  # disable nSKG
use_nstp = True   # enable nSTP
nstp_path = 'data/nuscenes/nstp/train/raw'  # directory containing the nSTP data

# Add the nSTP processing component to the pipeline
train_pipeline.insert(-2, dict(type='ProcessNSTPGraph'))

# Modify the dataset config
data = dict(
    # ...
    train=dict(
        # ...
        use_nstp=use_nstp,
        nstp_path=nstp_path,
        # ...
    ),
    # ...
)

3.2 NuScenes full-set config

File path: ViDAR\projects\configs\vidar_pretrain\nusc_fullset\vidar_nstp_nusc.py

Main changes:

  • Added nSTP support on top of the base config
  • Configured the nSTP data path and processing logic
_base_ = ['./vidar_full_nusc_1future.py']

# nSTP settings
use_nskg = False
use_nstp = True
nstp_path = 'data/nuscenes/nstp/nstp.pkl'  # path to the nSTP data

# Modify the dataset config
data = dict(
    # ...
    train=dict(
        type='NuScenesViDARDatasetV1',
        use_nskg=use_nskg,
        use_nstp=use_nstp,
        nstp_path=nstp_path,
        # ...
    ),
    # ...
)

4. Other supporting changes

4.1 Dataset registration

File path: d:\git_clone\ViDAR\projects\mmdet3d_plugin\datasets\__init__.py

Main changes:

  • Imported and registered the nSTP-related modules: NSTPEncoder, NSTPEnhancer
from .nstp_encoder import NSTPEncoder, NSTPEnhancer

__all__ = [
    # ...
    'NSTPEncoder',
    'NSTPEnhancer',
    # ...
]

5. The many errors along the way

(Screenshot of an error omitted.)

First, the nSKG data could not be used directly; using it produced this problem:

(Screenshots of the errors omitted.)

which in turn led to:

(Screenshots of the follow-up errors omitted.)

Handling it properly would also have been possible, but the code I wrote could not cope with it:

(Screenshot omitted.)

So in the end I chose nSTP instead: the nuScenes Knowledge Graph work shows that nSTP is an extension of nSKG and can be used for training directly, so I adapted the code for it:

(Screenshots of the adaptation omitted.)

The final result: the nSTP data files can now be read correctly:

(Screenshot omitted.)

But then... out of memory:

(Screenshot omitted.)

As I write this I have just fixed another small bug, and training is still running on the server...

Finally, a photo of the long-suffering server (thanks to Prof. Luo Yong):

(Photos omitted.)

6. Summary

The nSTP integration work covered the following aspects:

  1. Data handling: loading and processing logic for nSTP graph data, reading graph structures from .pt files
  2. Feature extraction: a GNN-based nSTP encoder that extracts spatio-temporal features from the graph structure
  3. Feature fusion: a fusion mechanism between nSTP features and BEV features, enhancing the BEV features through attention
  4. Model integration: plugging the nSTP modules into the ViDAR model and modifying the forward pass
  5. Configuration: nSTP-related config options, so the feature can be switched on and off flexibly

These changes let ViDAR exploit the scene-structure and temporal-evolution information that nSTP provides, strengthening the model's understanding of dynamic scenes, especially for predicting future frames.

Work completed:

  1. Set up the environment and resolved the conflicting dependencies
  2. Got dataset loading and training to work
  3. Modified ViDAR to take in the nSTP data for tuning
  4. Fixed the original PyTorch issues
  5. Training is still running on the server; there are probably quite a few follow-up training problems left to fix... but time is up.