Preface

This write-up covers my study of, and hands-on practice with, knowledge engineering. Personally I found it quite difficult; it took about three weeks in total. The first week went mostly into fixing all kinds of dependency errors, the second into trimming the dataset and migrating between platforms (mainly because the GPU memory was not enough), and the third into exploring and trying out tuning ideas. Part of the reason it took so long is the course schedule, and with May approaching, anyone preparing for graduate-school recommendation will understand: the various summer-camp applications and course projects left me fairly swamped, so I could only use my time as well as possible to finish this course practice. In the end there are still many regrets, but this is where it has to stop for now...

Process

Using the ModelArts platform

Model repository:

https://github.com/OpenDriveLab/ViDAR

Prerequisites:

Configure the ModelArts image

(Screenshots of the ModelArts image-configuration steps omitted.)

The image is configured by following the steps in the screenshots above. I won't go into more detail here, since the explanatory markdown file I wrote at the time has since been deleted...

Problems encountered and how they were solved

  1. Dependency errors: mostly around the numpy version. ModelArts itself requires a fairly new numpy, while ViDAR needs an older one, so the two conflict. I eventually put together a script that resolves most of the issues:

    vim install_deps.sh

    and put the following into it:

    #!/bin/bash

    # Use the Tsinghua PyPI mirror
    PIP_SOURCE="-i https://pypi.tuna.tsinghua.edu.cn/simple --trusted-host pypi.tuna.tsinghua.edu.cn"

    # Temporarily remove the ModelArts SDK from PYTHONPATH to avoid interference
    export ORIGINAL_PYTHONPATH=$PYTHONPATH
    export PYTHONPATH=$(echo $PYTHONPATH | sed 's|/home/ma-user/modelarts-dev/modelarts-sdk||g')

    # Print environment information
    echo "Checking Python and PyTorch versions..."
    python -c "import sys, torch; print('Python:', sys.version); print('PyTorch:', torch.__version__, 'CUDA:', torch.cuda.is_available())"

    # Step 1: fix dependency conflicts
    echo "Fixing dependency conflicts..."
    pip install numpy==1.23.5 --force-reinstall $PIP_SOURCE
    pip install networkx==2.2 --force-reinstall $PIP_SOURCE
    pip install pyasn1==0.6.1 --force-reinstall $PIP_SOURCE
    pip install pandas==1.2.5 --force-reinstall $PIP_SOURCE # pin the pandas version

    # Step 2: handle "platform not supported" packages by downgrading to compatible versions
    echo "Fixing platform compatibility issues..."
    pip install mmengine
    pip install PyYAML==6.0 charset-normalizer==3.3.2 fonttools==4.38.0 kiwisolver==1.4.5 \
    lxml==4.9.3 matplotlib==3.5.2 simplejson==3.19.2 MarkupSafe==2.1.5 \
    cffi==1.16.0 greenlet==3.0.3 ijson==3.2.4 SQLAlchemy==2.0.30 \
    --force-reinstall $PIP_SOURCE

    # Step 3: install mmcv-full==1.4.0
    echo "Installing mmcv-full==1.4.0..."
    pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html $PIP_SOURCE

    # Step 4: install the remaining mmdet3d dependencies
    echo "Installing mmdet3d dependencies..."
    pip install lyft_dataset_sdk nuscenes-devkit plyfile tensorboard numba==0.48.0 scikit-image==0.19.3 $PIP_SOURCE
    pip install numpy==1.23.5 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install mmcv-full==1.4.0 -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.10.0/index.html -i https://pypi.tuna.tsinghua.edu.cn/simple

    # Step 5: install mmdet==2.14.0 and mmsegmentation==0.14.1
    echo "Installing mmdet==2.14.0 and mmsegmentation==0.14.1..."
    pip install mmdet==2.14.0 $PIP_SOURCE
    pip install mmsegmentation==0.14.1 $PIP_SOURCE

    git clone https://github.com/open-mmlab/mmdetection3d.git
    cd mmdetection3d
    git checkout v0.17.1 # Other versions may not be compatible.
    python setup.py install
    cd ..

    # Step 6: install detectron2 and other dependencies
    echo "Installing detectron2 and other dependencies..."
    pip install einops fvcore seaborn iopath==0.1.9 timm==0.6.13 typing-extensions==4.5.0 \
    pylint ipython==8.12 matplotlib==3.5.2 numba==0.48.0 setuptools==59.5.0 $PIP_SOURCE
    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git' $PIP_SOURCE


    # Step 7: install ViDAR and chamferdist
    echo "Installing ViDAR and chamferdistance..."
    if [ ! -d "ViDAR" ]; then
    git clone https://github.com/OpenDriveLab/ViDAR
    fi
    cd ViDAR
    mkdir -p pretrained
    cd pretrained
    wget https://github.com/zhiqi-li/storage/releases/download/v1.0/r101_dcn_fcos3d_pretrain.pth || echo "Pretrained model download failed, continuing..."
    cd ../third_lib/chamfer_dist/chamferdist/
    pip install . $PIP_SOURCE
    cd ../../..
    pip install matplotlib==3.5.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install pyparsing==2.4.7 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install kiwisolver==1.3.2 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install --user prettytable==3.7.0

    # Done
    echo "Installation complete. If errors occurred, check logs above."

    # Optional: isolate the environment (commented out, enable manually if needed)
    # echo "If conflicts persist, consider creating a clean environment:"
    # echo "conda create -n vidar_clean python=3.8"
    # echo "conda activate vidar_clean"
    # echo "Then rerun this script."

    Then comes the pleasant one-command fix:

    chmod +x install_deps.sh
    ./install_deps.sh

    After that, two places in the code need to be modified:

    (Screenshots of the two modified spots omitted.)

    Then, from the ViDAR directory, point CONFIG at the desired config (the first line is the full OpenScene config, the second is the mini config I actually used) and launch training:

    CONFIG=projects/configs/vidar_pretrain/OpenScene/vidar_OpenScene_train_1_8_3future.py
    CONFIG=projects/configs/vidar_pretrain/OpenScene/vidar_OpenScene_mini_1_8_3future.py
    GPU_NUM=1

    ./tools/dist_train.sh ${CONFIG} ${GPU_NUM}
  2. Dataset

    Download it with the openxlab package. Note: the latest openxlab requires Python ≥ 3.8, so create a separate virtual environment for it:

    conda create -n openxlab python=3.9
    pip install openxlab
    openxlab login # requires an openxlab account; create an access key first, then log in here
    # my AK/SK are listed outside the code block
    openxlab dataset download --dataset-repo OpenDriveLab/OpenScene --source-path /openscene-v1.1/openscene_sensor_mini_camera.tgz --target-path .
    openxlab dataset download --dataset-repo OpenDriveLab/OpenScene --source-path /openscene-v1.1/openscene_sensor_mini_lidar.tgz --target-path .

    Access Key: wgakjbrzyyxljprb1b2z Secret Key: rnyq568lwdpayblrb744qdmxyg4xz19vo3b0azog

    Then extract the downloaded archives (a small extraction sketch follows below). The extracted data takes roughly 170 GB of disk space, and because the archives also need room during extraction, plan for about 250-300 GB of disk.
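
    For extraction, here is a minimal Python sketch (assuming the two .tgz archives were downloaded into the current directory with the names used in the openxlab commands above):

    import os
    import shutil
    import tarfile

    # Archives downloaded by the openxlab commands above.
    archives = [
        "openscene_sensor_mini_camera.tgz",
        "openscene_sensor_mini_lidar.tgz",
    ]

    # Rough free-space check: extraction needs on the order of 170 GB.
    free_gb = shutil.disk_usage(".").free / 1024 ** 3
    print(f"Free disk space: {free_gb:.0f} GB")

    for name in archives:
        if not os.path.exists(name):
            print(f"Missing archive: {name}")
            continue
        print(f"Extracting {name} ...")
        with tarfile.open(name, "r:gz") as tar:
            tar.extractall(path=".")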

    After extraction, the MergedPointCloud folders need to be moved to the expected target path.

    This can be done automatically with a script (vim fix_mergedpointcloud.py):

    import os
    import shutil

    # Root paths (adjust to your actual directory layout)
    bad_root = "ViDAR/data/openscene_v1.1/OpenDriveLab___OpenScene/openscene-v1.1/openscene_v1.1/sensor_blobs/mini"
    correct_root = "ViDAR/data/openscene_v1.1/sensor_blobs/mini"

    # Walk over every sub-directory
    for subdir in os.listdir(bad_root):
        full_bad_path = os.path.join(bad_root, subdir, "MergedPointCloud")
        full_target_dir = os.path.join(correct_root, subdir)

        if os.path.exists(full_bad_path):
            target_path = os.path.join(full_target_dir, "MergedPointCloud")

            print(f"Moving {full_bad_path} --> {target_path}")

            os.makedirs(full_target_dir, exist_ok=True)
            if os.path.exists(target_path):
                print(f" - Skipping {target_path} (already exists)")
            else:
                shutil.move(full_bad_path, target_path)

    Run it with: python fix_mergedpointcloud.py

    Alternatively, create symlinks instead of moving the data:

    import os

    bad_root = "ViDAR/data/openscene_v1.1/OpenDriveLab___OpenScene/openscene-v1.1/openscene_v1.1/sensor_blobs/mini"
    correct_root = "ViDAR/data/openscene_v1.1/sensor_blobs/mini"

    for sequence in os.listdir(bad_root):
        bad_mp = os.path.join(bad_root, sequence, "MergedPointCloud")
        correct_target_dir = os.path.join(correct_root, sequence)
        correct_link_path = os.path.join(correct_target_dir, "MergedPointCloud")

        if os.path.exists(bad_mp):
            if not os.path.exists(correct_target_dir):
                print(f"Path does not exist, creating: {correct_target_dir}")
                os.makedirs(correct_target_dir)

            if not os.path.exists(correct_link_path):
                print(f"Creating symlink: {correct_link_path} -> {bad_mp}")
                os.symlink(os.path.abspath(bad_mp), correct_link_path)
            else:
                print(f"Already exists: {correct_link_path}")

    openscene_metadata_mini.tgz can be downloaded to a local machine and then uploaded to the server directly.

    Finally, run: python tools/collect_nuplan_data.py mini

  3. Training errors

    Ninja error

    RuntimeError: Ninja is required to load C++

    Fix: build Ninja from source and install it locally.

    1. Download and build Ninja:

      • Clone the official Ninja repository and bootstrap it:

        git clone https://github.com/ninja-build/ninja.git
        cd ninja
        python configure.py --bootstrap
      • This produces a ninja binary.

    2. Create a local directory and move the binary there:

      • Create ~/bin (if it does not exist yet):

        mkdir -p ~/bin
      • Move the ninja binary into ~/bin:

        mv ninja ~/bin/
    3. Update the PATH environment variable:

      • Add it to the current session temporarily:

        export PATH=~/bin:$PATH
    4. Verify the installation:

      • Run ninja --version to check that it worked.

    crypt.h error

    Fix:

    • Step 1: get the glibc-2.27 sources

      • Make sure you have downloaded and extracted glibc-2.27.tar.xz from the GNU FTP server. If not:

        wget https://ftp.gnu.org/gnu/glibc/glibc-2.27.tar.xz
        tar -xJf glibc-2.27.tar.xz
      • Confirm that extraction produced a glibc-2.27 directory containing an include sub-directory.

      Step 2: copy all header files

      • Copy everything under glibc-2.27/include into a local ~/include directory, so that all dependent headers (such as features.h, stdint.h, etc.) are available:

        mkdir -p ~/include
        cp -r glibc-2.27/include/* ~/include/
      • This copies crypt.h along with all the other headers into ~/include, so the compiler can find every required dependency.

      Step 3: set the include path

      • Set the CPLUS_INCLUDE_PATH environment variable so the compiler searches ~/include first:

        export CPLUS_INCLUDE_PATH=~/include:$CPLUS_INCLUDE_PATH
      • Verify it:

        echo $CPLUS_INCLUDE_PATH

        The output should contain ~/include.

      Step 4: check the system headers

      First, check whether the necessary header already exists in your environment:

      • Run

        ls /usr/include/crypt.h

        to see whether crypt.h is already there. If it is, try the system compiler instead:

        • Run export CC=/usr/bin/gcc and export CXX=/usr/bin/g++
        • Then unset the include-path override: unset CPLUS_INCLUDE_PATH
        • Re-run the training script: ./tools/dist_train.sh ${CONFIG} ${GPU_NUM}

    GLIBCXX_3.4.29 error

    Reference:

    [How to fix "version `GLIBCXX_3.4.29' not found"](https://blog.csdn.net/weixin_39379635/article/details/129159713)

    Fix:

    ImportError: /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6: version `GLIBCXX_3.4.29' not found

    1. First check which GLIBCXX versions the current library provides:

    strings /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6 | grep GLIBCXX

    In my case it only went up to 3.4.22.

    2. Look for other copies of the library on the system:

    sudo find / -name "libstdc++.so.6*"

    This turned up a newer one: /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6.0.29

    Check it:

    strings /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6.0.29 | grep GLIBCXX

    It does contain 3.4.29.

    3. Copy it to the target directory and recreate the symlink:

    # copy
    sudo cp /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6.0.29 /home/ma-user/anaconda3/envs/vidar/lib/

    # remove the old link
    sudo rm /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6

    # create the new link
    sudo ln -s /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6.0.29 /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6

    Verify:

    strings /home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6 | grep GLIBCXX

    3.4.29 now shows up.

    Note: if the error instead points at /usr/lib/x86_64-linux-gnu/libstdc++.so.6, use:

    export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
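
    For quick diagnosis, here is a small Python sketch (my own helper, not part of ViDAR) that lists the GLIBCXX versions exported by a given libstdc++.so.6, mimicking the strings | grep command above:

    import re
    import sys

    def glibcxx_versions(path):
        """Scan a shared library for GLIBCXX_x.y.z version strings."""
        with open(path, "rb") as f:
            data = f.read()
        found = set(re.findall(rb"GLIBCXX_[0-9]+(?:\.[0-9]+)*", data))
        return sorted(v.decode() for v in found)

    if __name__ == "__main__":
        lib = sys.argv[1] if len(sys.argv) > 1 else \
            "/home/ma-user/anaconda3/envs/vidar/lib/libstdc++.so.6"
        print("\n".join(glibcxx_versions(lib)))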

    GPU out of memory

    CUDA out of memory

    Fix: train on only part of the mini dataset. Make sure the folder names you keep under meta_datas match the folder names under sensor_blobs (a small consistency-check sketch follows below).
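
    A minimal sketch of that consistency check (the two paths are assumptions based on the directory layout used above; adjust them to your setup):

    import os

    # Hypothetical layout: only a subset of the mini split is kept to save GPU memory.
    meta_root = "ViDAR/data/openscene_v1.1/meta_datas/mini"
    blob_root = "ViDAR/data/openscene_v1.1/sensor_blobs/mini"

    # Compare entry names (extensions stripped, in case meta_datas stores per-log files).
    meta_logs = {os.path.splitext(name)[0] for name in os.listdir(meta_root)}
    blob_logs = {os.path.splitext(name)[0] for name in os.listdir(blob_root)}

    print("In meta_datas but missing from sensor_blobs:", sorted(meta_logs - blob_logs))
    print("In sensor_blobs but missing from meta_datas:", sorted(blob_logs - meta_logs))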

    fsspec incompatibility with Python 3.8

    TypeError: 'type' object is not subscriptable

    Fix: pin fsspec to a release that still works with Python 3.8:

    pip install fsspec==2025.3.0

ViDAR model implementation analysis

Model architecture overview

ViDAR (Visual Point Cloud Forecasting enables Scalable Autonomous Driving) is a model built on the BEVFormer architecture, focused on visual point cloud forecasting for autonomous-driving scenes. Judging from vidar_transformer.py, its central piece is a PredictionTransformer.

Core components

  1. PredictionTransformer:
    • the core ViDAR component, predicting the next frame's BEV features from multi-frame BEV features
    • it uses a custom decoder to handle temporal information
  2. Attention mechanisms (a simplified sketch of how they compose follows this list):
    • temporal self-attention (TemporalSelfAttention): aggregates information along the time dimension
    • spatial cross-attention (MSDeformableAttention3D): aggregates information across 3D space
    • custom deformable attention (CustomMSDeformableAttention): used for feature alignment
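
To make the roles of these attention blocks concrete, below is a heavily simplified, hypothetical sketch of one prediction-decoder layer: temporal self-attention over the BEV queries, followed by cross-attention into historical BEV features. The real ViDAR layers use deformable attention and much more bookkeeping; this only illustrates the data flow.

import torch
import torch.nn as nn

class SimplifiedPredictionLayer(nn.Module):
    """Toy stand-in for one ViDAR decoder layer (not the actual implementation)."""

    def __init__(self, dims=256, heads=8):
        super().__init__()
        self.temporal_self_attn = nn.MultiheadAttention(dims, heads, batch_first=True)
        self.spatial_cross_attn = nn.MultiheadAttention(dims, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dims, dims * 4), nn.ReLU(), nn.Linear(dims * 4, dims))
        self.norm1 = nn.LayerNorm(dims)
        self.norm2 = nn.LayerNorm(dims)
        self.norm3 = nn.LayerNorm(dims)

    def forward(self, bev_query, history_bev):
        # bev_query:   [B, H*W, C]   queries for the frame being predicted
        # history_bev: [B, T*H*W, C] flattened BEV features of past frames
        x, _ = self.temporal_self_attn(bev_query, bev_query, bev_query)
        bev_query = self.norm1(bev_query + x)
        x, _ = self.spatial_cross_attn(bev_query, history_bev, history_bev)
        bev_query = self.norm2(bev_query + x)
        return self.norm3(bev_query + self.ffn(bev_query))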

Project structure

The project layout shows that ViDAR is an extension of BEVFormer:

projects/
├── configs/
│   ├── base/
│   ├── bevformer/
│   ├── vidar_finetune/   # ViDAR fine-tuning configs
│   └── vidar_pretrain/   # ViDAR pre-training configs
└── mmdet3d_plugin/
    ├── bevformer/        # BEVFormer-related modules
    ├── core/             # core evaluation and utility modules
    ├── datasets/         # dataset handling
    ├── dd3d/             # 3D-detection-related modules
    └── models/           # model definitions

Relationship to BEVFormer

ViDAR appears to be an extension built on top of BEVFormer, focused on future-frame prediction:

  • BEVFormer mainly handles the transformation from multi-view images to a BEV representation
  • ViDAR additionally performs temporal prediction on that BEV representation, forecasting future scenes (a rough autoregressive sketch of this idea follows below)
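
Conceptually, the future prediction can be viewed as rolling a predictor forward autoregressively over a sliding window of BEV features. A hypothetical sketch (function and argument names are illustrative, not ViDAR's actual API):

import torch

def rollout_future_bev(predictor, history_bev, num_future=3):
    """Autoregressively predict future BEV features from a history queue.

    predictor:   a module mapping a stack of past BEV maps to the next one
    history_bev: [T, B, H*W, C] BEV features of the past T frames
    """
    history = list(history_bev.unbind(0))
    futures = []
    for _ in range(num_future):
        next_bev = predictor(torch.stack(history, 0))  # [B, H*W, C]
        futures.append(next_bev)
        history = history[1:] + [next_bev]  # slide the temporal window
    return torch.stack(futures, 0)          # [num_future, B, H*W, C]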

Tuning ideas

Integrating nSKG

Up front:

This project is actually less complicated than it looks. Roughly:

The ViDAR model appears to be a BEVFormer-based 3D detection/segmentation model built on the MMDetection3D framework. It supports the conversion from multi-view images to a BEV representation, and comes with a flexible config system, plugin extensibility, and a complete training pipeline.

With that understood, the path is clear: first learn how to use MMDetection3D. ViDAR is just one extra layer wrapped on top, so that layer should also be extensible through plugins (a minimal registration sketch is shown below).
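
The plugin mechanism itself is straightforward. Below is a minimal, hypothetical sketch of how a custom module is registered with an MMDetection registry and then referenced from a config; the class and file names are made up for illustration:

# my_plugin/my_head.py -- hypothetical example of MMDetection-style registration
import torch.nn as nn
from mmdet.models import HEADS

@HEADS.register_module()
class MyToyHead(nn.Module):
    """Illustrative head; once registered it can be built by name from a config."""

    def __init__(self, in_channels=256, out_channels=64):
        super().__init__()
        self.proj = nn.Linear(in_channels, out_channels)

    def forward(self, x):
        return self.proj(x)

# In a config file the module is then referenced by type name:
#   model = dict(pts_bbox_head=dict(type='MyToyHead', in_channels=256, out_channels=64))
# and the plugin directory is loaded via: plugin = True; plugin_dir = 'my_plugin/'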

Key places to look at:

  1. Model definition files (probably under projects/mmdet3d_plugin/bevformer/)
  2. Config files (specified through the config command-line argument)
  3. The implementation of the custom training function custom_train_model
  • Paper 1: nuScenes Knowledge Graph (nSKG)

    The content of the nuScenes Knowledge Graph (nSKG) paper fits naturally into understanding and tuning ViDAR: Visual Point Cloud Forecasting, especially in the following respects (a data-format sketch follows the list):

    1. Richer scene representation: nSKG provides a comprehensive semantic representation of the nuScenes dataset, covering the entities in a traffic scene (vehicles, pedestrians, lanes, traffic lights) and the semantic and spatial relations between them. This can enrich ViDAR's input data and improve its point cloud forecasting as well as downstream tasks (perception, planning).
    2. Easier data handling: nSKG's structured data (provided both as a knowledge graph and in PyTorch Geometric format) can be plugged into ViDAR's data pipeline directly, reducing the preprocessing effort.
    3. Architecture enhancement: nSKG's heterogeneous graph representation can be combined with ViDAR's Transformer / BEVFormer modules by introducing a graph neural network (GNN) to process semantic relations, improving the robustness and interpretability of the predictions.
    4. Tuning directions: nSKG's rich context (lane topology, agent relations) can guide the tuning of ViDAR's hyperparameters, data-augmentation strategy and loss functions, especially in complex traffic scenes.
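
    To ground points 2 and 3, here is a hypothetical sketch of how nSKG-style entities and relations could be packed into a PyTorch Geometric HeteroData object before being fed to a GNN (the node and edge types are illustrative, not the official nSKG schema):

    import torch
    from torch_geometric.data import HeteroData

    def build_scene_graph(agent_feats, lane_feats, agent_on_lane):
        """agent_feats: [num_agents, F_a], lane_feats: [num_lanes, F_l],
        agent_on_lane: [2, num_edges] (agent index -> lane index)."""
        data = HeteroData()
        data['agent'].x = agent_feats
        data['lane'].x = lane_feats
        data['agent', 'on', 'lane'].edge_index = agent_on_lane
        return data

    # Tiny usage example with random features.
    g = build_scene_graph(torch.randn(4, 8), torch.randn(6, 8),
                          torch.tensor([[0, 1, 2, 3], [0, 0, 2, 5]]))
    print(g)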

Actual modifications:

(Screenshot of the modified code omitted.)

Summary of the nSTP integration work in the ViDAR project

Based on the current code base, this section summarizes the work done so far, from the initial state to the present, to integrate nSTP (Neural Scene-Time Priors).

1. New core files

1.1 nSTP encoder module

File path: ViDAR\projects\mmdet3d_plugin\bevformer\modules\nstp_encoder.py

Main functionality:

  • NSTPEncoder: encodes nSTP graph data with a graph neural network (GraphSAGE or GAT)
  • NSTPEnhancer: fuses nSTP features with BEV features, enhancing the BEV features through an attention mechanism
import torch
import torch.nn as nn
from torch_geometric.nn import GraphSAGE, GATConv


class NSTPEncoder(nn.Module):
    """Encoder for nSTP graph data."""

    def __init__(self, in_channels, hidden_channels, out_channels, num_layers=3,
                 gnn_type='graphsage', dropout=0.1, aggr='mean'):
        super().__init__()
        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.out_channels = out_channels
        self.num_layers = num_layers
        self.gnn_type = gnn_type

        # Input feature projection
        self.input_proj = nn.Linear(in_channels, hidden_channels)

        # Graph neural network
        if gnn_type == 'graphsage':
            self.gnn = GraphSAGE(
                in_channels=hidden_channels,
                hidden_channels=hidden_channels,
                num_layers=num_layers,
                out_channels=out_channels,
                dropout=dropout,
                aggr=aggr
            )
        elif gnn_type == 'gat':
            # Simplified GAT implementation
            self.gnn_layers = nn.ModuleList()
            self.gnn_layers.append(GATConv(hidden_channels, hidden_channels))
            for _ in range(num_layers - 2):
                self.gnn_layers.append(GATConv(hidden_channels, hidden_channels))
            self.gnn_layers.append(GATConv(hidden_channels, out_channels))
            self.dropout = nn.Dropout(dropout)
        else:
            raise ValueError(f"Unsupported GNN type: {gnn_type}")

    def forward(self, data):
        """Forward pass.

        Args:
            data: a PyG Data object, or a dict containing x and edge_index

        Returns:
            torch.Tensor: node features
        """
        # Handle None input
        if data is None:
            # Return an empty feature tensor
            return torch.zeros((1, self.out_channels), device=self.input_proj.weight.device)

        if hasattr(data, 'x') and hasattr(data, 'edge_index'):
            x, edge_index = data.x, data.edge_index
        elif isinstance(data, dict) and 'x' in data and 'edge_index' in data:
            x, edge_index = data['x'], data['edge_index']
        else:
            print(f"Warning: unexpected input data format: {type(data)}")
            # Return an empty feature tensor
            return torch.zeros((1, self.out_channels), device=self.input_proj.weight.device)

        # Make sure x and edge_index are tensors
        if not isinstance(x, torch.Tensor):
            x = torch.tensor(x, dtype=torch.float, device=self.input_proj.weight.device)
        if not isinstance(edge_index, torch.Tensor):
            edge_index = torch.tensor(edge_index, dtype=torch.long, device=self.input_proj.weight.device)

        # Feature projection
        x = self.input_proj(x)

        # GNN processing
        if self.gnn_type == 'graphsage':
            x = self.gnn(x, edge_index)
        else:  # gat
            for i, layer in enumerate(self.gnn_layers):
                if i < len(self.gnn_layers) - 1:
                    x = layer(x, edge_index)
                    x = torch.relu(x)
                    x = self.dropout(x)
                else:
                    x = layer(x, edge_index)

        return x


class NSTPEnhancer(nn.Module):
    """nSTP feature enhancer used to strengthen BEV features."""

    def __init__(self, bev_channels, nstp_channels, hidden_channels, bev_h, bev_w, use_attention=True):
        super().__init__()
        self.bev_channels = bev_channels
        self.nstp_channels = nstp_channels
        self.hidden_channels = hidden_channels
        self.bev_h = bev_h
        self.bev_w = bev_w
        self.use_attention = use_attention

        # Feature fusion layers
        self.nstp_proj = nn.Linear(nstp_channels, hidden_channels)
        self.bev_proj = nn.Linear(bev_channels, hidden_channels)

        if use_attention:
            # Attention mechanism
            self.query_proj = nn.Linear(hidden_channels, hidden_channels)
            self.key_proj = nn.Linear(hidden_channels, hidden_channels)
            self.value_proj = nn.Linear(hidden_channels, hidden_channels)
            self.attention_scale = hidden_channels ** -0.5

        # Output projection
        self.output_proj = nn.Linear(hidden_channels, bev_channels)

    def forward(self, bev_feat, nstp_feat, nstp_pos=None):
        """Forward pass.

        Args:
            bev_feat (torch.Tensor): BEV features [B, C, H, W]
            nstp_feat (torch.Tensor): nSTP node features [B, N, C]
            nstp_pos (torch.Tensor, optional): nSTP node positions [B, N, 2]

        Returns:
            torch.Tensor: enhanced BEV features [B, C, H, W]
        """
        B, C, H, W = bev_feat.shape
        bev_feat_flat = bev_feat.flatten(2).permute(0, 2, 1)  # [B, H*W, C]

        # Feature projection
        bev_feat_proj = self.bev_proj(bev_feat_flat)  # [B, H*W, hidden]
        nstp_feat_proj = self.nstp_proj(nstp_feat)    # [B, N, hidden]

        if self.use_attention:
            # Compute attention
            query = self.query_proj(bev_feat_proj)   # [B, H*W, hidden]
            key = self.key_proj(nstp_feat_proj)      # [B, N, hidden]
            value = self.value_proj(nstp_feat_proj)  # [B, N, hidden]

            # Attention scores
            attn = torch.bmm(query, key.transpose(1, 2)) * self.attention_scale  # [B, H*W, N]
            attn = torch.softmax(attn, dim=-1)

            # Weighted features
            context = torch.bmm(attn, value)  # [B, H*W, hidden]

            # Fuse features
            enhanced_feat = context + bev_feat_proj
        else:
            # Simple averaging
            nstp_feat_expanded = nstp_feat_proj.mean(dim=1, keepdim=True).expand(-1, H*W, -1)
            enhanced_feat = bev_feat_proj + nstp_feat_expanded

        # Output projection
        enhanced_feat = self.output_proj(enhanced_feat)  # [B, H*W, C]

        # Reshape back to a BEV map
        enhanced_feat = enhanced_feat.permute(0, 2, 1).reshape(B, C, H, W)

        return enhanced_feat
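
For reference, a small smoke test of the two modules above with random inputs (the shapes are assumptions, roughly consistent with the default arguments used later in the head; it assumes the ViDAR environment is set up and the repo root is on PYTHONPATH):

import torch
from torch_geometric.data import Data

# Run from the ViDAR repo root so the plugin package is importable.
from projects.mmdet3d_plugin.bevformer.modules.nstp_encoder import NSTPEncoder, NSTPEnhancer

encoder = NSTPEncoder(in_channels=64, hidden_channels=128, out_channels=256)
enhancer = NSTPEnhancer(bev_channels=256, nstp_channels=256, hidden_channels=128,
                        bev_h=200, bev_w=200)

# 10 graph nodes with 64-dim features and 30 random edges.
graph = Data(x=torch.randn(10, 64), edge_index=torch.randint(0, 10, (2, 30)))
node_feat = encoder(graph)                        # [10, 256]

bev = torch.randn(1, 256, 200, 200)               # [B, C, H, W]
enhanced = enhancer(bev, node_feat.unsqueeze(0))  # nstp_feat passed as [B, N, C]
print(enhanced.shape)                             # torch.Size([1, 256, 200, 200])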

1.2 nSTP data-processing component

File path: ViDAR\projects\mmdet3d_plugin\datasets\pipelines\nstp_transform.py

Main functionality:

  • ProcessNSTPGraph: processes nSTP graph data, making sure the format is correct and converting it to PyTorch tensors
import torch
import numpy as np
from mmdet.datasets.builder import PIPELINES


@PIPELINES.register_module()
class ProcessNSTPGraph:
    """Pipeline transform that processes nSTP graph data."""

    def __init__(self, graph_feat_dim=64, with_agent_type=True):
        self.graph_feat_dim = graph_feat_dim
        self.with_agent_type = with_agent_type

    def __call__(self, results):
        """Process the nSTP graph data."""
        if 'nstp_graph' not in results:
            return results

        graph_data = results['nstp_graph']

        # Handle a PyG Data object
        if hasattr(graph_data, 'x') and hasattr(graph_data, 'edge_index'):
            # Already a PyG Data object; make sure the tensor types are correct
            if not isinstance(graph_data.x, torch.Tensor):
                graph_data.x = torch.tensor(graph_data.x, dtype=torch.float)
            if not isinstance(graph_data.edge_index, torch.Tensor):
                graph_data.edge_index = torch.tensor(graph_data.edge_index, dtype=torch.long)
            if hasattr(graph_data, 'edge_attr') and not isinstance(graph_data.edge_attr, torch.Tensor):
                graph_data.edge_attr = torch.tensor(graph_data.edge_attr, dtype=torch.float)

        # Handle graph data given as a dict
        elif isinstance(graph_data, dict):
            # Node features
            if 'x' in graph_data:
                x = graph_data['x']
                if isinstance(x, np.ndarray):
                    x = torch.from_numpy(x).float()
                elif isinstance(x, list):
                    x = torch.tensor(x).float()
                elif not isinstance(x, torch.Tensor):
                    x = torch.tensor(x, dtype=torch.float)
                graph_data['x'] = x

            # Edge index
            if 'edge_index' in graph_data:
                edge_index = graph_data['edge_index']
                if isinstance(edge_index, np.ndarray):
                    edge_index = torch.from_numpy(edge_index).long()
                elif isinstance(edge_index, list):
                    edge_index = torch.tensor(edge_index).long()
                elif not isinstance(edge_index, torch.Tensor):
                    edge_index = torch.tensor(edge_index, dtype=torch.long)
                graph_data['edge_index'] = edge_index

            # Edge attributes
            if 'edge_attr' in graph_data:
                edge_attr = graph_data['edge_attr']
                if isinstance(edge_attr, np.ndarray):
                    edge_attr = torch.from_numpy(edge_attr).float()
                elif isinstance(edge_attr, list):
                    edge_attr = torch.tensor(edge_attr).float()
                elif not isinstance(edge_attr, torch.Tensor):
                    edge_attr = torch.tensor(edge_attr, dtype=torch.float)
                graph_data['edge_attr'] = edge_attr

        # Update the results
        results['nstp_graph'] = graph_data
        return results
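
A quick, hypothetical check of the transform on a dict-style sample (assuming the pipeline file above is importable from the ViDAR repo root):

import numpy as np

from projects.mmdet3d_plugin.datasets.pipelines.nstp_transform import ProcessNSTPGraph

transform = ProcessNSTPGraph()
results = {
    'nstp_graph': {
        'x': np.random.rand(5, 64).astype(np.float32),
        'edge_index': np.array([[0, 1, 2], [1, 2, 3]]),
    }
}
results = transform(results)
print(type(results['nstp_graph']['x']))           # <class 'torch.Tensor'>
print(results['nstp_graph']['edge_index'].dtype)  # torch.int64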

2. Modifications to existing files

2.1 Dataset classes

2.1.1 NuScenes dataset

File path: ViDAR\projects\mmdet3d_plugin\datasets\nuscenes_vidar_dataset_v1.py

Main changes:

  • Added nSTP-related parameters: use_nstp, nstp_path
  • Implemented the _load_nstp_data method to load the nSTP graph data files
  • Modified get_data_info to attach the nSTP data to each sample's info
#---------------------------------------------------------------------------------#
# Visual Point Cloud Forecasting enables Scalable Autonomous Driving #
# Copyright (c) OpenDriveLab. All rights reserved. #
#---------------------------------------------------------------------------------#

import copy
import torch
import numpy as np
import os

# 导入rdflib
try:
from rdflib import Graph
except ImportError:
print("警告: 未安装rdflib库,无法加载.ttl格式的nSKG数据")

from mmdet.datasets import DATASETS
from nuscenes.eval.common.utils import quaternion_yaw, Quaternion
from nuscenes.utils.geometry_utils import transform_matrix
from mmcv.parallel import DataContainer as DC

from .nuscenes_vidar_dataset_template import NuScenesViDARDatasetTemplate


@DATASETS.register_module()
class NuScenesViDARDatasetV1(NuScenesViDARDatasetTemplate): # 确保类名为NuScenesViDARDatasetV1
"""NuScenes visual point cloud forecasting dataset.
"""
def __init__(self,
ann_file,
pipeline=None,
data_root=None,
classes=None,
load_interval=1,
modality=None,
box_type_3d='LiDAR',
filter_empty_gt=True,
test_mode=False,
use_valid_flag=False,
history_queue_length=None,
pred_history_frame_num=0,
pred_future_frame_num=0,
per_frame_loss_weight=(1.0,),
use_nskg=False,
nskg_path=None,
nskg_ontology_path=None,
use_nstp=False, # 添加nSTP支持参数
nstp_path=None, # 添加nSTP数据路径参数
**kwargs):
# 保存history_queue_length参数,但不传递给父类
self.history_queue_length = history_queue_length

# 调用父类初始化方法,移除history_queue_length参数
super().__init__(
ann_file=ann_file,
pipeline=pipeline,
data_root=data_root,
classes=classes,
load_interval=load_interval,
modality=modality,
box_type_3d=box_type_3d,
filter_empty_gt=filter_empty_gt,
test_mode=test_mode,
use_valid_flag=use_valid_flag,
**kwargs)

# 保存nSKG相关参数
self.use_nskg = use_nskg
self.nskg_path = nskg_path
self.nskg_ontology_path = nskg_ontology_path

# 保存nSTP相关参数
self.use_nstp = use_nstp
self.nstp_path = nstp_path

# 保存预测帧数相关参数
self.pred_history_frame_num = pred_history_frame_num
self.pred_future_frame_num = pred_future_frame_num
self.per_frame_loss_weight = per_frame_loss_weight

# 如果启用nSKG,加载相关数据
if self.use_nskg and self.nskg_path is not None:
self._load_nskg_data()

def _load_nskg_data(self):
"""加载nSKG数据"""
self.nskg_data = {}
if not os.path.exists(self.nskg_path):
print(f"警告: nSKG数据路径 {self.nskg_path} 不存在")
self.use_nskg = False
return

try:
if self.nskg_path.endswith('.ttl') and self.nskg_ontology_path is not None:
g = Graph()
try:
g.parse(self.nskg_path, format='turtle')
except Exception as e:
print(f"警告: TTL文件解析失败: {str(e)}")
self.use_nskg = False
return

# 加载本体文件
if os.path.exists(self.nskg_ontology_path):
for onto_file in os.listdir(self.nskg_ontology_path):
if onto_file.endswith('.ttl'):
onto_path = os.path.join(self.nskg_ontology_path, onto_file)
try:
g.parse(onto_path, format='turtle')
except Exception as e:
print(f"警告: 本体文件 {onto_file} 解析失败: {str(e)}")

print(f"成功加载nSKG数据,共 {len(g)} 个三元组")
self.nskg_data = self._convert_rdf_to_pyg(g)
else:
import pickle
with open(self.nskg_path, 'rb') as f:
self.nskg_data = pickle.load(f)
print(f"成功加载nSKG数据,共 {len(self.nskg_data)} 条记录")
except Exception as e:
print(f"加载nSKG数据失败: {str(e)}")
print("继续训练,但不使用nSKG数据")
self.use_nskg = False

def _convert_rdf_to_pyg(self, graph):
"""将RDF图转换为PyG格式

Args:
graph: RDF图对象

Returns:
转换后的数据字典,键为sample_token
"""
result = {}
try:
import torch_geometric as pyg

# 查询所有场景
scenes = {}
for s, p, o in graph.triples((None, None, None)):
# 假设每个场景都有一个token属性
if str(p).endswith('hasToken'):
scene_uri = str(s)
token = str(o)
scenes[scene_uri] = token

# 为每个场景构建图
for scene_uri, token in scenes.items():
# 收集节点
nodes = {}
node_types = {}
node_features = {}

# 收集边
edges = {}

# 查询与场景相关的所有三元组
for s, p, o in graph.triples((None, None, None)):
# 处理节点和边的逻辑...
pass

# 构建PyG数据对象
data = {
'x': node_features,
'edge_index': edges,
'node_type': node_types
}

result[token] = data

return result
except ImportError:
print("警告: 未安装PyTorch Geometric库,无法转换RDF数据为图格式")
return {}

def get_data_info(self, index):
"""获取数据信息,添加nSKG或nSTP数据"""
info = super().get_data_info(index)

# 获取当前样本的标识符
sample_token = info.get('sample_token', None)

# 如果启用nSKG,添加nSKG数据到info中
if self.use_nskg and hasattr(self, 'nskg_data') and self.nskg_data and sample_token in self.nskg_data:
info['nskg_graph'] = self.nskg_data[sample_token]

# 如果启用nSTP,添加nSTP数据到info中(优先使用nSTP)
if self.use_nstp and hasattr(self, 'nstp_data') and self.nstp_data and sample_token in self.nstp_data:
info['nstp_graph'] = self.nstp_data[sample_token]
# 如果同时存在nSKG和nSTP,使用nSTP替代nSKG
if 'nskg_graph' in info:
del info['nskg_graph']

return info

def _mask_points(self, pts_list):
assert self.ego_mask is not None
# remove points belonging to ego vehicle.
masked_pts_list = []
for pts in pts_list:
ego_mask = np.logical_and(
np.logical_and(self.ego_mask[0] <= pts[:, 0],
self.ego_mask[2] >= pts[:, 0]),
np.logical_and(self.ego_mask[1] <= pts[:, 1],
self.ego_mask[3] >= pts[:, 1]),
)
pts = pts[np.logical_not(ego_mask)]
masked_pts_list.append(pts)
pts_list = masked_pts_list
return pts_list

def union2one(self, previous_queue, future_queue):
# 1. get transformation from all frames to current (reference) frame
ref_meta = previous_queue[-1]['img_metas'].data
valid_scene_token = ref_meta['scene_token']
# compute reference e2g_transform and g2e_transform.
ref_e2g_translation = ref_meta['ego2global_translation']
ref_e2g_rotation = ref_meta['ego2global_rotation']
ref_e2g_transform = transform_matrix(
ref_e2g_translation, Quaternion(ref_e2g_rotation), inverse=False)
ref_g2e_transform = transform_matrix(
ref_e2g_translation, Quaternion(ref_e2g_rotation), inverse=True)
ref_l2e_translation = ref_meta['lidar2ego_translation']
ref_l2e_rotation = ref_meta['lidar2ego_rotation']
ref_l2e_transform = transform_matrix(
ref_l2e_translation, Quaternion(ref_l2e_rotation), inverse=False)
ref_e2l_transform = transform_matrix(
ref_l2e_translation, Quaternion(ref_l2e_rotation), inverse=True)

queue = previous_queue[:-1] + future_queue
pts_list = [each['points'].data for each in queue]
if self.ego_mask is not None:
pts_list = self._mask_points(pts_list)
total_cur2ref_lidar_transform = []
total_ref2cur_lidar_transform = []
total_pts_list = []
for i, each in enumerate(queue):
meta = each['img_metas'].data

# store points in the current frame.
cur_pts = pts_list[i].cpu().numpy().copy()
cur_pts[:, -1] = i
total_pts_list.append(cur_pts)

# store the transformation from current frame to reference frame.
curr_e2g_translation = meta['ego2global_translation']
curr_e2g_rotation = meta['ego2global_rotation']
curr_e2g_transform = transform_matrix(
curr_e2g_translation, Quaternion(curr_e2g_rotation), inverse=False)
curr_g2e_transform = transform_matrix(
curr_e2g_translation, Quaternion(curr_e2g_rotation), inverse=True)

curr_l2e_translation = meta['lidar2ego_translation']
curr_l2e_rotation = meta['lidar2ego_rotation']
curr_l2e_transform = transform_matrix(
curr_l2e_translation, Quaternion(curr_l2e_rotation), inverse=False)
curr_e2l_transform = transform_matrix(
curr_l2e_translation, Quaternion(curr_l2e_rotation), inverse=True)

# compute future to reference matrix.
cur_lidar_to_ref_lidar = (curr_l2e_transform.T @
curr_e2g_transform.T @
ref_g2e_transform.T @
ref_e2l_transform.T)
total_cur2ref_lidar_transform.append(cur_lidar_to_ref_lidar)

# compute reference to future matrix.
ref_lidar_to_cur_lidar = (ref_l2e_transform.T @
ref_e2g_transform.T @
curr_g2e_transform.T @
curr_e2l_transform.T)
total_ref2cur_lidar_transform.append(ref_lidar_to_cur_lidar)

# 2. Parse previous and future can_bus information.
imgs_list = [each['img'].data for each in previous_queue]
metas_map = {}
prev_scene_token = None
prev_pos = None
prev_angle = None
ref_meta = previous_queue[-1]['img_metas'].data

# 2.2. Previous
for i, each in enumerate(previous_queue):
metas_map[i] = each['img_metas'].data

if 'aug_param' in each:
metas_map[i]['aug_param'] = each['aug_param']

if metas_map[i]['scene_token'] != prev_scene_token:
metas_map[i]['prev_bev_exists'] = False
prev_scene_token = metas_map[i]['scene_token']
prev_pos = copy.deepcopy(metas_map[i]['can_bus'][:3])
prev_angle = copy.deepcopy(metas_map[i]['can_bus'][-1])
# Set the original point of this motion.
new_can_bus = copy.deepcopy(metas_map[i]['can_bus'])
new_can_bus[:3] = 0
new_can_bus[-1] = 0
metas_map[i]['can_bus'] = new_can_bus
else:
metas_map[i]['prev_bev_exists'] = True
tmp_pos = copy.deepcopy(metas_map[i]['can_bus'][:3])
tmp_angle = copy.deepcopy(metas_map[i]['can_bus'][-1])
# Compute the later waypoint.
# To align the shift and rotate difference due to the BEV.
new_can_bus = copy.deepcopy(metas_map[i]['can_bus'])
new_can_bus[:3] = tmp_pos - prev_pos
new_can_bus[-1] = tmp_angle - prev_angle
metas_map[i]['can_bus'] = new_can_bus
prev_pos = copy.deepcopy(tmp_pos)
prev_angle = copy.deepcopy(tmp_angle)

# compute cur_lidar_to_ref_lidar transformation matrix for quickly align generated
# bev features to the reference frame.
metas_map[i]['ref_lidar_to_cur_lidar'] = total_ref2cur_lidar_transform[i]

# 2.3. Future
current_scene_token = ref_meta['scene_token']
ref_can_bus = None
future_can_bus = []
future2ref_lidar_transform = []
ref2future_lidar_transform = []
for i, each in enumerate(future_queue):
future_meta = each['img_metas'].data
if future_meta['scene_token'] != current_scene_token:
break

# store the transformation:
future2ref_lidar_transform.append(
total_cur2ref_lidar_transform[i + len(previous_queue) - 1]
) # current -> reference.
ref2future_lidar_transform.append(
total_ref2cur_lidar_transform[i + len(previous_queue) - 1]
) # reference -> current.

# can_bus information.
if i == 0:
new_can_bus = copy.deepcopy(future_meta['can_bus'])
new_can_bus[:3] = 0
new_can_bus[-1] = 0
future_can_bus.append(new_can_bus)
ref_can_bus = copy.deepcopy(future_meta['can_bus'])
else:
new_can_bus = copy.deepcopy(future_meta['can_bus'])

new_can_bus_pos = np.array([0, 0, 0, 1]).reshape(1, 4)
ref2prev_lidar_transform = ref2future_lidar_transform[-2]
cur2ref_lidar_transform = future2ref_lidar_transform[-1]
new_can_bus_pos = new_can_bus_pos @ cur2ref_lidar_transform @ ref2prev_lidar_transform

new_can_bus_angle = new_can_bus[-1] - ref_can_bus[-1]
new_can_bus[:3] = new_can_bus_pos[:, :3]
new_can_bus[-1] = new_can_bus_angle
future_can_bus.append(new_can_bus)
ref_can_bus = copy.deepcopy(future_meta['can_bus'])

ret_queue = previous_queue[-1]
ret_queue['img'] = DC(torch.stack(imgs_list), cpu_only=False, stack=True)
ret_queue.pop('aug_param', None)

metas_map[len(previous_queue) - 1]['future_can_bus'] = np.array(future_can_bus)
metas_map[len(previous_queue) - 1]['future2ref_lidar_transform'] = (
np.array(future2ref_lidar_transform))
metas_map[len(previous_queue) - 1]['ref2future_lidar_transform'] = (
np.array(ref2future_lidar_transform))
metas_map[len(previous_queue) - 1]['total_cur2ref_lidar_transform'] = (
np.array(total_cur2ref_lidar_transform))
metas_map[len(previous_queue) - 1]['total_ref2cur_lidar_transform'] = (
np.array(total_ref2cur_lidar_transform))

ret_queue['img_metas'] = DC(metas_map, cpu_only=True)
ret_queue.pop('points')
ret_queue['gt_points'] = DC(
torch.from_numpy(np.concatenate(total_pts_list, 0)), cpu_only=False)
if len(future_can_bus) < 1 + self.future_length:
return None
return ret_queue

def _load_nstp_data(self):
"""加载nSTP数据"""
self.nstp_data = {}
if not os.path.exists(self.nstp_path):
print(f"警告: nSTP数据路径 {self.nstp_path} 不存在")
self.use_nstp = False
return

try:
import torch
import glob
import os.path as osp

# 获取目录中所有的.pt文件
pt_files = glob.glob(osp.join(self.nstp_path, "*.pt"))
if not pt_files:
print(f"警告: 在 {self.nstp_path} 中未找到.pt文件")
self.use_nstp = False
return

print(f"找到 {len(pt_files)} 个nSTP数据文件")

# 加载每个.pt文件
for pt_file in pt_files:
try:
# 从文件名获取样本ID
sample_id = osp.splitext(osp.basename(pt_file))[0]

# 加载PyTorch张量
graph_data = torch.load(pt_file)

# 将数据添加到字典中
self.nstp_data[sample_id] = graph_data

except Exception as e:
print(f"加载文件 {pt_file} 失败: {str(e)}")

print(f"成功加载 {len(self.nstp_data)} 个nSTP样本")

except Exception as e:
print(f"加载nSTP数据失败: {str(e)}")
print("继续训练,但不使用nSTP数据")
self.use_nstp = False

2.1.2 NuPlan dataset

File path: d:\git_clone\ViDAR\projects\mmdet3d_plugin\datasets\nuplan_vidar_dataset_v1.py

Main changes:

  • Added nSTP support in the same way as for the NuScenes dataset
  • Implemented nSTP data loading and handling logic specific to the NuPlan dataset (a hedged outline follows below)
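
The NuPlan-side change is not reproduced in full here; the following is only a hedged outline of the same pattern, mirroring the NuScenes version above (the template base class, module name, and exact signature are assumptions and may differ from the real file):

from mmdet.datasets import DATASETS

# Assumed module/class name for the NuPlan template, mirroring the NuScenes naming.
from .nuplan_vidar_dataset_template import NuPlanViDARDatasetTemplate


@DATASETS.register_module()
class NuPlanViDARDatasetV1(NuPlanViDARDatasetTemplate):
    """Hypothetical outline of the nSTP-enabled NuPlan dataset."""

    def __init__(self, *args, use_nstp=False, nstp_path=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.use_nstp = use_nstp
        self.nstp_path = nstp_path
        if self.use_nstp and self.nstp_path is not None:
            self._load_nstp_data()  # same .pt-loading logic as the NuScenes version

    def get_data_info(self, index):
        info = super().get_data_info(index)
        token = info.get('sample_token', None)
        if self.use_nstp and getattr(self, 'nstp_data', None) and token in self.nstp_data:
            info['nstp_graph'] = self.nstp_data[token]
        return info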

2.2 Model head

File path: ViDAR\projects\mmdet3d_plugin\bevformer\dense_heads\vidar_head_v1.py

Main changes:

  • Added nSTP-related parameters: use_nstp, nstp_encoder_cfg, nstp_enhancer_cfg
  • Integrated the nSTP encoder and enhancer into the head
  • Modified the forward logic to process nSTP features
#---------------------------------------------------------------------------------#
# Visual Point Cloud Forecasting enables Scalable Autonomous Driving #
# Copyright (c) OpenDriveLab. All rights reserved. #
#---------------------------------------------------------------------------------#

"""
<V1.multiframe> of ViDAR future prediction head:
* Predict future & history frames simultaneously.
"""

import copy
import torch
import torch.nn as nn
import numpy as np

from mmdet.models import HEADS, build_loss

from mmcv.runner import force_fp32, auto_fp16
from .vidar_head_base import ViDARHeadBase


@HEADS.register_module()
class ViDARHeadV1(ViDARHeadBase):
def __init__(self,
history_queue_length,
pred_history_frame_num=0,
pred_future_frame_num=0,
per_frame_loss_weight=(1.0,),
use_nskg=False,
nskg_encoder_cfg=None,
nskg_enhancer_cfg=None,
use_nstp=False, # 添加nSTP支持参数
nstp_encoder_cfg=None, # 添加nSTP编码器配置
nstp_enhancer_cfg=None, # 添加nSTP增强器配置
*args,
**kwargs):
super().__init__(*args, **kwargs)

self.history_queue_length = history_queue_length
self.pred_history_frame_num = pred_history_frame_num
self.pred_future_frame_num = pred_future_frame_num

self.pred_frame_num = 1 + self.pred_history_frame_num + self.pred_future_frame_num
self.per_frame_loss_weight = per_frame_loss_weight
assert len(self.per_frame_loss_weight) == self.pred_frame_num

self._init_bev_pred_layers()

# nSKG支持
self.use_nskg = use_nskg
# nSTP支持
self.use_nstp = use_nstp

if self.use_nskg:
from ..modules.nskg_gnn import NSKGEncoder
from ..modules.nskg_bev_enhancer import NSKGBEVEnhancer

# 创建nSKG编码器
if nskg_encoder_cfg is not None:
self.nskg_encoder = NSKGEncoder(**nskg_encoder_cfg)
else:
self.nskg_encoder = NSKGEncoder(
in_channels=8,
hidden_channels=64,
out_channels=256,
num_layers=2,
gnn_type='gat',
use_hetero=True
)

# 创建BEV特征增强器
if nskg_enhancer_cfg is not None:
self.nskg_enhancer = NSKGBEVEnhancer(**nskg_enhancer_cfg)
else:
self.nskg_enhancer = NSKGBEVEnhancer(
bev_channels=self.embed_dims,
nskg_channels=256,
hidden_channels=128,
bev_h=self.bev_h,
bev_w=self.bev_w,
use_attention=True
)
else:
self.nskg_encoder = None
self.nskg_enhancer = None

# 添加nSTP支持
if self.use_nstp:
from ..modules.nstp_encoder import NSTPEncoder, NSTPEnhancer

# 创建nSTP编码器
if nstp_encoder_cfg is not None:
self.nstp_encoder = NSTPEncoder(**nstp_encoder_cfg)
else:
self.nstp_encoder = NSTPEncoder(
in_channels=64,
hidden_channels=128,
out_channels=256,
num_layers=3,
gnn_type='graphsage',
dropout=0.1,
aggr='mean'
)

# 创建nSTP增强器
if nstp_enhancer_cfg is not None:
self.nstp_enhancer = NSTPEnhancer(**nstp_enhancer_cfg)
else:
self.nstp_enhancer = NSTPEnhancer(
bev_channels=self.embed_dims,
nstp_channels=256,
hidden_channels=128,
bev_h=self.bev_h,
bev_w=self.bev_w,
use_attention=True
)
else:
self.nstp_encoder = None
self.nstp_enhancer = None

def forward(self, mlvl_feats, img_metas, prev_bev=None, **kwargs):
"""Forward function.
Args:
mlvl_feats (list(Tensor)): 多尺度特征,每个元素形状为 [B, num_cam, C, H, W]
img_metas (list(dict)): 图像元信息
prev_bev: 历史BEV特征
Returns:
tuple: bev_embed, history_states, future_states
"""
# 调用父类的forward方法获取原始结果
bev_embed, history_states, future_states = super().forward(
mlvl_feats, img_metas, prev_bev, **kwargs)

# 如果启用nSKG,处理图数据增强BEV特征
if self.use_nskg and self.nskg_encoder is not None and self.nskg_enhancer is not None:
bs = bev_embed.shape[0]
bev_h, bev_w = self.bev_h, self.bev_w

nskg_graphs = []
for img_meta in img_metas:
nskg_graph = img_meta.get('nskg_graph', None)
nskg_graphs.append(nskg_graph)

# 处理每个样本的nSKG数据
enhanced_bevs = []
for i in range(bs):
# 获取当前样本的BEV特征
curr_bev = bev_embed[i:i+1].view(1, bev_h, bev_w, -1).permute(0, 3, 1, 2)

# 获取当前样本的nSKG图
curr_graph = nskg_graphs[i] if i < len(nskg_graphs) and nskg_graphs[i] is not None else None

if curr_graph is not None:
# 使用GNN编码器处理图数据
node_features, global_features = self.nskg_encoder(curr_graph)

# 获取节点位置信息
if hasattr(curr_graph, 'pos'):
node_pos = curr_graph.pos
elif isinstance(curr_graph, dict) and 'pos' in curr_graph:
node_pos = curr_graph['pos']
else:
node_pos = None

# 增强BEV特征
enhanced_bev = self.nskg_enhancer(
curr_bev, node_features, global_features, node_pos)

enhanced_bevs.append(enhanced_bev)
else:
# 如果没有nSKG数据,保持原始BEV特征不变
enhanced_bevs.append(curr_bev)

# 合并增强后的BEV特征
if enhanced_bevs:
enhanced_bev = torch.cat(enhanced_bevs, dim=0)
# 转回原始格式
bev_embed = enhanced_bev.permute(0, 2, 3, 1).reshape(bs, bev_h * bev_w, -1)

# 添加nSTP支持
if self.use_nstp and self.nstp_encoder is not None and self.nstp_enhancer is not None:
bs = bev_embed.shape[0]
bev_h, bev_w = self.bev_h, self.bev_w

nstp_graphs = []
for img_meta in img_metas:
nstp_graph = img_meta.get('nstp_graph', None)
nstp_graphs.append(nstp_graph)

# 处理每个样本的nSTP数据
enhanced_bevs = []
for i in range(bs):
# 获取当前样本的BEV特征
curr_bev = bev_embed[i:i+1].view(1, bev_h, bev_w, -1).permute(0, 3, 1, 2)

# 获取当前样本的nSTP图
curr_graph = nstp_graphs[i] if i < len(nstp_graphs) and nstp_graphs[i] is not None else None

if curr_graph is not None:
# 使用GNN编码器处理图数据
node_features = self.nstp_encoder(curr_graph)

# 获取节点位置信息(如果有)
node_pos = None
if hasattr(curr_graph, 'pos'):
node_pos = curr_graph.pos
elif isinstance(curr_graph, dict) and 'pos' in curr_graph:
node_pos = curr_graph['pos']

# 增强BEV特征
enhanced_bev = self.nstp_enhancer(curr_bev, node_features, node_pos)
enhanced_bevs.append(enhanced_bev)
else:
# 如果没有nSTP数据,保持原始BEV特征不变
enhanced_bevs.append(curr_bev)

# 合并增强后的BEV特征
if enhanced_bevs:
enhanced_bev = torch.cat(enhanced_bevs, dim=0)
# 转回原始格式
bev_embed = enhanced_bev.permute(0, 2, 3, 1).reshape(bs, bev_h * bev_w, -1)

return bev_embed, history_states, future_states

def _init_bev_pred_layers(self):
"""Overwrite the {self.bev_pred_head} of super()._init_layers()
"""
bev_pred_branch = []
for _ in range(self.num_pred_fcs):
bev_pred_branch.append(nn.Linear(self.embed_dims, self.embed_dims))
bev_pred_branch.append(nn.LayerNorm(self.embed_dims))
bev_pred_branch.append(nn.ReLU(inplace=True))
bev_pred_branch.append(nn.Linear(
self.embed_dims, self.pred_frame_num * self.num_pred_height))
bev_pred_head = nn.Sequential(*bev_pred_branch)

def _get_clones(module, N):
return nn.ModuleList([copy.deepcopy(module) for i in range(N)])

# Auxiliary supervision for all intermediate results.
num_pred = self.transformer.decoder.num_layers
self.bev_pred_head = _get_clones(bev_pred_head, num_pred)

def forward_head(self, next_bev_feats):
"""Get freespace estimation from multi-frame BEV feature maps.

Args:
next_bev_feats (torch.Tensor): with shape as
[pred_frame_num, inter_num, bs, bev_h * bev_w, dims]
pred_frame_num: history frames + current frame + future frames.
"""
next_bev_preds = []
for lvl in range(next_bev_feats.shape[1]):
# pred_frame_num, bs, bev_h * bev_w, num_height_pred * num_frame
# ===> pred_frame_num, bs, bev_h * bev_w, num_height_pred, num_frame
# ===> pred_frame_num, num_frame, bs, bev_h * bev_w, num_height_pred.
next_bev_pred = self.bev_pred_head[lvl](next_bev_feats[:, lvl])
next_bev_pred = next_bev_pred.view(
*next_bev_pred.shape[:-1], self.num_pred_height, self.pred_frame_num)

base_bev_pred = next_bev_pred[..., self.pred_history_frame_num][..., None]
next_bev_pred = torch.cat([
next_bev_pred[..., :self.pred_history_frame_num] + base_bev_pred,
base_bev_pred,
next_bev_pred[..., self.pred_history_frame_num + 1:] + base_bev_pred
], -1)

next_bev_pred = next_bev_pred.permute(0, 4, 1, 2, 3).contiguous()
next_bev_preds.append(next_bev_pred)
# pred_frame_num, inter_num, num_frame, bs, bev_h*bev_w, num_height_pred
next_bev_preds = torch.stack(next_bev_preds, 1)
return next_bev_preds

def _get_reference_gt_points(self,
gt_points,
src_frame_idx_list,
tgt_frame_idx_list,
img_metas):
"""Transform gt_points at src_frame_idx in {src_frame_idx_list} to the coordinate space
of each tgt_frame_idx in {tgt_frame_idx_list}.
"""
bs = len(gt_points)
aligned_gt_points = []
batched_origin_points = []
for frame_idx, src_frame_idx, tgt_frame_idx in zip(
range(len(src_frame_idx_list)), src_frame_idx_list, tgt_frame_idx_list):
# 1. get gt_points belongs to src_frame_idx.
src_frame_gt_points = [p[p[:, -1] == src_frame_idx] for p in gt_points]

# 2. get transformation matrix..
src_to_ref = [img_meta['total_cur2ref_lidar_transform'][src_frame_idx] for img_meta in img_metas]
src_to_ref = gt_points[0].new_tensor(np.array(src_to_ref)) # bs, 4, 4
ref_to_tgt = [img_meta['total_ref2cur_lidar_transform'][tgt_frame_idx] for img_meta in img_metas]
ref_to_tgt = gt_points[0].new_tensor(np.array(ref_to_tgt)) # bs, 4, 4
src_to_tgt = torch.matmul(src_to_ref, ref_to_tgt)

# 3. transfer src_frame_gt_points to src_to_tgt.
aligned_gt_points_per_frame = []
for batch_idx, points in enumerate(src_frame_gt_points):
new_points = points.clone() # -1, 4
new_points = torch.cat([
new_points[:, :3], new_points.new_ones(new_points.shape[0], 1)
], 1)
new_points = torch.matmul(new_points, src_to_tgt[batch_idx])
new_points[..., -1] = frame_idx
aligned_gt_points_per_frame.append(new_points)
aligned_gt_points.append(aligned_gt_points_per_frame)

# 4. obtain the aligned origin points.
aligned_origin_points = torch.from_numpy(
np.zeros((bs, 1, 3))).to(src_to_tgt.dtype).to(src_to_tgt.device)
aligned_origin_points = torch.cat([
aligned_origin_points[..., :3], torch.ones_like(aligned_origin_points)[..., 0:1]
], -1)
aligned_origin_points = torch.matmul(aligned_origin_points, src_to_tgt)
batched_origin_points.append(aligned_origin_points[..., :3].contiguous())

# stack points from different timestamps, and transfer to occupancy representation.
batched_gt_points = []
for b in range(bs):
cur_gt_points = [
aligned_gt_points[frame_idx][b]
for frame_idx in range(len(src_frame_idx_list))]
cur_gt_points = torch.cat(cur_gt_points, 0)
batched_gt_points.append(cur_gt_points)

batched_origin_points = torch.cat(batched_origin_points, 1)
return batched_gt_points, batched_origin_points

@force_fp32(apply_to=('pred_dict'))
def loss(self,
pred_dict,
gt_points,
start_idx,
tgt_bev_h,
tgt_bev_w,
tgt_pc_range,
pred_frame_num,
img_metas=None,
batched_origin_points=None):
""""Compute loss for all history according to gt_points.

gt_points: ground-truth point cloud in each frame.
list of tensor with shape [-1, 5], indicating ground-truth point cloud in
each frame.
"""
bev_preds = pred_dict['next_bev_preds']
valid_frames = np.array(pred_dict['valid_frames'])
start_frames = (valid_frames + self.history_queue_length - self.pred_history_frame_num)
tgt_frames = valid_frames + self.history_queue_length

full_prev_bev_exists = pred_dict.get('full_prev_bev_exists', True)
if not full_prev_bev_exists:
frame_idx_for_loss = [self.pred_history_frame_num] * self.pred_frame_num
else:
frame_idx_for_loss = np.arange(0, self.pred_frame_num)

loss_dict = dict()
for idx, i in enumerate(frame_idx_for_loss):
# 1. get the predicted occupancy of frame-i.
cur_bev_preds = bev_preds[:, :, i, ...].contiguous()

# 2. get the frame index of current frame.
src_frames = start_frames + i

# 3. get gt_points belonging to cur_valid_frames.
cur_gt_points, cur_origin_points = self._get_reference_gt_points(
gt_points,
src_frame_idx_list=src_frames,
tgt_frame_idx_list=tgt_frames,
img_metas=img_metas)

# 4. compute loss.
if i != self.pred_history_frame_num:
# For aux history-future supervision:
# only compute loss for cur_frame prediction.
loss_weight = np.array([[1]] + [[0]] * (len(self.loss_weight) - 1))
else:
loss_weight = self.loss_weight

cur_loss_dict = super().loss(
dict(next_bev_preds=cur_bev_preds,
valid_frames=np.arange(0, len(src_frames))),
cur_gt_points,
start_idx=start_idx,
tgt_bev_h=tgt_bev_h,
tgt_bev_w=tgt_bev_w,
tgt_pc_range=tgt_pc_range,
pred_frame_num=len(self.loss_weight)-1,
img_metas=img_metas,
batched_origin_points=cur_origin_points,
loss_weight=loss_weight)

# 5. merge dict.
cur_frame_loss_weight = self.per_frame_loss_weight[i]
cur_frame_loss_weight = cur_frame_loss_weight * (idx == i)
for k, v in cur_loss_dict.items():
loss_dict.update({f'frame.{idx}.{k}.loss': v * cur_frame_loss_weight})
return loss_dict

@force_fp32(apply_to=('pred_dict'))
def get_point_cloud_prediction(self,
pred_dict,
gt_points,
start_idx,
tgt_bev_h,
tgt_bev_w,
tgt_pc_range,
img_metas=None,
batched_origin_points=None):
""""Generate point cloud prediction.
"""
# pred_frame_num, inter_num, num_frame, bs, bev_h * bev_w, num_height_pred
pred_dict['next_bev_preds'] = pred_dict['next_bev_preds'][:, :, self.pred_history_frame_num, ...].contiguous()

valid_frames = np.array(pred_dict['valid_frames'])
valid_gt_points, cur_origin_points = self._get_reference_gt_points(
gt_points,
src_frame_idx_list=valid_frames + self.history_queue_length,
tgt_frame_idx_list=valid_frames + self.history_queue_length,
img_metas=img_metas)
return super().get_point_cloud_prediction(
pred_dict=pred_dict,
gt_points=valid_gt_points,
start_idx=start_idx,
tgt_bev_h=tgt_bev_h,
tgt_bev_w=tgt_bev_w,
tgt_pc_range=tgt_pc_range,
img_metas=img_metas,
batched_origin_points=cur_origin_points)

2.3 ViDAR detector

File path: ViDAR\projects\mmdet3d_plugin\bevformer\detectors\vidar.py

Main changes:

  • Modified forward_train: handle nSTP features and fix a tuple-type issue
  • Modified forward_test: support using nSTP features at test time

Key part of the change:

def forward_train(self, **kwargs):
    # ...existing code...

    # Modified part: next_bev_feats may contain tuples
    processed_next_bev_feats = []
    for feat in next_bev_feats:
        if isinstance(feat, tuple):
            # If it is a tuple, take the first element (the main feature)
            processed_next_bev_feats.append(feat[0])
        else:
            processed_next_bev_feats.append(feat)

    next_bev_feats = torch.stack(processed_next_bev_feats, 0)

    # ...existing code continues...

3. Config changes

3.1 OpenScene config

File path: ViDAR\projects\configs\vidar_pretrain\OpenScene\vidar_OpenScene_mini_1_8_3future_nstp.py

Main changes:

  • Added nSTP-related settings: enable nSTP and set the data path
  • Added the nSTP processing component to the data pipeline
  • Configured the nSTP encoder and enhancer parameters
# nSTP settings
use_nskg = False  # disable nSKG
use_nstp = True   # enable nSTP
nstp_path = 'data/nuscenes/nstp/train/raw'  # directory containing the nSTP data

# Add the nSTP processing component to the pipeline
train_pipeline.insert(-2, dict(type='ProcessNSTPGraph'))

# Modify the dataset config
data = dict(
    # ...
    train=dict(
        # ...
        use_nstp=use_nstp,
        nstp_path=nstp_path,
        # ...
    ),
    # ...
)

3.2 NuScenes full-set config

File path: ViDAR\projects\configs\vidar_pretrain\nusc_fullset\vidar_nstp_nusc.py

Main changes:

  • Added nSTP support on top of the base config
  • Configured the nSTP data path and processing logic
_base_ = ['./vidar_full_nusc_1future.py']

# nSTP settings
use_nskg = False
use_nstp = True
nstp_path = 'data/nuscenes/nstp/nstp.pkl'  # path to the nSTP data

# Modify the dataset config
data = dict(
    # ...
    train=dict(
        type='NuScenesViDARDatasetV1',
        use_nskg=use_nskg,
        use_nstp=use_nstp,
        nstp_path=nstp_path,
        # ...
    ),
    # ...
)

4. Other supporting changes

4.1 Dataset registration

File path: d:\git_clone\ViDAR\projects\mmdet3d_plugin\datasets\__init__.py

Main changes:

  • Imported and registered the nSTP-related modules: NSTPEncoder, NSTPEnhancer
from .nstp_encoder import NSTPEncoder, NSTPEnhancer

__all__ = [
    # ...
    'NSTPEncoder',
    'NSTPEnhancer',
    # ...
]

5. The many errors along the way

(Screenshot of an error omitted.)

First, the nSKG data could not be used directly; using it produced this problem:

(Screenshots of the errors omitted.)

which in turn led to:

(Screenshots of the follow-up errors omitted.)

Handling it properly would also have been possible, but the code I wrote could not cope with it:

(Screenshot omitted.)

So in the end I chose nSTP instead: the nuScenes Knowledge Graph work shows that nSTP is an extension of nSKG and can be used for training directly, so I adapted the code for it:

(Screenshots of the adaptation omitted.)

The final result: the nSTP data files can now be read correctly:

(Screenshot omitted.)

But then... out of memory:

(Screenshot omitted.)

As I write this I have just fixed another small bug, and training is still running on the server...

Finally, a photo of the long-suffering server (thanks to Prof. Luo Yong):

(Photos omitted.)

6. Summary

The nSTP integration work covered the following aspects:

  1. Data handling: loading and processing logic for nSTP graph data, reading graph structures from .pt files
  2. Feature extraction: a GNN-based nSTP encoder that extracts spatio-temporal features from the graph structure
  3. Feature fusion: a fusion mechanism between nSTP features and BEV features, enhancing the BEV features through attention
  4. Model integration: plugging the nSTP modules into the ViDAR model and modifying the forward pass
  5. Configuration: nSTP-related config options, so the feature can be switched on and off flexibly

These changes let ViDAR exploit the scene-structure and temporal-evolution information that nSTP provides, strengthening the model's understanding of dynamic scenes, especially for predicting future frames.

Work completed:

  1. Set up the environment and resolved the conflicting dependencies
  2. Got dataset loading and training to work
  3. Modified ViDAR to take in the nSTP data for tuning
  4. Fixed the original PyTorch issues
  5. Training is still running on the server; there are probably quite a few follow-up training problems left to fix... but time is up.