Static Target
RTP: Real-time Traffic Perception; HD: Hallucination Detection; KIE: Key Information Extraction; TCD: Traffic Change Detection; DDM: Driving Decision-Making; PTM: Past Traffic Memory

Dynamic Target
AP: Action Prediction; LP: Location Prediction; DP: Distance Prediction

Event Oriented
RP: Risk Prediction; RA: Risk Analysis; ARA: Accident Reason Answering

By default, this leaderboard is sorted by overall Accuracy scores.

| Model | Organization | Size | #Frames | RTP | HD | KIE | TCD | DDM | PTM | Static Avg. | AP | LP | DP | Dynamic Avg. | RP | RA | ARA | Event Avg. | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ⭐ StreamForest (FT-drive) | Ours | 7B | 1fps | 70.1 | 17.1 | 100.0 | 60.0 | 32.7 | 83.6 | 64.6 | 64.0 | 96.6 | 59.6 | 70.7 | 71.8 | 93.4 | 58.3 | 78.5 | 71.2 |
| ⭐ StreamForest | Ours | 7B | 1fps | 51.4 | 15.5 | 54.7 | 56.4 | 38.6 | 65.3 | 51.5 | 72.6 | 83.2 | 46.0 | 62.3 | 60.2 | 73.3 | 47.4 | 63.8 | 59.9 |
| Qwen2.5-VL | Alibaba | 7B | 1fps | 51.8 | 8.1 | 79.3 | 49.1 | 36.0 | 57.3 | 48.3 | 50.4 | 82.6 | 46.9 | 57.5 | 47.6 | 78.6 | 52.6 | 59.4 | 55.6 |
| ⭐ VideoChat-Online | NJU | 4B | 1fps | 36.9 | 0.8 | 62.3 | 49.1 | 21.5 | 47.0 | 36.1 | 70.2 | 86.7 | 46.4 | 62.9 | 51.2 | 69.4 | 45.5 | 57.4 | 54.5 |
| VideoChat-Flash | Shanghai AI Lab | 7B | 256 | 29.6 | 15.5 | 45.3 | 76.4 | 26.1 | 36.1 | 32.2 | 73.5 | 75.3 | 47.2 | 61.0 | 67.1 | 64.8 | 46.2 | 64.3 | 54.4 |
| InternVL2.5 | Shanghai AI Lab | 8B | 32 | 40.1 | 16.3 | 37.7 | 52.7 | 30.4 | 40.9 | 37.2 | 64.1 | 84.6 | 49.5 | 62.5 | 54.0 | 60.6 | 50.6 | 56.1 | 54.2 |
| LLaVA-OneVision | Bytedance | 7B | 64 | 36.0 | 4.9 | 22.6 | 60.0 | 31.4 | 39.0 | 34.2 | 53.6 | 70.3 | 47.4 | 55.1 | 57.9 | 72.2 | 47.4 | 62.2 | 51.6 |
| MiniCPM-V 2.6 | OpenBMB | 7B | 64 | 20.0 | 87.8 | 15.1 | 49.1 | 26.4 | 20.6 | 27.3 | 71.2 | 73.4 | 47.2 | 60.0 | 73.4 | 33.3 | 16.7 | 53.6 | 49.8 |
| LongVA | LMMs-Lab | 7B | 64 | 29.9 | 7.3 | 37.7 | 47.3 | 38.0 | 33.6 | 31.8 | 66.6 | 58.6 | 50.9 | 56.6 | 57.5 | 58.1 | 46.2 | 56.7 | 50.2 |
| ⭐ Dispider | CUHK | 7B | 1fps | 31.1 | 7.3 | 34.0 | 63.6 | 34.0 | 35.4 | 32.5 | 43.2 | 73.1 | 45.8 | 52.7 | 38.2 | 55.4 | 36.5 | 44.3 | 45.2 |
| ⭐ Flash-Vstream | THU | 7B | 1fps | 25.4 | 1.6 | 11.3 | 50.9 | 36.0 | 22.1 | 24.8 | 25.5 | 39.8 | 47.2 | 40.2 | 32.4 | 48.6 | 30.1 | 38.1 | 35.7 |

⭐ indicates that the model takes streaming video as input.
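
Re-sorting the leaderboard by a column other than Overall can be sketched as below. This is a minimal illustration, not the page's actual implementation: the `rows` structure and the `sort_by` helper are hypothetical, and only a handful of rows and two columns (HD and Overall) are copied from the table above.

```python
# Illustrative subset of the leaderboard; values are copied from the
# HD and Overall columns of the table. The dict layout is an assumption.
rows = [
    {"model": "StreamForest (FT-drive)", "HD": 17.1, "Overall": 71.2},
    {"model": "Qwen2.5-VL", "HD": 8.1, "Overall": 55.6},
    {"model": "MiniCPM-V 2.6", "HD": 87.8, "Overall": 49.8},
    {"model": "Flash-Vstream", "HD": 1.6, "Overall": 35.7},
]


def sort_by(rows, column, descending=True):
    """Return a new list of rows ordered by the score in `column`.

    Hypothetical helper: higher accuracy ranks first by default.
    """
    return sorted(rows, key=lambda row: row[column], reverse=descending)


if __name__ == "__main__":
    # Default view: rank by overall accuracy.
    for rank, row in enumerate(sort_by(rows, "Overall"), start=1):
        print(f"{rank}. {row['model']}: {row['Overall']}")

    # Alternative view: rank by a single task, here Hallucination Detection.
    for rank, row in enumerate(sort_by(rows, "HD"), start=1):
        print(f"{rank}. {row['model']}: {row['HD']}")
```

Sorting on a single task column can reorder the ranking noticeably: in this subset, the model with the highest HD score is not the one with the highest Overall score.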