CLASSICAL ENGINEERING VS. DEEP SPATIO-TEMPORAL PARADIGMS IN VIDEO POLYP SEGMENTATION: A SYSTEMATIC COMPARATIVE REVIEW
Keywords:
Video Polyp Segmentation; Spatio-Temporal Modeling; ConvLSTM
Abstract
This systematic review compares three foundational paradigms in
Video Polyp Segmentation (VPS) for automated colonoscopy: classical engineering,
static 2D deep learning, and recurrent spatio-temporal architectures. While the field
has shifted from geometric rules to data-driven networks, balancing frame-level
accuracy with temporal consistency remains a critical engineering challenge. Using
the multi-center SUN-SEG database, we establish a structural taxonomy by
evaluating each paradigm's mathematical formulation, failure modes, and throughput
under clinical artifacts such as specular reflections, motion blur, and out-of-view events.
Our synthesis reveals that classical hand-crafted methods offer deterministic
explainability but fail under imaging noise. Memory-less 2D deep networks achieve
high spatial accuracy but suffer from boundary flickering and tracking dropouts
because they process each frame independently and carry no information between
frames. Conversely, recurrent spatio-temporal hybrids are more resilient: by
integrating gated recurrence at the network bottleneck, they use historical hidden
states to stabilize boundaries and to carry polyp shape estimates through heavily
degraded frames without sacrificing real-time throughput. This mapping outlines key
architectural trade-offs, serving as a deployment reference for future video-stream
intelligence frameworks.

