The datasets consist of multi-object scenes. Each image or video is accompanied by ground-truth segmentation masks for all objects in the scene. For some datasets (excluding Objects Room and CATER), ...