Abstract: This is an official pytorch implementation of Learning Structure-Supporting Dependencies via Keypoint Interactive Transformer for General Mammal Pose Estimation. In this work, to achieve ...
All models are trained on ImageNet with an input shape of 256x256. All models downsample the images to a spatial size of 16x16, leading to a latent representation of 16x16xK bits per image.