Called CSM-1B, the model generates “RVQ audio codes” from text and audio inputs, according to Sesame’s description on the AI dev platform Hugging Face. RVQ refers to “residual vector ...