Shanghai-based artificial intelligence startup StepFun and Chinese carmaker Geely Auto Group jointly announced on Tuesday the open-sourcing of two multimodal large models to global developers.
The initiative follows a recent wave of open-source models led by DeepSeek-R1, developed by Chinese startup DeepSeek. It aims to contribute to Chinese companies' efforts to offer the world the best open-source large models.
The two large models are the result of collaboration between StepFun and Geely. Step-Video-T2V is the world's largest and highest-performing open-source video generation model, while Step-Audio is the industry's first product-grade open-source voice interaction model. Both have been available on the Yuewen app since Tuesday, according to the company.
StepFun is a strategic partner in Geely's technology ecosystem. Their in-depth cooperation has combined the two parties' complementary strengths in computing algorithms and scenario training, significantly enhancing the performance of the large models.
According to StepFun, the open-source initiative aims to foster technology sharing and innovation in AI while advancing inclusive AI development. It also brings cutting-edge multimodal capabilities to the global open-source community, strengthening China's presence in the open-source large model landscape.