Speech-to-Speech Model Comparison

👏 Welcome to the Speech-to-Speech (S2S) Model Evaluation!

In this evaluation, you will assess the performance of different S2S models, such as ChatGPT-4o, FunAudioLLM, SpeechGPT, and Mini-Omni.

🎯 Goal: Test how well these models handle speech tasks across different domains.

🌰 Example:

🎵 Audio Sample:

"Say the following sentence at my speed first, then say it again very slowly: 'Artificial intelligence is changing the world in many ways.'"

🧠 Note: The audio plays at 1.5x normal speed.

📊 Model Performance:

ChatGPT-4o:

🎙️ Speech: Partially followed the instruction on speed.

🧾 Semantics: Accurately followed the instruction, with no semantic deviation or missing information.

FunAudioLLM:

🎙️ Speech: Partially followed the instruction on speed.

🧾 Semantics: Accurately followed the instruction, with no semantic deviation or missing information.

SpeechGPT:

🎙️ Speech: Did not follow the instruction on speed.

🧾 Semantics: Partially followed the instruction, with minor semantic deviation and some missing information.

Mini-Omni:

🎙️ Speech: Did not follow the instruction on speed.

🧾 Semantics: Did not follow the instruction, with significant semantic deviation and missing information.
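
The ratings in the example above can be turned into numbers so that the four models are easier to compare side by side. The following is only a minimal sketch: the 0-2 scale, the equal weighting of speech and semantics, and the names `SCALE`, `RATINGS`, `score`, and `ranking` are all hypothetical and are not part of the evaluation itself.

```python
# Hypothetical ordinal scale (an assumption; the evaluation defines no numeric scores).
SCALE = {
    "did not follow": 0,
    "partially followed": 1,
    "accurately followed": 2,
}

# Ratings transcribed from the example above as (speech, semantics).
RATINGS = {
    "ChatGPT-4o": ("partially followed", "accurately followed"),
    "FunAudioLLM": ("partially followed", "accurately followed"),
    "SpeechGPT": ("did not follow", "partially followed"),
    "Mini-Omni": ("did not follow", "did not follow"),
}


def score(model: str) -> int:
    """Sum the speech and semantics ratings, weighted equally (an assumption)."""
    speech, semantics = RATINGS[model]
    return SCALE[speech] + SCALE[semantics]


def ranking() -> list[str]:
    """Return models from highest to lowest combined score; ties keep input order."""
    return sorted(RATINGS, key=score, reverse=True)


if __name__ == "__main__":
    for name in ranking():
        print(f"{name}: {score(name)}")
```

Under these assumptions, a different weighting (for example, counting speech twice) would only require changing `score`.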

After making your choice, you'll proceed to the next round. 🔄

💡 Please enter your username to start!

🎉 Thank you for completing the test!