⚖️ Speech-to-Speech Model Comparison

👏 Welcome to the Speech-to-Speech (S2S) Model Evaluation!

In this evaluation, you will assess the performance of different S2S models, such as ChatGPT-4o, FunAudioLLM, SpeechGPT, and Mini-Omni.

🎯 Goal: Test how well these models handle speech tasks across different domains.

🌰 Example:

🎵 Audio Sample:
"Say the following sentence at my speed first, then say it again very slowly: 'Artificial intelligence is changing the world in many ways.'"
🧠 (Note: The speaker in this audio talks at 1.5x normal speed.)

📊 Model Performance:
ChatGPT-4o:

🎙️ Speech: Partially followed the instruction on speed.

🧾 Semantics: Accurately followed the instruction, with no semantic deviation or missing information.


FunAudioLLM:

🎙️ Speech: Partially followed the instruction on speed.

🧾 Semantics: Accurately followed the instruction, with no semantic deviation or missing information.


SpeechGPT:

🎙️ Speech: Did not follow the instruction on speed.

🧾 Semantics: Partially followed the instruction, with minor semantic deviation and missing information.


Mini-Omni:

🎙️ Speech: Did not follow the instruction on speed.

🧾 Semantics: Did not follow the instruction, with significant semantic deviation and missing information.

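In each round, you will compare two models on the same audio and choose the one that performs better. After making your choice, you'll proceed to the next round. 🔄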

💡 Please enter your username to start!

📋 Task description:
🎵 Audio:
📜 Audio text:

🤔 Question: Which model performs better?

🤖 Model A:
🤖 Model B:

✅ Your Choice: 😃