Hello, I am Ethan

I-Hsiang Chiu

Majoring in Electrical Engineering at National Taiwan University.

Proficient in deep learning, with a focus on multimodal analysis and trustworthy AI. I was an undergraduate researcher at Prof. Hung-Yi Lee’s Speech Processing and Machine Learning Lab.

Enjoy Game Dev, Computer Graphics, Computer Security, and Web Dev.

Driven by Curiosity

Defined by Excellence

RESEARCH

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

This benchmark enables general-purpose evaluation of audio/visual and bimodal fusion representations on 7 datasets covering 5 audio-visual tasks in speech and audio processing.

Published in ICASSP 2024, Co-second Author

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset

In this paper, we propose the Diffusion and Flow-matching based Audio Deepfake (DFADD) dataset. The DFADD dataset collects deepfake audio based on advanced diffusion and flow-matching TTS models. Additionally, we reveal that current anti-spoofing models lack sufficient robustness against highly human-like audio generated by diffusion and flow-matching TTS systems. The proposed DFADD dataset addresses this gap and provides a valuable resource for developing more resilient anti-spoofing models.

Published in IEEE SLT 2024, Co-first Author

Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening.

Preprint, Technical Report

Dynamic-SUPERB

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

ICLR 2025 (under review; all reviewers recommended acceptance)

PROJECTS

Computer Graphics: Raytracing with Image Morphing

Morphing is an image processing technique that generates a gradual transition between two images. This project explores the application of morphing in ray tracing to increase frame rate and optimizes results across several dimensions, including performance, display control, and speed.

NVIDIA-NTU Artificial Intelligence Joint Research Center

I developed the website for the NVIDIA-NTU Artificial Intelligence Joint Research Center, which was established in 2023 as an information platform for the collaboration between NVIDIA and National Taiwan University.

I Think Therefore It Is: Selective Object Display Based on Intention Prediction within Virtual Environment

This work introduces a system that predicts users’ needs to interact with the physical world while they are immersed in VR and automatically displays the relevant objects within the VR environment.

Online LLM Agents Debate Platform for Generative AI Workshops

Built an online platform for LLM agents to debate. These agents can use RAG to incorporate external knowledge related to the topic of the debate.