WhisperX - Fast Automatic Speech Recognition

Whisper is an ASR (automatic speech recognition) model developed by OpenAI, trained on a large dataset of diverse audio. Whilst it produces highly accurate transcriptions, the corresponding timestamps are at the utterance level, not per word, and can be inaccurate by several seconds. OpenAI's Whisper implementation also does not natively support batched inference.
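
To make the timestamp limitation concrete, here is a minimal sketch of what an utterance-level segment leaves out. The segment is hand-written for illustration and is not real model output; only its `start`, `end`, and `text` keys follow the shape of a transcription segment. The `Segment` type and the `evenly_spaced_word_times` helper are hypothetical names introduced here. The helper is a deliberately naive baseline that guesses word times by dividing the segment evenly, and it is not the method WhisperX uses.

```python
from typing import List, Tuple, TypedDict


class Segment(TypedDict):
    """One utterance-level segment: a single time span for the whole text."""
    start: float  # seconds
    end: float    # seconds
    text: str


def evenly_spaced_word_times(segment: Segment) -> List[Tuple[str, float, float]]:
    """Guess per-word times by splitting the segment span equally among words.

    Naive baseline for illustration only: real speech is not evenly paced,
    so these guesses drift away from the true word boundaries.
    """
    words = segment["text"].split()
    if not words:
        return []
    step = (segment["end"] - segment["start"]) / len(words)
    return [
        (word, segment["start"] + i * step, segment["start"] + (i + 1) * step)
        for i, word in enumerate(words)
    ]


# Hand-written example segment (an assumption, not model output).
segment: Segment = {"start": 0.0, "end": 4.0, "text": "the quick brown fox"}

for word, start, end in evenly_spaced_word_times(segment):
    print(f"{start:.2f}-{end:.2f}  {word}")
```

With only one start and one end time per utterance, any per-word timing has to be guessed like this, and any error in the segment boundaries carries straight into every word.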