Liu Song’s Projects


~/Projects/WhisperSpeech

git clone https://code.lsong.org/WhisperSpeech

History

ref: main
Hash Date Commit message Author
d02564f5 2024-02-18 00:15:27 Update gui_file_to_text_to_audio_playback.py BBC-Esq
044947d9 2024-02-17 23:02:01 gui/user text/file/tts BBC-Esq
c1b64916 2024-02-16 22:17:03 requirements: added torch >=2, torchaudio, and soundfile (thx @BBC-Esq) signalprime
5b7a33e3 2024-02-19 13:30:33 Moved the compute_device API to whisperspeech.inference Jakub Piotr Cłapa
86a2d5f4 2024-02-16 22:36:37 added minimal.py signalprime
b89bb082 2024-02-16 18:33:35 CPU + MPS Support Updated signalprime
5f691c6a 2024-02-13 08:38:52 Implement CPU and MPS support enhancements for WhisperSpeech BBC, Esquire
e35ee9ac 2024-02-13 06:01:43 Load reference audio more efficiently (#79) BBC-Esq
80b268b7 2024-02-02 08:20:56 correct small typo in sample text BBC-Esq
ef752dbd 2024-02-02 08:13:36 convert text to an audio file BBC-Esq
1581ae9e 2024-02-02 08:13:08 Create readme.md BBC-Esq
03b8a086 2024-02-02 08:12:32 Delete Examples directory BBC-Esq
ae9c1944 2024-02-02 08:12:04 simple text to audio file script BBC-Esq
cd47f1ec 2024-02-02 08:11:12 Update readme.md BBC-Esq
b176c9fc 2024-02-02 08:10:10 Create readme.md BBC-Esq
ac80e067 2024-01-29 19:43:52 Release 0.6 Jakub Piotr Cłapa
fb471556 2024-01-29 19:30:05 train: experiment with linear LR schedules Jakub Piotr Cłapa
6aea31ae 2024-01-29 19:29:23 train_multi: added support for multiple datasets Jakub Piotr Cłapa
ffc99542 2024-01-29 19:23:45 pipeline: T2S now returns a batched tensor Jakub Piotr Cłapa
12c8a479 2024-01-29 19:19:01 t2s: updated training code, added batch size benchmarking Jakub Piotr Cłapa
1b425498 2024-01-29 19:17:48 Removed old model code Jakub Piotr Cłapa
2ff5fca3 2024-01-29 12:31:40 modules: fixed regressions Jakub Piotr Cłapa
f48cd4a5 2024-01-29 11:48:00 s2a: updated training code, added batch size benchmarking Jakub Piotr Cłapa
1ad4f5dc 2024-01-29 19:22:25 Added the evaluations of the recent vq_stoks models Jakub Piotr Cłapa
b6fc87c9 2024-01-26 23:22:02 Updated the data preprocessing code Jakub Piotr Cłapa
cab55b2c 2024-01-26 23:08:11 Added the languages notebook Jakub Piotr Cłapa
be5eede5 2024-01-26 23:07:24 Added a benchmarking script Jakub Piotr Cłapa
2fb964d4 2024-01-26 23:07:07 wh_transcribe: update the preprocessing code Jakub Piotr Cłapa
be9878cb 2024-01-26 13:59:57 Prefetch more models used in preprocessing Jakub Piotr Cłapa
9d1b39bc 2024-01-26 13:57:57 vad: store VAD results as FP32 Jakub Piotr Cłapa
9b681cf7 2024-01-26 13:57:22 vad: modernize dataloading Jakub Piotr Cłapa
acbeaf29 2024-01-26 13:35:17 vq_stoks: improved the ensure_whisper API Jakub Piotr Cłapa
6901db42 2024-01-26 13:34:07 Added support for the OPUS codec Jakub Piotr Cłapa
638304c4 2024-01-26 13:30:01 Do not track the _modidx.py file Jakub Piotr Cłapa
842f5b86 2024-01-22 14:08:50 Release v0.5.7 Jakub Piotr Cłapa
7f64151f 2024-01-22 14:06:21 Fix speaker_map backwards compatibility Jakub Piotr Cłapa
ef55c210 2024-01-22 14:04:55 Added support for upgrading checkpoints on the fly Jakub Piotr Cłapa
ad0f8c88 2024-01-22 14:03:15 Fixed the PyPI package license Jakub Piotr Cłapa
32940f63 2024-01-21 23:03:59 Correct the spelling of a word 刘悦
a9b855eb 2024-01-19 17:59:04 Added the missing languages.py file Jakub Piotr Cłapa
a4f9c2de 2024-01-18 18:35:03 Update the Collab link to preselect the runtime type with a GPU Jakub Piotr Cłapa
398b8890 2024-01-18 18:23:17 Open up the README with a higher quality sample (thanks londons_explore and stavros) Jakub Piotr Cłapa
5fe67d89 2024-01-18 17:24:52 modules: bias_out needs to be a buffer as well Jakub Piotr Cłapa
158444d3 2024-01-18 16:58:44 Disable torch.compile by default to reduce compatibility issues Jakub Piotr Cłapa
e06530e0 2024-01-18 16:57:31 README: added links to the presentation recordings Jakub Piotr Cłapa
14bdbbab 2024-01-18 13:02:25 Updated the README with more smaples Jakub Piotr Cłapa
ac1fd8c7 2024-01-17 16:01:39 pipeline: added callback support Jakub Piotr Cłapa
58ee2b66 2024-01-17 15:52:27 T2S: added a step callback Jakub Piotr Cłapa
4bb976ef 2024-01-17 12:30:37 Implemented FP16 inference Jakub Piotr Cłapa
28da4aab 2024-01-17 10:19:04 Clean up inference dependencies Jakub Piotr Cłapa
aa480191 2024-01-15 17:43:19 Rewrote the inference for a 10x speedup Jakub Piotr Cłapa
c725a146 2024-01-15 17:26:49 modules: rewrote kv-cache to be compatible with torch.compile Jakub Piotr Cłapa
e6a7fb69 2024-01-13 14:42:46 modules: removed dead code Jakub Piotr Cłapa
b6f9bf3a 2024-01-13 13:48:35 FlexEmbeddings: fix convert_for_eval when frozen_width == width Jakub Piotr Cłapa
24cb4135 2024-01-11 12:52:09 Clear the kv-cache before each generation Jakub Piotr Cłapa
033301e5 2024-01-10 18:42:48 add kv caching makaveli10
68716610 2024-01-10 10:59:48 README: new models, voice cloning Jakub Piotr Cłapa
8168a30f 2024-01-10 10:56:47 Update the voice cloning example to use a public URL Jakub Piotr Cłapa
89295871 2024-01-10 09:44:50 Added a zero-shot voice cloning example Jakub Piotr Cłapa
22c8c303 2024-01-10 09:16:53 Showcase the faster S2A model in the inference notebook Jakub Piotr Cłapa
8ecde95a 2024-01-10 09:11:47 Added support for loading alternative models in Pipeline Jakub Piotr Cłapa
934a67c7 2023-12-10 21:12:04 Updated the README Jakub Piotr Cłapa
5ab82385 2023-12-10 20:37:05 Improve the inference notebook Jakub Piotr Cłapa
8226ef0a 2023-12-10 20:11:04 Release v0.1.0 Jakub Piotr Cłapa
11374dfc 2023-12-10 20:04:44 Brand new release, finally the quality is amazing :) Jakub Piotr Cłapa
e4c49580 2023-10-27 16:55:53 Fix doc generation issues. Jakub Piotr Cłapa
d0b6e59b 2023-10-27 16:44:31 Added more dev deps Jakub Piotr Cłapa
8a9c0c3a 2023-10-27 16:38:14 Added WER metrics code Jakub Piotr Cłapa
a08eb386 2023-10-27 15:31:38 Added dataset documentation Jakub Piotr Cłapa
a7128b04 2023-10-26 09:50:58 Added a block diagram of the WhisperSpeech pipeline Jakub Piotr Cłapa
08366991 2023-10-19 18:12:42 Added the data preparation scripts Jakub Piotr Cłapa
362b7746 2023-10-19 17:41:59 Updated the inference examples Jakub Piotr Cłapa
54ed7bb8 2023-10-19 17:07:47 README: added links to the pretrained models and datasets on Huggingface Jakub Piotr Cłapa
e22cc01b 2023-10-19 16:58:24 Add WhisperX models to the offline downloader Jakub Piotr Cłapa
d3534443 2023-10-19 16:54:55 Added support for training with webdatasets Jakub Piotr Cłapa
d1b4a0b8 2023-10-19 16:50:31 Vocoder: Added support for unbatched inputs Jakub Piotr Cłapa
5e3cda03 2023-10-19 12:47:11 Added the new semantic to acoustic model Jakub Piotr Cłapa
46267567 2023-10-19 07:36:02 Added the new text to semantic (T2S) model Jakub Piotr Cłapa
17daf044 2023-10-18 14:19:09 Added the new, much improved semantic token model with evaluation scripts Jakub Piotr Cłapa
543cbd24 2023-09-22 08:24:27 Added the new VAD and transcription pipelines Jakub Piotr Cłapa
5054a636 2023-07-20 17:55:02 Fixed the Discord badges rendering through Quarto Jakub Piotr Cłapa
d7862402 2023-07-20 17:36:41 Updated the README Jakub Piotr Cłapa
7f34f455 2023-07-20 17:06:20 Removed old code Jakub Piotr Cłapa
a9e49f1e 2023-07-14 23:22:49 Add pip install cell to the inference notebook Jakub Piotr Cłapa
ee3d832a 2023-07-14 23:19:24 Drop the xformers dependency and bump the version to 0.0.3 Jakub Piotr Cłapa
b0692d35 2023-07-14 22:48:45 New T2S hyperparameters Jakub Piotr Cłapa
c4dc7780 2023-07-14 22:47:36 Improved the inference examples Jakub Piotr Cłapa
c86a3e67 2023-07-14 22:43:29 Load Tunables from T2S model files Jakub Piotr Cłapa
26d6f024 2023-07-14 16:00:32 Fixed lr_scale being overwritten by the learning rate scheduler Jakub Piotr Cłapa
c3577a18 2023-07-14 15:59:45 Prepare the T2S model for hyperparam tuning Jakub Piotr Cłapa
006ad423 2023-07-13 17:39:58 Added Vocos support and showcase the complete inference pipeline Jakub Piotr Cłapa
406e2c30 2023-07-13 17:38:21 Added the t2s and s2a μP-based models with inference support Jakub Piotr Cłapa
ffa51f3c 2023-07-13 17:49:47 Misc nbdev cleanup Jakub Piotr Cłapa
dca0556c 2023-07-13 17:48:44 Removed the old stoks+txts extraction code Jakub Piotr Cłapa
fd59e37f 2023-07-13 17:36:12 Remove the old model code Jakub Piotr Cłapa
edf9bddf 2023-07-13 17:33:25 Remove the quality enhancement model code Jakub Piotr Cłapa
1cdcf861 2023-07-13 17:30:00 Lightning: added support for passing in Tunables Jakub Piotr Cłapa
bdb02117 2023-07-13 17:28:35 Lightning: added support for gradient accumulation Jakub Piotr Cłapa
045ea7b7 2023-07-13 17:26:44 Lightning: added support for changing the number of validations per epoch Jakub Piotr Cłapa
42b47fa1 2023-07-13 17:25:15 Misc W&B logging fixes Jakub Piotr Cłapa
08722a7d 2023-07-13 17:24:31 Added support for μP training optimizer adjustments Jakub Piotr Cłapa
ded5a5c5 2023-07-13 17:02:46 Fixed some misc training code bugs Jakub Piotr Cłapa
85d3aade 2023-07-13 16:57:35 Implement hooks needed for doing the μP parametrization Jakub Piotr Cłapa
f0edaf3f 2023-07-13 16:52:49 Added support for using the xformers attention implementation Jakub Piotr Cłapa
c3345cd0 2023-07-13 16:53:45 Remove old files Jakub Piotr Cłapa
a643831c 2023-07-13 13:04:27 Implemented model loading and inference methods for the quantization model Jakub Piotr Cłapa
14e2c8ad 2023-07-10 09:33:05 Merge pull request #22 from mengting7tw/patch-1 Marcus Edel
17d7eb37 2023-07-02 15:42:17 Update README.md Tsai Meng-Ting
072be35b 2023-06-20 17:02:36 Log accuracy curves to W&B Jakub Piotr Cłapa
49d938a5 2023-06-20 17:02:19 Support multi-element batches in the Lightning trainer Jakub Piotr Cłapa
1a528a16 2023-06-20 17:01:40 Added support for gradient clipping Jakub Piotr Cłapa
687dc66f 2023-06-20 17:00:53 Switch from `pct_start` to `warmup_steps` Jakub Piotr Cłapa
623693b5 2023-06-20 16:59:49 Improved the Visual class to allow for more customization Jakub Piotr Cłapa
b9cd5a3c 2023-06-20 16:52:00 Log hyperparameters to W&B Jakub Piotr Cłapa
100b1c5a 2023-06-20 16:47:03 Notebook cleanups Jakub Piotr Cłapa
3a7f8017 2023-06-20 16:35:48 Set some PyTorch performance setting Jakub Piotr Cłapa
fcb8befc 2023-06-20 17:16:49 Remove the old Python model code Jakub Piotr Cłapa
e99b8652 2023-04-29 18:08:56 rename whisper-finetuning makaveli10
82902a69 2023-04-25 22:52:50 Added a bundle of 3 trained A2A codecs to enhance the sound quality (NFY) Jakub Piotr Cłapa
ed97e2a5 2023-04-25 22:48:51 Log the validation loss 10 times per epoch Jakub Piotr Cłapa
23db82cb 2023-04-25 22:48:07 Added support for passing arguments to datasets and models Jakub Piotr Cłapa
19a475dd 2023-04-25 22:46:02 End the LR schedule with 1/25 of the maximum learning rate Jakub Piotr Cłapa
9bb64f3d 2023-04-25 22:26:17 Fix nbdev metadata Jakub Piotr Cłapa
29423365 2023-04-19 11:25:55 Added the preliminary T2S model and new multiGPU training code. Jakub Piotr Cłapa
3323a42f 2023-04-13 14:15:41 Fixed the audio codec in the new samples Jakub Piotr Cłapa
0b8912c8 2023-04-13 14:06:52 Added a new end-to-end TTS sample Jakub Piotr Cłapa
c852e794 2023-04-12 18:50:51 Add pytorch lightning support. Marcus Edel
6a5e7170 2023-04-05 10:57:32 Added samples, Discord links and an invite to collaborate (#13) Jakub Piotr Cłapa
974428e2 2023-04-03 11:14:31 Try a few temperatures when sampling from the model Jakub Piotr Cłapa
3fe0a964 2023-03-31 14:27:22 Added a new model that replaces cross-attention with a sum of resampled features Jakub Piotr Cłapa
bf29970e 2023-03-29 16:06:44 Added the self/cross-attention visualizations Jakub Piotr Cłapa
dbb3ee99 2023-03-29 15:58:49 Added support for skipping repeated semantic tokens Jakub Piotr Cłapa
933df7f7 2023-03-29 13:12:53 Added the missing token-dataset.feather file Jakub Piotr Cłapa
0d91ae6b 2023-03-23 21:26:41 add huggingface results link makaveli10
2c6e395f 2023-03-23 21:23:45 update whisper readme makaveli10
35379cdd 2023-03-29 07:50:12 Trained a new semnatic -> acoustic model that kind of works Jakub Piotr Cłapa
80c6272c 2023-03-29 07:42:53 Updated the tokenizer training setup, added a script to extract stoks. Jakub Piotr Cłapa
49f112a0 2023-03-29 07:40:06 Added support for extracting Whisper embeddings from other encoder layers Jakub Piotr Cłapa
041f805a 2023-03-29 07:38:24 Try to lower acoustic extraction peak GPU memory usage Jakub Piotr Cłapa
20ddd17b 2023-03-23 09:06:11 Merge pull request #6 from makaveli10/main Jakub Piotr Cłapa
e75c3b07 2023-03-22 22:54:21 fix:typo makaveli10
efda8515 2023-03-21 13:37:49 consistent identation makaveli10
8f1ede78 2023-03-21 13:13:18 update train steps makaveli10
8f6a6e7d 2023-03-21 13:09:27 train whisper decoer/encoder from scratch makaveli10
547dede6 2023-03-15 15:55:21 Remove the old VQ model Jakub Piotr Cłapa
06a568ac 2023-03-15 15:52:32 Move the symlink around to fix the README Jakub Piotr Cłapa
b9e065b3 2023-03-15 15:13:50 Added positional embeddings after the RQ bottleneck, updated the training code and model Jakub Piotr Cłapa
46953c29 2023-03-03 22:50:40 Added a symlink to the whisper diagram Jakub Piotr Cłapa
6470d23e 2023-03-03 22:02:55 Initial version of the semantic to acoustic modeling notebook (WIP) Jakub Piotr Cłapa
b7b25ebf 2023-03-03 22:01:54 Added TODO.md with suspected issues to check before training the final models Jakub Piotr Cłapa
770bbcad 2023-03-03 22:00:58 Start using nbdev, create command line tools for token extraction Jakub Piotr Cłapa
7eeef0d8 2023-02-28 09:15:42 Added the preliminary RQ semantic token quantization model Jakub Piotr Cłapa
97ee6508 2023-02-24 17:20:19 Update about the progress on semantic tokens Jakub Piotr Cłapa
62c34bf8 2023-02-24 12:05:05 Added a script to extract embeddings and tokens for distillation training Jakub Piotr Cłapa
8e132eb2 2023-02-24 12:03:15 Added synthetic dataset and VQ/RQ model experiments Jakub Piotr Cłapa
2cf6999a 2023-02-24 12:02:15 Renamed the notebooks with ordinal numbering Jakub Piotr Cłapa
0818b08c 2023-02-23 18:19:47 Added the synthetic semantic embeddings experiment Jakub Piotr Cłapa
dc7a68ed 2023-02-20 16:06:15 A first shot at semantic token extraction Jakub Piotr Cłapa
b15a2719 2023-02-20 15:26:04 Added the acoustic token extraction notebook Jakub Piotr Cłapa
3e321de7 2023-02-16 09:34:38 Expanded the README with more information Jakub Piotr Cłapa
9494ad4e 2023-02-16 09:03:47 Create LICENSE Jakub Piotr Cłapa
514ff0d9 2023-02-14 11:49:19 Initial readme. Marcus Edel
f47b8dc7 2023-02-14 11:47:56 Initial commit Marcus Edel