Liu Song’s Projects

Hash	Date	Commit message	Author
d02564f5	2024-02-18 00:15:27	Update gui_file_to_text_to_audio_playback.py	BBC-Esq
044947d9	2024-02-17 23:02:01	gui/user text/file/tts	BBC-Esq
c1b64916	2024-02-16 22:17:03	requirements: added torch >=2, torchaudio, and soundfile (thx @BBC-Esq)	signalprime
5b7a33e3	2024-02-19 13:30:33	Moved the compute_device API to whisperspeech.inference	Jakub Piotr Cłapa
86a2d5f4	2024-02-16 22:36:37	added minimal.py	signalprime
b89bb082	2024-02-16 18:33:35	CPU + MPS Support Updated	signalprime
5f691c6a	2024-02-13 08:38:52	Implement CPU and MPS support enhancements for WhisperSpeech	BBC, Esquire
e35ee9ac	2024-02-13 06:01:43	Load reference audio more efficiently (#79)	BBC-Esq
80b268b7	2024-02-02 08:20:56	correct small typo in sample text	BBC-Esq
ef752dbd	2024-02-02 08:13:36	convert text to an audio file	BBC-Esq
1581ae9e	2024-02-02 08:13:08	Create readme.md	BBC-Esq
03b8a086	2024-02-02 08:12:32	Delete Examples directory	BBC-Esq
ae9c1944	2024-02-02 08:12:04	simple text to audio file script	BBC-Esq
cd47f1ec	2024-02-02 08:11:12	Update readme.md	BBC-Esq
b176c9fc	2024-02-02 08:10:10	Create readme.md	BBC-Esq
ac80e067	2024-01-29 19:43:52	Release 0.6	Jakub Piotr Cłapa
fb471556	2024-01-29 19:30:05	train: experiment with linear LR schedules	Jakub Piotr Cłapa
6aea31ae	2024-01-29 19:29:23	train_multi: added support for multiple datasets	Jakub Piotr Cłapa
ffc99542	2024-01-29 19:23:45	pipeline: T2S now returns a batched tensor	Jakub Piotr Cłapa
12c8a479	2024-01-29 19:19:01	t2s: updated training code, added batch size benchmarking	Jakub Piotr Cłapa
1b425498	2024-01-29 19:17:48	Removed old model code	Jakub Piotr Cłapa
2ff5fca3	2024-01-29 12:31:40	modules: fixed regressions	Jakub Piotr Cłapa
f48cd4a5	2024-01-29 11:48:00	s2a: updated training code, added batch size benchmarking	Jakub Piotr Cłapa
1ad4f5dc	2024-01-29 19:22:25	Added the evaluations of the recent vq_stoks models	Jakub Piotr Cłapa
b6fc87c9	2024-01-26 23:22:02	Updated the data preprocessing code	Jakub Piotr Cłapa
cab55b2c	2024-01-26 23:08:11	Added the languages notebook	Jakub Piotr Cłapa
be5eede5	2024-01-26 23:07:24	Added a benchmarking script	Jakub Piotr Cłapa
2fb964d4	2024-01-26 23:07:07	wh_transcribe: update the preprocessing code	Jakub Piotr Cłapa
be9878cb	2024-01-26 13:59:57	Prefetch more models used in preprocessing	Jakub Piotr Cłapa
9d1b39bc	2024-01-26 13:57:57	vad: store VAD results as FP32	Jakub Piotr Cłapa
9b681cf7	2024-01-26 13:57:22	vad: modernize dataloading	Jakub Piotr Cłapa
acbeaf29	2024-01-26 13:35:17	vq_stoks: improved the ensure_whisper API	Jakub Piotr Cłapa
6901db42	2024-01-26 13:34:07	Added support for the OPUS codec	Jakub Piotr Cłapa
638304c4	2024-01-26 13:30:01	Do not track the _modidx.py file	Jakub Piotr Cłapa
842f5b86	2024-01-22 14:08:50	Release v0.5.7	Jakub Piotr Cłapa
7f64151f	2024-01-22 14:06:21	Fix speaker_map backwards compatibility	Jakub Piotr Cłapa
ef55c210	2024-01-22 14:04:55	Added support for upgrading checkpoints on the fly	Jakub Piotr Cłapa
ad0f8c88	2024-01-22 14:03:15	Fixed the PyPI package license	Jakub Piotr Cłapa
32940f63	2024-01-21 23:03:59	Correct the spelling of a word	刘悦
a9b855eb	2024-01-19 17:59:04	Added the missing languages.py file	Jakub Piotr Cłapa
a4f9c2de	2024-01-18 18:35:03	Update the Collab link to preselect the runtime type with a GPU	Jakub Piotr Cłapa
398b8890	2024-01-18 18:23:17	Open up the README with a higher quality sample (thanks londons_explore and stavros)	Jakub Piotr Cłapa
5fe67d89	2024-01-18 17:24:52	modules: bias_out needs to be a buffer as well	Jakub Piotr Cłapa
158444d3	2024-01-18 16:58:44	Disable torch.compile by default to reduce compatibility issues	Jakub Piotr Cłapa
e06530e0	2024-01-18 16:57:31	README: added links to the presentation recordings	Jakub Piotr Cłapa
14bdbbab	2024-01-18 13:02:25	Updated the README with more smaples	Jakub Piotr Cłapa
ac1fd8c7	2024-01-17 16:01:39	pipeline: added callback support	Jakub Piotr Cłapa
58ee2b66	2024-01-17 15:52:27	T2S: added a step callback	Jakub Piotr Cłapa
4bb976ef	2024-01-17 12:30:37	Implemented FP16 inference	Jakub Piotr Cłapa
28da4aab	2024-01-17 10:19:04	Clean up inference dependencies	Jakub Piotr Cłapa
aa480191	2024-01-15 17:43:19	Rewrote the inference for a 10x speedup	Jakub Piotr Cłapa
c725a146	2024-01-15 17:26:49	modules: rewrote kv-cache to be compatible with torch.compile	Jakub Piotr Cłapa
e6a7fb69	2024-01-13 14:42:46	modules: removed dead code	Jakub Piotr Cłapa
b6f9bf3a	2024-01-13 13:48:35	FlexEmbeddings: fix convert_for_eval when frozen_width == width	Jakub Piotr Cłapa
24cb4135	2024-01-11 12:52:09	Clear the kv-cache before each generation	Jakub Piotr Cłapa
033301e5	2024-01-10 18:42:48	add kv caching	makaveli10
68716610	2024-01-10 10:59:48	README: new models, voice cloning	Jakub Piotr Cłapa
8168a30f	2024-01-10 10:56:47	Update the voice cloning example to use a public URL	Jakub Piotr Cłapa
89295871	2024-01-10 09:44:50	Added a zero-shot voice cloning example	Jakub Piotr Cłapa
22c8c303	2024-01-10 09:16:53	Showcase the faster S2A model in the inference notebook	Jakub Piotr Cłapa
8ecde95a	2024-01-10 09:11:47	Added support for loading alternative models in Pipeline	Jakub Piotr Cłapa
934a67c7	2023-12-10 21:12:04	Updated the README	Jakub Piotr Cłapa
5ab82385	2023-12-10 20:37:05	Improve the inference notebook	Jakub Piotr Cłapa
8226ef0a	2023-12-10 20:11:04	Release v0.1.0	Jakub Piotr Cłapa
11374dfc	2023-12-10 20:04:44	Brand new release, finally the quality is amazing :)	Jakub Piotr Cłapa
e4c49580	2023-10-27 16:55:53	Fix doc generation issues.	Jakub Piotr Cłapa
d0b6e59b	2023-10-27 16:44:31	Added more dev deps	Jakub Piotr Cłapa
8a9c0c3a	2023-10-27 16:38:14	Added WER metrics code	Jakub Piotr Cłapa
a08eb386	2023-10-27 15:31:38	Added dataset documentation	Jakub Piotr Cłapa
a7128b04	2023-10-26 09:50:58	Added a block diagram of the WhisperSpeech pipeline	Jakub Piotr Cłapa
08366991	2023-10-19 18:12:42	Added the data preparation scripts	Jakub Piotr Cłapa
362b7746	2023-10-19 17:41:59	Updated the inference examples	Jakub Piotr Cłapa
54ed7bb8	2023-10-19 17:07:47	README: added links to the pretrained models and datasets on Huggingface	Jakub Piotr Cłapa
e22cc01b	2023-10-19 16:58:24	Add WhisperX models to the offline downloader	Jakub Piotr Cłapa
d3534443	2023-10-19 16:54:55	Added support for training with webdatasets	Jakub Piotr Cłapa
d1b4a0b8	2023-10-19 16:50:31	Vocoder: Added support for unbatched inputs	Jakub Piotr Cłapa
5e3cda03	2023-10-19 12:47:11	Added the new semantic to acoustic model	Jakub Piotr Cłapa
46267567	2023-10-19 07:36:02	Added the new text to semantic (T2S) model	Jakub Piotr Cłapa
17daf044	2023-10-18 14:19:09	Added the new, much improved semantic token model with evaluation scripts	Jakub Piotr Cłapa
543cbd24	2023-09-22 08:24:27	Added the new VAD and transcription pipelines	Jakub Piotr Cłapa
5054a636	2023-07-20 17:55:02	Fixed the Discord badges rendering through Quarto	Jakub Piotr Cłapa
d7862402	2023-07-20 17:36:41	Updated the README	Jakub Piotr Cłapa
7f34f455	2023-07-20 17:06:20	Removed old code	Jakub Piotr Cłapa
a9e49f1e	2023-07-14 23:22:49	Add pip install cell to the inference notebook	Jakub Piotr Cłapa
ee3d832a	2023-07-14 23:19:24	Drop the xformers dependency and bump the version to 0.0.3	Jakub Piotr Cłapa
b0692d35	2023-07-14 22:48:45	New T2S hyperparameters	Jakub Piotr Cłapa
c4dc7780	2023-07-14 22:47:36	Improved the inference examples	Jakub Piotr Cłapa
c86a3e67	2023-07-14 22:43:29	Load Tunables from T2S model files	Jakub Piotr Cłapa
26d6f024	2023-07-14 16:00:32	Fixed lr_scale being overwritten by the learning rate scheduler	Jakub Piotr Cłapa
c3577a18	2023-07-14 15:59:45	Prepare the T2S model for hyperparam tuning	Jakub Piotr Cłapa
006ad423	2023-07-13 17:39:58	Added Vocos support and showcase the complete inference pipeline	Jakub Piotr Cłapa
406e2c30	2023-07-13 17:38:21	Added the t2s and s2a μP-based models with inference support	Jakub Piotr Cłapa
ffa51f3c	2023-07-13 17:49:47	Misc nbdev cleanup	Jakub Piotr Cłapa
dca0556c	2023-07-13 17:48:44	Removed the old stoks+txts extraction code	Jakub Piotr Cłapa
fd59e37f	2023-07-13 17:36:12	Remove the old model code	Jakub Piotr Cłapa
edf9bddf	2023-07-13 17:33:25	Remove the quality enhancement model code	Jakub Piotr Cłapa
1cdcf861	2023-07-13 17:30:00	Lightning: added support for passing in Tunables	Jakub Piotr Cłapa
bdb02117	2023-07-13 17:28:35	Lightning: added support for gradient accumulation	Jakub Piotr Cłapa
045ea7b7	2023-07-13 17:26:44	Lightning: added support for changing the number of validations per epoch	Jakub Piotr Cłapa
42b47fa1	2023-07-13 17:25:15	Misc W&B logging fixes	Jakub Piotr Cłapa
08722a7d	2023-07-13 17:24:31	Added support for μP training optimizer adjustments	Jakub Piotr Cłapa
ded5a5c5	2023-07-13 17:02:46	Fixed some misc training code bugs	Jakub Piotr Cłapa
85d3aade	2023-07-13 16:57:35	Implement hooks needed for doing the μP parametrization	Jakub Piotr Cłapa
f0edaf3f	2023-07-13 16:52:49	Added support for using the xformers attention implementation	Jakub Piotr Cłapa
c3345cd0	2023-07-13 16:53:45	Remove old files	Jakub Piotr Cłapa
a643831c	2023-07-13 13:04:27	Implemented model loading and inference methods for the quantization model	Jakub Piotr Cłapa
14e2c8ad	2023-07-10 09:33:05	Merge pull request #22 from mengting7tw/patch-1	Marcus Edel
17d7eb37	2023-07-02 15:42:17	Update README.md	Tsai Meng-Ting
072be35b	2023-06-20 17:02:36	Log accuracy curves to W&B	Jakub Piotr Cłapa
49d938a5	2023-06-20 17:02:19	Support multi-element batches in the Lightning trainer	Jakub Piotr Cłapa
1a528a16	2023-06-20 17:01:40	Added support for gradient clipping	Jakub Piotr Cłapa
687dc66f	2023-06-20 17:00:53	Switch from `pct_start` to `warmup_steps`	Jakub Piotr Cłapa
623693b5	2023-06-20 16:59:49	Improved the Visual class to allow for more customization	Jakub Piotr Cłapa
b9cd5a3c	2023-06-20 16:52:00	Log hyperparameters to W&B	Jakub Piotr Cłapa
100b1c5a	2023-06-20 16:47:03	Notebook cleanups	Jakub Piotr Cłapa
3a7f8017	2023-06-20 16:35:48	Set some PyTorch performance setting	Jakub Piotr Cłapa
fcb8befc	2023-06-20 17:16:49	Remove the old Python model code	Jakub Piotr Cłapa
e99b8652	2023-04-29 18:08:56	rename whisper-finetuning	makaveli10
82902a69	2023-04-25 22:52:50	Added a bundle of 3 trained A2A codecs to enhance the sound quality (NFY)	Jakub Piotr Cłapa
ed97e2a5	2023-04-25 22:48:51	Log the validation loss 10 times per epoch	Jakub Piotr Cłapa
23db82cb	2023-04-25 22:48:07	Added support for passing arguments to datasets and models	Jakub Piotr Cłapa
19a475dd	2023-04-25 22:46:02	End the LR schedule with 1/25 of the maximum learning rate	Jakub Piotr Cłapa
9bb64f3d	2023-04-25 22:26:17	Fix nbdev metadata	Jakub Piotr Cłapa
29423365	2023-04-19 11:25:55	Added the preliminary T2S model and new multiGPU training code.	Jakub Piotr Cłapa
3323a42f	2023-04-13 14:15:41	Fixed the audio codec in the new samples	Jakub Piotr Cłapa
0b8912c8	2023-04-13 14:06:52	Added a new end-to-end TTS sample	Jakub Piotr Cłapa
c852e794	2023-04-12 18:50:51	Add pytorch lightning support.	Marcus Edel
6a5e7170	2023-04-05 10:57:32	Added samples, Discord links and an invite to collaborate (#13)	Jakub Piotr Cłapa
974428e2	2023-04-03 11:14:31	Try a few temperatures when sampling from the model	Jakub Piotr Cłapa
3fe0a964	2023-03-31 14:27:22	Added a new model that replaces cross-attention with a sum of resampled features	Jakub Piotr Cłapa
bf29970e	2023-03-29 16:06:44	Added the self/cross-attention visualizations	Jakub Piotr Cłapa
dbb3ee99	2023-03-29 15:58:49	Added support for skipping repeated semantic tokens	Jakub Piotr Cłapa
933df7f7	2023-03-29 13:12:53	Added the missing token-dataset.feather file	Jakub Piotr Cłapa
0d91ae6b	2023-03-23 21:26:41	add huggingface results link	makaveli10
2c6e395f	2023-03-23 21:23:45	update whisper readme	makaveli10
35379cdd	2023-03-29 07:50:12	Trained a new semnatic -> acoustic model that kind of works	Jakub Piotr Cłapa
80c6272c	2023-03-29 07:42:53	Updated the tokenizer training setup, added a script to extract stoks.	Jakub Piotr Cłapa
49f112a0	2023-03-29 07:40:06	Added support for extracting Whisper embeddings from other encoder layers	Jakub Piotr Cłapa
041f805a	2023-03-29 07:38:24	Try to lower acoustic extraction peak GPU memory usage	Jakub Piotr Cłapa
20ddd17b	2023-03-23 09:06:11	Merge pull request #6 from makaveli10/main	Jakub Piotr Cłapa
e75c3b07	2023-03-22 22:54:21	fix:typo	makaveli10
efda8515	2023-03-21 13:37:49	consistent identation	makaveli10
8f1ede78	2023-03-21 13:13:18	update train steps	makaveli10
8f6a6e7d	2023-03-21 13:09:27	train whisper decoer/encoder from scratch	makaveli10
547dede6	2023-03-15 15:55:21	Remove the old VQ model	Jakub Piotr Cłapa
06a568ac	2023-03-15 15:52:32	Move the symlink around to fix the README	Jakub Piotr Cłapa
b9e065b3	2023-03-15 15:13:50	Added positional embeddings after the RQ bottleneck, updated the training code and model	Jakub Piotr Cłapa
46953c29	2023-03-03 22:50:40	Added a symlink to the whisper diagram	Jakub Piotr Cłapa
6470d23e	2023-03-03 22:02:55	Initial version of the semantic to acoustic modeling notebook (WIP)	Jakub Piotr Cłapa
b7b25ebf	2023-03-03 22:01:54	Added TODO.md with suspected issues to check before training the final models	Jakub Piotr Cłapa
770bbcad	2023-03-03 22:00:58	Start using nbdev, create command line tools for token extraction	Jakub Piotr Cłapa
7eeef0d8	2023-02-28 09:15:42	Added the preliminary RQ semantic token quantization model	Jakub Piotr Cłapa
97ee6508	2023-02-24 17:20:19	Update about the progress on semantic tokens	Jakub Piotr Cłapa
62c34bf8	2023-02-24 12:05:05	Added a script to extract embeddings and tokens for distillation training	Jakub Piotr Cłapa
8e132eb2	2023-02-24 12:03:15	Added synthetic dataset and VQ/RQ model experiments	Jakub Piotr Cłapa
2cf6999a	2023-02-24 12:02:15	Renamed the notebooks with ordinal numbering	Jakub Piotr Cłapa
0818b08c	2023-02-23 18:19:47	Added the synthetic semantic embeddings experiment	Jakub Piotr Cłapa
dc7a68ed	2023-02-20 16:06:15	A first shot at semantic token extraction	Jakub Piotr Cłapa
b15a2719	2023-02-20 15:26:04	Added the acoustic token extraction notebook	Jakub Piotr Cłapa
3e321de7	2023-02-16 09:34:38	Expanded the README with more information	Jakub Piotr Cłapa
9494ad4e	2023-02-16 09:03:47	Create LICENSE	Jakub Piotr Cłapa
514ff0d9	2023-02-14 11:49:19	Initial readme.	Marcus Edel
f47b8dc7	2023-02-14 11:47:56	Initial commit	Marcus Edel



~/Projects/WhisperSpeech

History