d02564f5 |
2024-02-18 00:15:27 |
Update gui_file_to_text_to_audio_playback.py |
BBC-Esq |
044947d9 |
2024-02-17 23:02:01 |
gui/user text/file/tts |
BBC-Esq |
c1b64916 |
2024-02-16 22:17:03 |
requirements: added torch >=2, torchaudio, and soundfile (thx @BBC-Esq) |
signalprime |
5b7a33e3 |
2024-02-19 13:30:33 |
Moved the compute_device API to whisperspeech.inference |
Jakub Piotr Cłapa |
86a2d5f4 |
2024-02-16 22:36:37 |
added minimal.py |
signalprime |
b89bb082 |
2024-02-16 18:33:35 |
CPU + MPS Support Updated |
signalprime |
5f691c6a |
2024-02-13 08:38:52 |
Implement CPU and MPS support enhancements for WhisperSpeech |
BBC, Esquire |
e35ee9ac |
2024-02-13 06:01:43 |
Load reference audio more efficiently (#79) |
BBC-Esq |
80b268b7 |
2024-02-02 08:20:56 |
correct small typo in sample text |
BBC-Esq |
ef752dbd |
2024-02-02 08:13:36 |
convert text to an audio file |
BBC-Esq |
1581ae9e |
2024-02-02 08:13:08 |
Create readme.md |
BBC-Esq |
03b8a086 |
2024-02-02 08:12:32 |
Delete Examples directory |
BBC-Esq |
ae9c1944 |
2024-02-02 08:12:04 |
simple text to audio file script |
BBC-Esq |
cd47f1ec |
2024-02-02 08:11:12 |
Update readme.md |
BBC-Esq |
b176c9fc |
2024-02-02 08:10:10 |
Create readme.md |
BBC-Esq |
ac80e067 |
2024-01-29 19:43:52 |
Release 0.6 |
Jakub Piotr Cłapa |
fb471556 |
2024-01-29 19:30:05 |
train: experiment with linear LR schedules |
Jakub Piotr Cłapa |
6aea31ae |
2024-01-29 19:29:23 |
train_multi: added support for multiple datasets |
Jakub Piotr Cłapa |
ffc99542 |
2024-01-29 19:23:45 |
pipeline: T2S now returns a batched tensor |
Jakub Piotr Cłapa |
12c8a479 |
2024-01-29 19:19:01 |
t2s: updated training code, added batch size benchmarking |
Jakub Piotr Cłapa |
1b425498 |
2024-01-29 19:17:48 |
Removed old model code |
Jakub Piotr Cłapa |
2ff5fca3 |
2024-01-29 12:31:40 |
modules: fixed regressions |
Jakub Piotr Cłapa |
f48cd4a5 |
2024-01-29 11:48:00 |
s2a: updated training code, added batch size benchmarking |
Jakub Piotr Cłapa |
1ad4f5dc |
2024-01-29 19:22:25 |
Added the evaluations of the recent vq_stoks models |
Jakub Piotr Cłapa |
b6fc87c9 |
2024-01-26 23:22:02 |
Updated the data preprocessing code |
Jakub Piotr Cłapa |
cab55b2c |
2024-01-26 23:08:11 |
Added the languages notebook |
Jakub Piotr Cłapa |
be5eede5 |
2024-01-26 23:07:24 |
Added a benchmarking script |
Jakub Piotr Cłapa |
2fb964d4 |
2024-01-26 23:07:07 |
wh_transcribe: update the preprocessing code |
Jakub Piotr Cłapa |
be9878cb |
2024-01-26 13:59:57 |
Prefetch more models used in preprocessing |
Jakub Piotr Cłapa |
9d1b39bc |
2024-01-26 13:57:57 |
vad: store VAD results as FP32 |
Jakub Piotr Cłapa |
9b681cf7 |
2024-01-26 13:57:22 |
vad: modernize dataloading |
Jakub Piotr Cłapa |
acbeaf29 |
2024-01-26 13:35:17 |
vq_stoks: improved the ensure_whisper API |
Jakub Piotr Cłapa |
6901db42 |
2024-01-26 13:34:07 |
Added support for the OPUS codec |
Jakub Piotr Cłapa |
638304c4 |
2024-01-26 13:30:01 |
Do not track the _modidx.py file |
Jakub Piotr Cłapa |
842f5b86 |
2024-01-22 14:08:50 |
Release v0.5.7 |
Jakub Piotr Cłapa |
7f64151f |
2024-01-22 14:06:21 |
Fix speaker_map backwards compatibility |
Jakub Piotr Cłapa |
ef55c210 |
2024-01-22 14:04:55 |
Added support for upgrading checkpoints on the fly |
Jakub Piotr Cłapa |
ad0f8c88 |
2024-01-22 14:03:15 |
Fixed the PyPI package license |
Jakub Piotr Cłapa |
32940f63 |
2024-01-21 23:03:59 |
Correct the spelling of a word |
刘悦 |
a9b855eb |
2024-01-19 17:59:04 |
Added the missing languages.py file |
Jakub Piotr Cłapa |
a4f9c2de |
2024-01-18 18:35:03 |
Update the Colab link to preselect the runtime type with a GPU |
Jakub Piotr Cłapa |
398b8890 |
2024-01-18 18:23:17 |
Open up the README with a higher quality sample (thanks londons_explore and stavros) |
Jakub Piotr Cłapa |
5fe67d89 |
2024-01-18 17:24:52 |
modules: bias_out needs to be a buffer as well |
Jakub Piotr Cłapa |
158444d3 |
2024-01-18 16:58:44 |
Disable torch.compile by default to reduce compatibility issues |
Jakub Piotr Cłapa |
e06530e0 |
2024-01-18 16:57:31 |
README: added links to the presentation recordings |
Jakub Piotr Cłapa |
14bdbbab |
2024-01-18 13:02:25 |
Updated the README with more samples |
Jakub Piotr Cłapa |
ac1fd8c7 |
2024-01-17 16:01:39 |
pipeline: added callback support |
Jakub Piotr Cłapa |
58ee2b66 |
2024-01-17 15:52:27 |
T2S: added a step callback |
Jakub Piotr Cłapa |
4bb976ef |
2024-01-17 12:30:37 |
Implemented FP16 inference |
Jakub Piotr Cłapa |
28da4aab |
2024-01-17 10:19:04 |
Clean up inference dependencies |
Jakub Piotr Cłapa |
aa480191 |
2024-01-15 17:43:19 |
Rewrote the inference for a 10x speedup |
Jakub Piotr Cłapa |
c725a146 |
2024-01-15 17:26:49 |
modules: rewrote kv-cache to be compatible with torch.compile |
Jakub Piotr Cłapa |
e6a7fb69 |
2024-01-13 14:42:46 |
modules: removed dead code |
Jakub Piotr Cłapa |
b6f9bf3a |
2024-01-13 13:48:35 |
FlexEmbeddings: fix convert_for_eval when frozen_width == width |
Jakub Piotr Cłapa |
24cb4135 |
2024-01-11 12:52:09 |
Clear the kv-cache before each generation |
Jakub Piotr Cłapa |
033301e5 |
2024-01-10 18:42:48 |
add kv caching |
makaveli10 |
68716610 |
2024-01-10 10:59:48 |
README: new models, voice cloning |
Jakub Piotr Cłapa |
8168a30f |
2024-01-10 10:56:47 |
Update the voice cloning example to use a public URL |
Jakub Piotr Cłapa |
89295871 |
2024-01-10 09:44:50 |
Added a zero-shot voice cloning example |
Jakub Piotr Cłapa |
22c8c303 |
2024-01-10 09:16:53 |
Showcase the faster S2A model in the inference notebook |
Jakub Piotr Cłapa |
8ecde95a |
2024-01-10 09:11:47 |
Added support for loading alternative models in Pipeline |
Jakub Piotr Cłapa |
934a67c7 |
2023-12-10 21:12:04 |
Updated the README |
Jakub Piotr Cłapa |
5ab82385 |
2023-12-10 20:37:05 |
Improve the inference notebook |
Jakub Piotr Cłapa |
8226ef0a |
2023-12-10 20:11:04 |
Release v0.1.0 |
Jakub Piotr Cłapa |
11374dfc |
2023-12-10 20:04:44 |
Brand new release, finally the quality is amazing :) |
Jakub Piotr Cłapa |
e4c49580 |
2023-10-27 16:55:53 |
Fix doc generation issues. |
Jakub Piotr Cłapa |
d0b6e59b |
2023-10-27 16:44:31 |
Added more dev deps |
Jakub Piotr Cłapa |
8a9c0c3a |
2023-10-27 16:38:14 |
Added WER metrics code |
Jakub Piotr Cłapa |
a08eb386 |
2023-10-27 15:31:38 |
Added dataset documentation |
Jakub Piotr Cłapa |
a7128b04 |
2023-10-26 09:50:58 |
Added a block diagram of the WhisperSpeech pipeline |
Jakub Piotr Cłapa |
08366991 |
2023-10-19 18:12:42 |
Added the data preparation scripts |
Jakub Piotr Cłapa |
362b7746 |
2023-10-19 17:41:59 |
Updated the inference examples |
Jakub Piotr Cłapa |
54ed7bb8 |
2023-10-19 17:07:47 |
README: added links to the pretrained models and datasets on Huggingface |
Jakub Piotr Cłapa |
e22cc01b |
2023-10-19 16:58:24 |
Add WhisperX models to the offline downloader |
Jakub Piotr Cłapa |
d3534443 |
2023-10-19 16:54:55 |
Added support for training with webdatasets |
Jakub Piotr Cłapa |
d1b4a0b8 |
2023-10-19 16:50:31 |
Vocoder: Added support for unbatched inputs |
Jakub Piotr Cłapa |
5e3cda03 |
2023-10-19 12:47:11 |
Added the new semantic to acoustic model |
Jakub Piotr Cłapa |
46267567 |
2023-10-19 07:36:02 |
Added the new text to semantic (T2S) model |
Jakub Piotr Cłapa |
17daf044 |
2023-10-18 14:19:09 |
Added the new, much improved semantic token model with evaluation scripts |
Jakub Piotr Cłapa |
543cbd24 |
2023-09-22 08:24:27 |
Added the new VAD and transcription pipelines |
Jakub Piotr Cłapa |
5054a636 |
2023-07-20 17:55:02 |
Fixed the Discord badges rendering through Quarto |
Jakub Piotr Cłapa |
d7862402 |
2023-07-20 17:36:41 |
Updated the README |
Jakub Piotr Cłapa |
7f34f455 |
2023-07-20 17:06:20 |
Removed old code |
Jakub Piotr Cłapa |
a9e49f1e |
2023-07-14 23:22:49 |
Add pip install cell to the inference notebook |
Jakub Piotr Cłapa |
ee3d832a |
2023-07-14 23:19:24 |
Drop the xformers dependency and bump the version to 0.0.3 |
Jakub Piotr Cłapa |
b0692d35 |
2023-07-14 22:48:45 |
New T2S hyperparameters |
Jakub Piotr Cłapa |
c4dc7780 |
2023-07-14 22:47:36 |
Improved the inference examples |
Jakub Piotr Cłapa |
c86a3e67 |
2023-07-14 22:43:29 |
Load Tunables from T2S model files |
Jakub Piotr Cłapa |
26d6f024 |
2023-07-14 16:00:32 |
Fixed lr_scale being overwritten by the learning rate scheduler |
Jakub Piotr Cłapa |
c3577a18 |
2023-07-14 15:59:45 |
Prepare the T2S model for hyperparam tuning |
Jakub Piotr Cłapa |
006ad423 |
2023-07-13 17:39:58 |
Added Vocos support and showcase the complete inference pipeline |
Jakub Piotr Cłapa |
406e2c30 |
2023-07-13 17:38:21 |
Added the t2s and s2a μP-based models with inference support |
Jakub Piotr Cłapa |
ffa51f3c |
2023-07-13 17:49:47 |
Misc nbdev cleanup |
Jakub Piotr Cłapa |
dca0556c |
2023-07-13 17:48:44 |
Removed the old stoks+txts extraction code |
Jakub Piotr Cłapa |
fd59e37f |
2023-07-13 17:36:12 |
Remove the old model code |
Jakub Piotr Cłapa |
edf9bddf |
2023-07-13 17:33:25 |
Remove the quality enhancement model code |
Jakub Piotr Cłapa |
1cdcf861 |
2023-07-13 17:30:00 |
Lightning: added support for passing in Tunables |
Jakub Piotr Cłapa |
bdb02117 |
2023-07-13 17:28:35 |
Lightning: added support for gradient accumulation |
Jakub Piotr Cłapa |
045ea7b7 |
2023-07-13 17:26:44 |
Lightning: added support for changing the number of validations per epoch |
Jakub Piotr Cłapa |
42b47fa1 |
2023-07-13 17:25:15 |
Misc W&B logging fixes |
Jakub Piotr Cłapa |
08722a7d |
2023-07-13 17:24:31 |
Added support for μP training optimizer adjustments |
Jakub Piotr Cłapa |
ded5a5c5 |
2023-07-13 17:02:46 |
Fixed some misc training code bugs |
Jakub Piotr Cłapa |
85d3aade |
2023-07-13 16:57:35 |
Implement hooks needed for doing the μP parametrization |
Jakub Piotr Cłapa |
f0edaf3f |
2023-07-13 16:52:49 |
Added support for using the xformers attention implementation |
Jakub Piotr Cłapa |
c3345cd0 |
2023-07-13 16:53:45 |
Remove old files |
Jakub Piotr Cłapa |
a643831c |
2023-07-13 13:04:27 |
Implemented model loading and inference methods for the quantization model |
Jakub Piotr Cłapa |
14e2c8ad |
2023-07-10 09:33:05 |
Merge pull request #22 from mengting7tw/patch-1 |
Marcus Edel |
17d7eb37 |
2023-07-02 15:42:17 |
Update README.md |
Tsai Meng-Ting |
072be35b |
2023-06-20 17:02:36 |
Log accuracy curves to W&B |
Jakub Piotr Cłapa |
49d938a5 |
2023-06-20 17:02:19 |
Support multi-element batches in the Lightning trainer |
Jakub Piotr Cłapa |
1a528a16 |
2023-06-20 17:01:40 |
Added support for gradient clipping |
Jakub Piotr Cłapa |
687dc66f |
2023-06-20 17:00:53 |
Switch from `pct_start` to `warmup_steps` |
Jakub Piotr Cłapa |
623693b5 |
2023-06-20 16:59:49 |
Improved the Visual class to allow for more customization |
Jakub Piotr Cłapa |
b9cd5a3c |
2023-06-20 16:52:00 |
Log hyperparameters to W&B |
Jakub Piotr Cłapa |
100b1c5a |
2023-06-20 16:47:03 |
Notebook cleanups |
Jakub Piotr Cłapa |
3a7f8017 |
2023-06-20 16:35:48 |
Set some PyTorch performance settings |
Jakub Piotr Cłapa |
fcb8befc |
2023-06-20 17:16:49 |
Remove the old Python model code |
Jakub Piotr Cłapa |
e99b8652 |
2023-04-29 18:08:56 |
rename whisper-finetuning |
makaveli10 |
82902a69 |
2023-04-25 22:52:50 |
Added a bundle of 3 trained A2A codecs to enhance the sound quality (NFY) |
Jakub Piotr Cłapa |
ed97e2a5 |
2023-04-25 22:48:51 |
Log the validation loss 10 times per epoch |
Jakub Piotr Cłapa |
23db82cb |
2023-04-25 22:48:07 |
Added support for passing arguments to datasets and models |
Jakub Piotr Cłapa |
19a475dd |
2023-04-25 22:46:02 |
End the LR schedule with 1/25 of the maximum learning rate |
Jakub Piotr Cłapa |
9bb64f3d |
2023-04-25 22:26:17 |
Fix nbdev metadata |
Jakub Piotr Cłapa |
29423365 |
2023-04-19 11:25:55 |
Added the preliminary T2S model and new multiGPU training code. |
Jakub Piotr Cłapa |
3323a42f |
2023-04-13 14:15:41 |
Fixed the audio codec in the new samples |
Jakub Piotr Cłapa |
0b8912c8 |
2023-04-13 14:06:52 |
Added a new end-to-end TTS sample |
Jakub Piotr Cłapa |
c852e794 |
2023-04-12 18:50:51 |
Add pytorch lightning support. |
Marcus Edel |
6a5e7170 |
2023-04-05 10:57:32 |
Added samples, Discord links and an invite to collaborate (#13) |
Jakub Piotr Cłapa |
974428e2 |
2023-04-03 11:14:31 |
Try a few temperatures when sampling from the model |
Jakub Piotr Cłapa |
3fe0a964 |
2023-03-31 14:27:22 |
Added a new model that replaces cross-attention with a sum of resampled features |
Jakub Piotr Cłapa |
bf29970e |
2023-03-29 16:06:44 |
Added the self/cross-attention visualizations |
Jakub Piotr Cłapa |
dbb3ee99 |
2023-03-29 15:58:49 |
Added support for skipping repeated semantic tokens |
Jakub Piotr Cłapa |
933df7f7 |
2023-03-29 13:12:53 |
Added the missing token-dataset.feather file |
Jakub Piotr Cłapa |
0d91ae6b |
2023-03-23 21:26:41 |
add huggingface results link |
makaveli10 |
2c6e395f |
2023-03-23 21:23:45 |
update whisper readme |
makaveli10 |
35379cdd |
2023-03-29 07:50:12 |
Trained a new semantic -> acoustic model that kind of works |
Jakub Piotr Cłapa |
80c6272c |
2023-03-29 07:42:53 |
Updated the tokenizer training setup, added a script to extract stoks. |
Jakub Piotr Cłapa |
49f112a0 |
2023-03-29 07:40:06 |
Added support for extracting Whisper embeddings from other encoder layers |
Jakub Piotr Cłapa |
041f805a |
2023-03-29 07:38:24 |
Try to lower acoustic extraction peak GPU memory usage |
Jakub Piotr Cłapa |
20ddd17b |
2023-03-23 09:06:11 |
Merge pull request #6 from makaveli10/main |
Jakub Piotr Cłapa |
e75c3b07 |
2023-03-22 22:54:21 |
fix:typo |
makaveli10 |
efda8515 |
2023-03-21 13:37:49 |
consistent indentation |
makaveli10 |
8f1ede78 |
2023-03-21 13:13:18 |
update train steps |
makaveli10 |
8f6a6e7d |
2023-03-21 13:09:27 |
train whisper decoder/encoder from scratch |
makaveli10 |
547dede6 |
2023-03-15 15:55:21 |
Remove the old VQ model |
Jakub Piotr Cłapa |
06a568ac |
2023-03-15 15:52:32 |
Move the symlink around to fix the README |
Jakub Piotr Cłapa |
b9e065b3 |
2023-03-15 15:13:50 |
Added positional embeddings after the RQ bottleneck, updated the training code and model |
Jakub Piotr Cłapa |
46953c29 |
2023-03-03 22:50:40 |
Added a symlink to the whisper diagram |
Jakub Piotr Cłapa |
6470d23e |
2023-03-03 22:02:55 |
Initial version of the semantic to acoustic modeling notebook (WIP) |
Jakub Piotr Cłapa |
b7b25ebf |
2023-03-03 22:01:54 |
Added TODO.md with suspected issues to check before training the final models |
Jakub Piotr Cłapa |
770bbcad |
2023-03-03 22:00:58 |
Start using nbdev, create command line tools for token extraction |
Jakub Piotr Cłapa |
7eeef0d8 |
2023-02-28 09:15:42 |
Added the preliminary RQ semantic token quantization model |
Jakub Piotr Cłapa |
97ee6508 |
2023-02-24 17:20:19 |
Update about the progress on semantic tokens |
Jakub Piotr Cłapa |
62c34bf8 |
2023-02-24 12:05:05 |
Added a script to extract embeddings and tokens for distillation training |
Jakub Piotr Cłapa |
8e132eb2 |
2023-02-24 12:03:15 |
Added synthetic dataset and VQ/RQ model experiments |
Jakub Piotr Cłapa |
2cf6999a |
2023-02-24 12:02:15 |
Renamed the notebooks with ordinal numbering |
Jakub Piotr Cłapa |
0818b08c |
2023-02-23 18:19:47 |
Added the synthetic semantic embeddings experiment |
Jakub Piotr Cłapa |
dc7a68ed |
2023-02-20 16:06:15 |
A first shot at semantic token extraction |
Jakub Piotr Cłapa |
b15a2719 |
2023-02-20 15:26:04 |
Added the acoustic token extraction notebook |
Jakub Piotr Cłapa |
3e321de7 |
2023-02-16 09:34:38 |
Expanded the README with more information |
Jakub Piotr Cłapa |
9494ad4e |
2023-02-16 09:03:47 |
Create LICENSE |
Jakub Piotr Cłapa |
514ff0d9 |
2023-02-14 11:49:19 |
Initial readme. |
Marcus Edel |
f47b8dc7 |
2023-02-14 11:47:56 |
Initial commit |
Marcus Edel |