I do think that voice cloning for personal usage has genuine uses - in fact there was a relatively interesting news article about a person who was irrevocably losing their voice and had their vocal pattern cloned.
That being said, it does seem a bit bizarre that the repo's home page is proudly trumpeting the ability to co-opt other people's identities without their permission (and yes your unique vocal pattern is definitely part of your identity - I mean it's used in some forms of biometric data). They're doing the project a bit of a disservice.
chefandy 152 days ago [-]
Of course there are legitimate uses, which means everyone should have completely unfettered access and nobody selling it should be responsible for irresponsible users. Personally, I’m sick of the government limiting my artistic freedom because the mediums I use might be misused by a tiny group of bad actors. For example, it’s unnecessarily difficult to source pineapple grenades for my large scale abstract punched tin crafts. The other people who live in my apartment building haven’t complained when I asked if they had a problem with it, so what’s the problem? And when I can get ahold of it, white phosphorous makes a great addition to my annual deep-woods pyrotechnic light shows. I just don’t understand this nanny state garbage.
Grimblewald 152 days ago [-]
Right? I am an avid keeper of terrariums and micro-ecosystems. Government over-reach means I am having a really hard time seeding my anthrax enclosure.
chefandy 151 days ago [-]
Ridiculous! Complete overreach. Be strong, oppressed one. “This too shall pass.”
mensetmanusman 152 days ago [-]
Polonium has useful uses
chefandy 151 days ago [-]
If nothing else, I can confirm it’s delicious.
notpachet 152 days ago [-]
Take my upvote you greedy bastard.
chefandy 151 days ago [-]
Taken, as recommended. It tingled.
VPenkov 152 days ago [-]
It does have actual genuine uses. I'm in the process of recording a series of tutorials for my peers but I'd like them to hear things in my voice so it doesn't sound like I have offloaded the work to someone else.
I don't know if this helps or harms the credibility but I can't really talk more than an hour without seriously straining my voice. So cloning it sounds like a great use-case for someone with a similar problem.
Looking forward to trying this.
vunderba 152 days ago [-]
I like this idea. I've been playing with the idea of giving all my blog entries corresponding narration in my own voice, but I'd love to see some kind of voice cloner + Gradio interface that lets me make adjustments to things like cadence, delivery, etc. (I mean beyond just making me sound like Alvin and the Chipmunks).
TravisPeacock 151 days ago [-]
I don't know about changing tone, but I have used the Adobe Podcast editor and it allows you to adjust the words and rearrange what you said, so you can cut "umms" and stuff. I know they are constantly adding features, so I don't know if you can improve cadence and such, but it's worth looking at if you have Adobe stuff.
drillsteps5 152 days ago [-]
Wondercraft.ai. It's not mine, I just used it for a bit a few months ago.
dotancohen 151 days ago [-]
> so it doesn't sound like I have offloaded the work to someone else.
So, deception. Deception that you feel is justified, but deception nonetheless.
VPenkov 150 days ago [-]
I disagree. Deception is the act of convincing someone of untrue information.
The information I'm conveying is truthful and it's my words. The voice, generated or not, is not what I'm trying to convince people into believing.
dotancohen 150 days ago [-]
That is an interesting perspective. I disagree that there is no deception, but do see the validity in your point. Thank you.
NoMoreNicksLeft 152 days ago [-]
When my IoT geiger counter starts going off, I want the in-home PA system's voice to be Admiral Adama warning my family of an imminent radiological threat and preparing the Vipers for launch.
Edward James Olmos if you're reading this, I'm willing to pay a license fee, but then I expect actual recordings and not just AI bullshit. I'm not pirating your voice, you're refusing to let me hire it.
onetokeoverthe 152 days ago [-]
proudly trumpeting the ability to co-opt other people's identities without their permission
EXACTLY. Clone the wrong person's voice and it's game over.
satvikpendem 152 days ago [-]
It's useful for some things, like satire. Presidents Play is a good series on YouTube that uses US presidents' cloned voices for comedic satire.
bbarnett 152 days ago [-]
A gun is useful for shooting someone; what has that to do with it being right or wrong?
satvikpendem 152 days ago [-]
Not sure you picked the most cogent example because lots of people will debate you on that topic...
ranger_danger 152 days ago [-]
Randy Travis also used AI on his last album after losing his voice.
bguberfain 152 days ago [-]
Thanks for sharing this! But I have some doubts about its hidden installation procedure. It imports all functions from one_click (from one_click import *), which points to a compiled file. It then runs functions like install_webui and install_extra_packages. At the very least, it's suspicious.
wutwutwat 152 days ago [-]
> Windows Defender may give a warning about untrusted application and disallow further execution of Voice-Pro. If SmartScreen security level is set to "Warn", just click "More info" and then click "Run anyway". If SmartScreen is set to level "Block" there will be no button to run the installation. In this case, open the properties of the start.bat file, and check "Unblock", apply the change and run the start.bat again.
Exactly, and from what I can see this isn't adding anything significant that isn't already achieved in much clearer and more openly presented repositories. Take Coqui, for example. Cloning is as easy as recording an example of your voice and using
```python
from TTS.api import TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)
# generate speech by cloning a voice using default settings
tts.tts_to_file(text="It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
file_path="output.wav",
speaker_wav="/path/to/target/ speaker.wav",
language="en")
```
nmstoker 152 days ago [-]
Perhaps I'm paranoid, but this has multiple red flags that make me hesitate to install it - even the "too good to be true" aspect of such comprehensive features makes me wonder (which is probably irrational and taking it a bit too far!)
lysace 152 days ago [-]
I have resorted to using separate physical computers + vlan network separation when exploring untrusted AI workloads. Yes, it costs, but so does a breach.
Try recording the installation process with a camera. The entire installation process is displayed in the Windows command window. It's just installing Python packages and downloading the AI model and audio files. That's all.
bguberfain 152 days ago [-]
The file I mentioned is just the beginning... there is a folder full of .dll files, renamed to .pyd.
I understand that this is the proprietary part, the one that limits usage to 30 minutes, but I think it is too closed for an MIT license.
didibus 152 days ago [-]
Pretty easy for a script to not print everything it does at the command line. You have to inspect the code if you want to be sure.
shannifin 153 days ago [-]
I don't have much real use for celebrity voices (other than fun experimentation), but I'd love to be able to clone my own voice and character voices for the purposes of creating audiobooks / audioplays without having to pay monthly fees with monthly usage limits. So I'm excited by this sort of project!
P.S. Are there any tools for synthetic voice creation? Maybe melding two or more voices together, or just exploring latent space? Would be fun for character creation to create completely new voices.
vunderba 153 days ago [-]
I'd be interested as well. This is where I imagine the space is going - particularly as the potential for litigation increases around cloning.
Game studios will spin up a bunch of unique virtual voices for all the dialogue of extras. It'll probably be longer before we see replacements of main characters though. There's been some research in speech-to-speech transference as well - this means that company employee A records the character B's line with the appropriate emotional nuance (angry, sad, etc.) and the emotional aspect is copied on top of the generated TTS.
thelittleone 153 days ago [-]
Have you tried ElevenLabs? I used that. Had to record 3 hours of training audio reading books and news articles. But the result was really good.
shannifin 153 days ago [-]
They're great! They just cost too much for how much output I want.
stavros 153 days ago [-]
How much did the training cost?
dyauspitr 153 days ago [-]
I’ve used tortoise tts before and trained it on my voice and a mix of voices. It’s not perfect but still impressive.
jerpint 152 days ago [-]
StyleTTSv2 is pretty good and open source, you can easily traverse its latent space for voice
__jonas 151 days ago [-]
Similarly, I’m not excited by “voice cloning” at all, but I’d like to have very high quality, natural sounding TTS.
All of the projects that do that seem to be geared towards also allowing arbitrary voice cloning based on short audio clips, I've noticed.
deskr 152 days ago [-]
Isn't it funny how some text changes the voice in your head? Now you're hearing the best voice. It's amazing. I tell you. It's the greatest voice. Everybody’s talking about it. They are saying it's incredible. They say they've never heard as beautiful a voice before.
bitwize 152 days ago [-]
When Arnold Schwarzenegger was governor of California, he refused clemency for notorious gang founder Stanley "Tookie" Williams, who was sentenced to death for four murders in 1979.
Reading over the governor's statement explaining his reasons for denial of clemency, my brain couldn't help but do so in an Arnold voice. Sometimes, to amuse friends, I would read portions of it aloud while doing the voice.
Maybe it's a bit tasteless, like the anime-girl Demon Core memes, but there's just something about hearing the legal and administrative justification for proceeding with an execution in the voice of the Terminator.
I'm the same way with famous YouTubers. If I see "Guru Larry" Bundy Jr. or Clint "LGR" Basinger leave a comment on someone else's video, my brain reads it in their voice.
cies 152 days ago [-]
It took me until "Everybody’s talking about it" to hear it in his voice :)
Please no spoilers!
amazingamazing 152 days ago [-]
Voices can be beautiful.
wutwutwat 152 days ago [-]
> Windows Defender may give a warning about untrusted application and disallow further execution of Voice-Pro. If SmartScreen security level is set to "Warn", just click "More info" and then click "Run anyway". If SmartScreen is set to level "Block" there will be no button to run the installation. In this case, open the properties of the start.bat file, and check "Unblock", apply the change and run the start.bat again.
hard pass and anyone who reads this and continues is bonkers
HPsquared 152 days ago [-]
Doesn't that basically apply to all binary executables? Anything new and unrecognised by the scanner.
wutwutwat 152 days ago [-]
There's also the fact that there's a load of precompiled binary files in the app directory https://github.com/abus-aikorea/voice-pro/tree/main/app - sure, they might be the binaries from compiling the source code you see in the repo, or they might be something else. Roll the dice.
vulcanidic 152 days ago [-]
Running a Python application using a Windows batch file is not a special task at all. Oobabooga and AUTOMATIC1111 work in the same way. They also have the same issues regarding Windows Defender.
They are complaining about the binary files, not the batch files.
vulcanidic 152 days ago [-]
This application is executed in a virtual environment (venv) created using Miniconda, independent of the Windows OS.
It does not damage the Windows OS.
If you have concerns or doubts about telemetry or spyware, there are countless software options available for detection.
Give it a try.
nulld3v 152 days ago [-]
Python venvs are not intended to provide isolation from the host system and therefore do not provide any isolation from the host system.
So yes, the app can certainly harm the OS, and the venv would not provide any protection against this.
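To make that concrete, here's a minimal sketch (the paths are purely illustrative): a script launched from inside a venv runs with the full permissions of the user who started it, so the venv changes nothing about what it can touch.

```python
# Run from inside an activated venv: the venv only isolates which *packages*
# are importable, not the filesystem, network, or OS.
import os
import pathlib

print(os.environ.get("VIRTUAL_ENV"))  # confirms we're "inside" the venv

# Full read access to the user's home directory (illustrative; a malicious
# installer could just as easily read SSH keys or browser profiles and exfiltrate them).
home = pathlib.Path.home()
print([p.name for p in home.iterdir()][:10])

# ...and full write access, too.
(home / "written_from_inside_a_venv.txt").write_text("the venv did not stop this")
```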
wutwutwat 152 days ago [-]
I don't have concerns, because I won't be running this code
twojacobtwo 152 days ago [-]
I'm legitimately wondering what you recommend from among those options.
dotancohen 151 days ago [-]
This is absolutely not true. In fact, considering the deception of this post, if the person making this post is associated with the project then the project should be considered malware.
giarc 152 days ago [-]
My neighbour is a detective and did a course on crypto scams. He told me scammers call someone's cell phone, record their voicemail greeting and use that to clone their voice. They can then have a very real-sounding conversation with that person's grandparent and take their money.
I'm all for innovation, but I don't really see the use case of cloning random voices to make podcasts? Listening to Zuck interview Elon? ok...?
alias_neo 152 days ago [-]
It's really easy for a technical person to do as well.
I use Coqui TTS[0] as part of my home automation. After I got the idea from HeyWillow[1], I wrote a small Python script that lets me upload a voice clip for it to clone, plus a small shim that sends the output to a Home Assistant media player instead of their standard output device (a rough sketch of that shim is below). I run the TTS container on a VM with a Tesla P4 (~£100 to buy) and processing takes about 1x-2x real time (roughly the same time it'd take to say it) using the large model.
Just for a giggle, I uploaded a few 3-5 second clips of myself speaking and cloned my voice, then sent a command to our living room media player to call my wife into the room; from another room, she was 100% convinced it was me speaking words I'd never spoken.
I tried playing with a variety of sentences for a few hours and overall, it sounded almost exactly like me, to me, with the exception of some "attitude" and "intonation" I know I wouldn't use in my speech. I didn't notice much of an improvement using much longer clips; the short ones were "good enough".
Tangentially, it really bugs me that most phone providers in the UK now insist you record a "personal greeting" before they'll let you check your voicemail box. I just record silence, because the last thing I want/need is a voicemail greeting in my voice confirming to some randomer I didn't want calling me who I am and that my number is active - even more so knowing I can clone any voice to reasonably good accuracy with just a few seconds of audio.
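For the curious, a shim along those lines might look roughly like this - a minimal sketch, not the commenter's actual code; the Home Assistant URL, token, entity ID, and file paths are all placeholder assumptions. It generates the clip with Coqui's XTTS and then asks Home Assistant to play it via the standard media_player.play_media service:

```python
# Minimal sketch: clone-voice TTS -> Home Assistant media player.
# Assumes a reachable Home Assistant instance, a long-lived access token,
# and a web-accessible location where the generated wav can be served from.
import requests
from TTS.api import TTS

HA_URL = "http://homeassistant.local:8123"        # assumed address
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"         # created in your HA user profile
MEDIA_PLAYER = "media_player.living_room"         # assumed entity id

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2", gpu=True)
tts.tts_to_file(
    text="Could you come to the living room for a second?",
    speaker_wav="/srv/tts/voices/me.wav",          # the uploaded reference clip
    language="en",
    file_path="/srv/www/tts/announcement.wav",     # served at the URL below
)

# Ask Home Assistant to play the generated file on the target speaker.
requests.post(
    f"{HA_URL}/api/services/media_player/play_media",
    headers={"Authorization": f"Bearer {HA_TOKEN}"},
    json={
        "entity_id": MEDIA_PLAYER,
        "media_content_id": "http://myserver.local/tts/announcement.wav",
        "media_content_type": "music",
    },
    timeout=30,
)
```

The generated file just needs to be reachable by the media player, e.g. served over HTTP from the same VM.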
The best thing about crypto is that it is an ever growing bug bounty program for all aspects of authentication :)
eurekin 152 days ago [-]
Technically, wouldn't a simple "Hold on, I'll call you back" test call stop that?
a2128 152 days ago [-]
Scammers will use pressure and emotion. "Grandpa they put me in jail, I need you to bail me out please, there's not much time!" The last thing on the victim's mind is to hang up on what sounds like their crying distressed grandson to call them back. Sometimes even calling back won't work, the real grandson isn't picking up their phone and the scammer is saying that it's because they're in jail and their phone was taken.
botanical76 152 days ago [-]
I've been thinking a lot about this possibility. I think people will have to come up with family passwords eventually. A word or phrase that is regularly practised, but strictly private, for verification in times of crisis.
For example, my family's passphrase is- just kidding.
hollerith 152 days ago [-]
Either that or Android and iOS will add something like Caller ID but with actual authentication.
notpachet 152 days ago [-]
My family already does this.
ssl-3 151 days ago [-]
Mmm. Safewords.
wat10000 152 days ago [-]
[flagged]
pmarreck 152 days ago [-]
[flagged]
wat10000 152 days ago [-]
[flagged]
pmarreck 152 days ago [-]
I read the entire thing. My point was that mental capacity and acuity (including the ability to resist panic scamming) go down especially after 80, and gullibility goes up. No matter how smart you were prior. And that this is a vulnerability for everyone. In 2023, Americans aged 60 and older reported losses exceeding $3.4 billion due to scams, marking an 11% increase from the previous year. The median loss in a romance scam for ages 70 and older is $9,000. And now there's also pig-butchering. When combined with isolation and a lack of digital literacy, everyone's older family members are vulnerable.
wat10000 152 days ago [-]
And my point is that coming up with ways for a fully calm, collected, reasonable person to detect a scammer is a waste of time, because that’s the easy part. The hard part is being calm, collected, and reasonable enough to actually consider that you might be getting scammed.
And that is hard. For most people, extremely hard. For people who lived most of their lives before the era of cheap and fast worldwide communication, it’s even worse. For people with declining mental abilities, it’s worse yet. Saying “oh you can expose the scam just by saying you’ll call them back” is looking at the wrong thing.
So yeah, you’re making a great point that meshes well with my own and definitely does not deserve to be wrapped in snideness.
pmarreck 151 days ago [-]
ah. ok. I apologize for misinterpreting. You're saying that because these scams use irrational techniques, there is no point to expecting "reasonableness" to work?
wat10000 151 days ago [-]
Exactly. Scam protection needs to focus on staying calm and being open to the possibility of a scam. Once you’re in a good mental state and the thought has occurred, “hey, this might be a scam,” then you’ve basically already won. Finding a question that can authenticate the purported family member is trivial by comparison.
stitched2gethr 152 days ago [-]
Yes, if the callee has reason to believe the caller isn't who they say they are. But this will never enter the mind of someone who's retirement age.
bagels 152 days ago [-]
Some old people become very gullible.
Loughla 152 days ago [-]
In all fairness, the number of old people who even know that realistic recreations of their loved ones' voices are even possible is probably pretty low.
yawnxyz 153 days ago [-]
> When Windows Defender mistakenly recognizes a [virus] as a Trojan, this is often called a 'False Positive'. To solve this problem, you can go through the following steps:
wutwutwat 152 days ago [-]
Not to mention a directory full of binaries which could do who knows what. The author is asking people to turn off their antivirus, execute their code as admin, and be fine with it running binary files doing whatever.
Yeah, I also noticed the install instructions are to run this batch file that gets administrator access and starts downloading things…
gruez 153 days ago [-]
It's not any worse than all the projects on github with an "easy" install instructions of "curl ... | sudo sh". Heck, even an innocent "sudo make install" command can easily contain a malicious payload.
tonyedgecombe 153 days ago [-]
It's not really the sort of tool that should require admin rights though.
wutwutwat 152 days ago [-]
Not to mention a directory full of binaries which could do who knows what. The author is asking people to turn off their antivirus, execute their code as admin, and be fine with it running binary files doing whatever.
If it requires dependencies, how else do you expect it to work?
tonyedgecombe 148 days ago [-]
Vendoring.
elif 152 days ago [-]
Yea not to mention the entire homebrew ecosystem is built around trusting random people's shell scripts.
MacOS devs blindly trust it like it's the app store.
nozzlegear 152 days ago [-]
The assumption is that maintainers at Homebrew are reviewing each pull request before being merged, though it's obviously not a full security audit. Homebrew will also use macOS's sandboxing if a formula needs to be built during installation, which will limit file access to specific Homebrew directories and restrict network access.
But I agree that everyone should review the Homebrew install script for any package they're installing if they're concerned about security.
pmarreck 152 days ago [-]
A simple `brew cat <packagename>` (possibly piping to bat if you want syntax highlighting) should spit out the ruby install formula for that package, for inspection.
chefandy 153 days ago [-]
Yeah it’s not great but it’s definitely not unusual. And windows reputation-based execution blocking does have false positives. I work for a company that has some very very popular products and some that only see a few dozen downloads per week, and despite being signed, it still takes a while for new versions to build enough rep to not trigger the block.
youngNed 152 days ago [-]
I'm looking down the comments, but not really seeing much about what this actually is. From my very quick look, it's a front end for F5-TTS with yt-dlp and Whisper?
Is there anything new in this?
mensetmanusman 152 days ago [-]
It’s wrappers all the way down
dangoodmanUT 152 days ago [-]
Yeah they made an easy to use frontend. Don't be the dropbox guy
Uehreka 152 days ago [-]
We can't just keep saying "Don't be the dropbox guy" as a comeback to criticism of new technology. Anyone who uses that phrase should have to place a bet in a prediction market that only pays out if the product they're talking about succeeds. Blindly supporting stuff out of a sort of "Pascal's Wager against looking foolish later" should have some cost if you're wrong.
bn-l 152 days ago [-]
Let’s default to being supportive and very careful with being negative.
Uehreka 152 days ago [-]
That kind of imbalance makes it easier for scammers and hucksters to get away with things. It is not a feelgood prescription with no cost.
bn-l 148 days ago [-]
This is another cost of scamming: the cynicism it creates.
vulcanidic 152 days ago [-]
I completely agree with you. This is just a web front-end, and there's nothing new about it. However, it's very easy to use. It's not easy to create something like this.
youngNed 152 days ago [-]
Wind your neck in.
I simply asked "is there anything new in this?" because, i was interested to know if, you know, there was anything new in this.
muglug 153 days ago [-]
These tools make it very easy to scam vulnerable people, and have pretty limited use otherwise.
Larrikin 153 days ago [-]
I'm absolutely using celebrity voices for my Home Assistant voice. Amazon has spent the last couple years removing the voices for Alexa that people had paid for.
nickthegreek 152 days ago [-]
I’d love some more info on using custom voices in HA. I have an ESP32-S3-Box that I am setting up over the holiday to do voice with HA.
pmarreck 152 days ago [-]
If you have a how—to, I’d love to work on one for my home. I feel like this is all right around the corner…
chefandy 153 days ago [-]
To be fair, they’ve got pretty serious potential for letting tech companies get paid for a seasoned voice actor’s unique delivery, tone, inflection, etc rather than the voice actor themselves.
whaaaaat 153 days ago [-]
> they’ve got pretty serious potential for letting tech companies get paid for a seasoned voice actor’s unique delivery, tone, inflection, etc rather than the voice actor themselves.
I think you mean "steal the labor of an actor"?
chefandy 153 days ago [-]
Sure, and people that already agree with you will feel good reading it, but other people who don’t agree see it as an attack. It’s pretty much impossible to slip a new idea into someone’s mind if your approach made them slam the door before even considering it. So what’s the benefit of saying it like that?
gmueckl 153 days ago [-]
It calls attention to the ethical implications of using a part of someone else's personal identity without their direct involvement.
chefandy 152 days ago [-]
So does what I said. Someone taking pay for someone else’s work is pretty unambiguously shitty. But when you call taking anything that isn’t a physical item theft, a large percentage of people— especially in the ‘data wants to be free’ crowd— will roll their eyes, think “that’s ridiculous... they aren’t stealing anything. That voice actor still has their voice” and just stop listening. The only people that feel the impact of statements like that are people that already agree. It turns it from an intellectual discussion to a reinforcement of existing tribes. Divisive language works for rallying those who already agree around a specific cause but it’s not even useless— it’s counterproductive— for changing people’s minds. When’s the last time someone you disagreed with changed your mind by being more aggressive towards your stance, and more terse in their portrayal of the dichotomy? If you can even think of one time that it has, you’re in the extreme minority.
MrDrMcCoy 153 days ago [-]
Indirect involvement can still be ok within the confines of a license agreement for using the actor's voice.
gmueckl 152 days ago [-]
But this requires a legal framework that mandates such licenses and effective enforcement / prosecution of violations.
As far as I know, most countries are lagging behind when it comes to updating legislation to set binding rules around that.
ideashower 152 days ago [-]
> Indirect involvement can still be ok within the confines of a license agreement for using the actor's voice.
This assumes existence of a license agreement or likeness/right of publicity law that prevents unauthorized use. But this is far from the case.
Companies have shown willingness to use actors’ voices to create synthetic voices without permission, compensation, or regard for their livelihoods. [1][2][3]
Of course we need laws in place to require such licensing. The fact that people are having their voice stolen now does not mean that there should never be a case where a voice can legally be cloned and used by a third party.
ideashower 150 days ago [-]
Precisely. We must recognize this as a fundamental issue of workers’ rights and personal autonomy in the digital age, beyond viewing it as a technical challenge. Without proper protections, voice cloning technology risks concentrating power in large companies and undermining creative workers’ economic security.
mistercow 152 days ago [-]
It’s weird to me that people look at a technology and then assume from their reckoning that they know all the uses for that technology immediately. Most technological progress happens because someone notices a creative use for something that already exists which nobody else has noticed.
casey2 153 days ago [-]
I like tools like these because they make the case for defaulting to zero trust even more obvious, and their "pretty limited use" is saving people hours of work.
anonzzzies 153 days ago [-]
They are pretty good for leaving messages for my blind friend. I generally find calling / voice texts a waste of time (I type and read far faster than I talk or listen, not to mention the ability to reread etc), but my blind friend prefers getting voice messages when on his phone and this works for us. I type and send and when he comes back with something, Whisper makes it into text for me.
chefandy 153 days ago [-]
Gen AI space to everyone else: “Your computer scientists were so preoccupied with whether or not they should, they didn’t stop to think if they could just do it anyway”
ranger_danger 153 days ago [-]
How many victims will it take for lawmakers to do something about this?
tiborsaas 153 days ago [-]
It's already illegal to scam somebody. While it's always positive to protect people more, what can be done here? Any alternative I can imagine is massively oppressive of the current state of the software industry.
You can regulate large companies, you can regulate published software sold for profit, but it's impossible to regulate free and open source tools.
You essentially have to regulate access to computing power if you want to prevent bad actors doing bad things using these sort of tools.
bryanrasmussen 153 days ago [-]
>You can regulate large companies, you can regulate published software sold for profit, but it's impossible to regulate free and open source tools.
Regulation is putting legal limitations on things, if it is impossible to regulate free and open source tools then it would be impossible to regulate murder and lots of other things, but it turns out it isn't impossible, sure - murder happens - but people get caught for it and punished.
Sorry, but this argument is much like the early internet triumphalism - back when people said it was impossible to regulate. Turns out lots of countries now regulate it.
tiborsaas 153 days ago [-]
It depends on what you do with the tool. Going with your murder analogy, if there's a stabbing epidemic what do you do? 1) Ban knives 2) invest in public safety 3) investigate the root causes and improve on them?
I'm also not sure what's so regulated about the internet besides net neutrality in certain countries. Of course the government can put limits on the network, like banning services, but it's easy since they are rather easy to target. With content traveling on the network it's much harder to say if it's legit or not.
> lots of countries
What about those countries that don't regulate it and people will keep pumping out better, leaner and faster models from there? Spreading software is trivial, all you achieve is the public won't be aware of what's possible.
The more I think about it, if anything should be regulated, it's a requirement to provide a third-party (probably government-backed) ID verification system so it would be possible for my mom to know it's me calling her. Basically, kill caller ID spoofing.
bryanrasmussen 152 days ago [-]
>I'm also not sure what's so regulated about the internet besides net neutrality in certain countries.
generally things are regulated on the internet that people said were never going to be regulated because they were on the internet - example: sales taxes. Perhaps you are old enough to remember when sales tax collection would supposedly never be enforceable on internet transactions - those idiot lawyers don't know, it's on the internet, the sale didn't happen in that country or in that state, sales taxes will never happen on the internet hah hah. It's unenforceable, it is logically undoable, there are so many edge cases - ugh, the law just does not understand technology!
oops, sales taxes now on internet purchases.
GDPR is another example of something now regulated on the internet that most of HN, years before it happened, was completely convinced would be impossible!!
If this thing becomes too big a problem for the societies regulations will be done, with varying levels of effectiveness I'm sure.
And then in twenty years time we will be saying what, you can't regulate genital eating viral synths because a guy can make those in his garage and spread them via nasal spray, this technology is unstoppable and unregulatable, not like some open source deepfake library!!
bavell 152 days ago [-]
It's always amusing listening to techies' musings on law... lots of misunderstandings, I suspect due to the helpful but inaccurate "code but for humans" analogy.
Lots of countries impose exactly what specific regulations with respect to open source tooling?
The closest thing I can think of is maybe the regulation of DRM ripping tools, but they're still out there in the wild and determined actors can easily get ahold of them. So I'm not at all confident that regulation will have any measurable meaningful effect.
bryanrasmussen 152 days ago [-]
>Lots of countries impose exactly what specific regulations with respect to open source tooling?
that something is not currently regulated does not mean it can never be regulated; further, it does not seem likely that they would regulate open source tooling itself but rather certain uses, and if the open source tooling allowed those uses then what would happen is -
GitHub and other big sources of code would refuse to host it as containing things that are not legally allowed, so for example if they regulated it in the U.S. then GitHub stops allowing it, and everyone moves to some European git provider.
At the same time bigger companies will stop using the library because of liability.
Europe then regulates and it can't be in European git repos... at some point many devs abandon the particular library because it's not worth it (I get it, this is actually for the love of doing the illegal thing, so they won't abandon it, but despite the power of love most things in this world do not actually run on it).
Can determined actors get ahold of them and do the things with them the law forbids them to do, sure! That's called crime. Then law enforcement catches determined actors and puts them in prison, that's called the real world!
Will criminals stop - nope because there is benefit to what they're doing. Maybe some will stop because they will think screw it I can make more money working for the man. And some will be caught sooner or later. And maybe in version two of the regulations there will be AI enhancements - this crime was committed with AI allowing us to take all your belongings and add 10 years to your sentence and deprive you of the right to ever own a computing device again...etc. etc. And some people will stop and others will get more violent and aggressive about their criminal business.
I don't know necessarily what measurable meaningful effect means, for some people it will be measurable and meaningful, for some not, for some of society the regulation would in many ways be worse than what it is fighting against. I'm not saying regulation will solve problems 100%, I'm just saying this whole they can't regulate us thing because "TECH!!!" that developers seem to regularly go through with anything they set their eye on is a pipe dream.
notTooFarGone 153 days ago [-]
The fable of the "determined actor".
The "determined actor" can get bombs, tanks, fissure material. There noone says "WHELP they can get it anyway so why bother regulating it LMAO" - somehow this is different in anything not physical?
mnau 152 days ago [-]
> impossible to regulate free and open source tools
BS. Can you imagine such legislation? Yes, thus it can be done.
As an early example, the CRA (Cyber Resilience Act) already contains provisions about open source stewards and security. So far they are legal persons, aka foundations, but could easily relate to any contributor or maintainer.
tiborsaas 152 days ago [-]
As I said when I made the comment, I can't really imagine anything that isn't so absurd that it has essentially zero chance of happening.
Seriously, what can anybody do about random hacker Joe publishing under the name XoX? Even if they burn GitHub and friends to the ground, if something is useful it will be really really hard to get rid of it. Remember youtube-dl? It's now https://github.com/yt-dlp/yt-dlp
If they make anything that cripples open source development they will feel it quite soon when they realize that it also cripples their world as much of the tooling and infrastructure also depends on it.
Killing open source is like killing the internet itself.
mnau 152 days ago [-]
Consequences never stopped anyone.
Your example with yt-dl doesn't matter.
Open source/free software inherently relies on copyright and all state legal infrastructure. Once you operate outside, it's no longer open source/free software.
Can you host software in a way that's really hard to block? Sure. There is onion routing and plethora of other options.
But that's no longer open source/free software. You are in a realm of dark web and marketplaces.
I do maintain a semi-popular open source project that I took over after about a year of inactivity, and I seriously considered quitting because of the CRA. It's quite easy to cripple/kill something when it basically runs on volunteering of your free time.
yyuugg 153 days ago [-]
[dead]
russell_h 153 days ago [-]
Serious question: what do you think lawmakers should do?
ideashower 152 days ago [-]
For people's image being used without their permission: strengthen U.S. right of publicity laws with private right of action, enabling people to sue for unauthorized use of their voice or likeness.
ranger_danger 152 days ago [-]
Digital signatures embedded in audio/video that can't be easily modified or faked, and which can trace the origin of a piece of media. Some camera manufacturers are already working on it.
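As a rough illustration of the idea (not how any particular camera vendor actually does it - real provenance schemes like C2PA embed signed manifests in the file, and the key handling here is simplified): the capture device signs the media bytes, and anyone with the corresponding public key can later check the file hasn't been altered. A minimal sketch with Python's `cryptography` package and a hypothetical clip:

```python
# Minimal sketch: a capture device signs the media bytes with a private key so
# anyone can later verify the file is unmodified and came from that device.
# File names and key handling are illustrative only.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

device_key = Ed25519PrivateKey.generate()   # would live inside the camera/phone
public_key = device_key.public_key()        # published by the manufacturer

media = open("clip.wav", "rb").read()       # hypothetical recording
signature = device_key.sign(media)          # created at capture time

# Later, anyone with the public key can check the clip hasn't been altered;
# verify() raises InvalidSignature if the bytes were tampered with.
public_key.verify(signature, media)
```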
CamperBob2 152 days ago [-]
How do you propose to keep watermark-free models out of the hands of evildoers? I can't build my own digital camera or laser printer, but I can certainly write software.
ranger_danger 152 days ago [-]
I don't have a good solution, but maybe legislation helps. There may not be a foolproof solution but I think the more that such devices are widely used, the less likelihood there may be of e.g. a court case hinged on bad evidence.
123yawaworht456 153 days ago [-]
how many victims did it take for lawmakers to do something about Photoshop/GIMP/etc?
tsujamin 153 days ago [-]
Bulldozing grandma is just the cost of technological progress /s
weq 153 days ago [-]
This tech is not only great for bulldozing grandma, it's great at stealing content from other creators and rebranding it as your own. Based on the GitHub page, it kind of seems like that's exactly what's being advertised as the use case: steal content from the BBC, cut it up, pull out the noise/vocals and revoice the content so the algorithm can't detect the plagiarism easily. The image detection is nowhere near the audio detection for copyright strikes.
There is a massive problem with this on YouTube. Pretty much every category on YouTube now has a host of these bots trolling content and playing the YouTube strike system like a banjo. There are channels dedicated to showing you how to set up these content mills. This tool can make you good money.
sfjailbird 152 days ago [-]
First generative AI destroyed Google search, and now it has pretty much destroyed YouTube. Social platforms, including this one, are probably goners too. We live in interesting times.
uh_uh 153 days ago [-]
This tech is going to be ubiquitous; it's just too easy to distribute. Grandma had better start adapting now.
thejazzman 153 days ago [-]
Because people make it so, not because the natural order of the world gets us there
For some reason, "because we can" validates that we should. Any jackass has the power of a research team of PhDs. It's kinda weird.
chefandy 153 days ago [-]
Indeed. Humans ascended to dominance because we can cooperate. This every-man-for-themself idea is an aberration, not the natural order as so many claim. It’s rather astounding to think otherwise considering the logistics of how we’re communicating right now.
uh_uh 153 days ago [-]
Cooperation works if the potential damage caused by a rogue actor is sufficiently low. Otherwise, it's too easy to sabotage things. This is why we don't want random rogue states to have nukes. AI will give so much leverage to rogue actors that it will significantly shift the game theory in favour of not cooperating.
chefandy 153 days ago [-]
> Cooperation works if the potential damage caused by a rogue actor is sufficiently low. Otherwise, it's too easy to sabotage things. This is why we don't want random rogue states to have nukes. AI will give so much leverage to rogue actors that it will significantly shift the game theory in favour of not cooperating.
Governments successfully collectively controlling dangerous things so they don’t fall into the hands of rogue bad actors fundamentally opposes the extreme individualist every-man-for-himself perspective in every conceivable way. It’s the absolute opposite of “it’s everybody’s responsibility to protect themselves because everybody else is only going to look out for themselves.”
And when individuals have that much leverage, collective action is the only conceivable way to oppose it. Some of those things might be cultural, like mores, some might be laws, some might be more martial. I don’t see how extreme individualism even theoretically could be more powerful.
uh_uh 153 days ago [-]
Are you suggesting government action against putting up code like this to GitHub? It’s ok if you are, but I want to put into more concrete terms what we’re talking about.
chefandy 152 days ago [-]
You’re the one that made the direct government control analogy. I mentioned a number of non-individualistic mechanisms in my previous comment. I’m not going to keep engaging in a fishing expedition of things to argue about — I think it’s pretty clear what aspect of your stance I disagree with— and am going to leave it at that.
uh_uh 152 days ago [-]
So you don't have a concrete suggestion to solve the scamming problem?
uh_uh 153 days ago [-]
Demanding responsible behaviour from everybody is not going to work. Some people don't care about negative externalities that much and it's enough if only a few of them decide not to play ball. So either grandma needs to adapt which will upset some people or distributing the tech should be regulated/prosecuted which will upset another group of people.
rockemsockem 153 days ago [-]
I think either way grandma needs to adapt though since Russian scammers and trolls are still going to run scams with fake voices.
123yawaworht456 153 days ago [-]
how very politically correct of you to pretend it's Russians who scam your grandmas
rockemsockem 150 days ago [-]
Insert any other country you like that doesn't have extradition agreements with the United States. In any other country the law can at least ostensibly be enforced, even if it isn't always.
chefandy 153 days ago [-]
You can’t adapt around brain age making it more difficult to distinguish truth from lies.
casey2 153 days ago [-]
Yeah, I don't really get the hullabaloo; if granny doesn't have the mental fortitude to keep up with the times she shouldn't be managing her own money. I guess better her son/daughter than a scammer, but both are better than letting money rot. Put granny on food stamps, pay $1 for her rent-controlled housing, and be done with it.
zelphirkalt 153 days ago [-]
Are we forgetting, that there are many elderly people without living descendants?
rockemsockem 153 days ago [-]
Quit being a doomer or keep it to yourself. This reminds me of the sound boards that were popular in the early 2000s except way more versatile. Some things are just good for people to have fun and THAT'S OKAY.
whaaaaat 153 days ago [-]
People are allowed to recognize the realistic negative outcomes of technology, especially on a forum that frequently discusses the tradeoffs of modern, cutting edge technologies.
rockemsockem 153 days ago [-]
So many AI posts are overrun with this kind of complaining from folks with limited imaginations.
On a forum that frequently discusses technology with enthusiasm you'd think there'd be more enthusiasm and more constructive criticism instead of blanket write-offs.
Mordisquitos 153 days ago [-]
I would argue that being able to see the drawbacks and potential negative externalities of a new technology is not a sign of a "limited imagination", but quite the contrary. An actual display of a limited imagination is the inability to imagine how a new technology can (and will) be abused in society by bad actors.
Ukv 152 days ago [-]
Developing some insight on its negative potential could demonstrate imagination, but the claim that it could be used to scam people is pretty much just rote repetition by now - an obligatory point made in every article and under every post about this tech (and not something that I think actually works out in practice the way most imagine it, since cold-call scam operations that dial numbers at a huge scale expecting most not to pick up can't really find a voice clip prior to each automated call).
As for positive applications, some I see:
* Allowing those with speech impairments to communicate using their natural voice again
* Allowing those uncomfortable with their natural voice, such as transgender people, to communicate closer to how they wish to be perceived
* Translation of a user's voice, maintaining emotion and intonation, for natural cross-language communication on calls
* Professional-quality audio from cheap microphone setups (for video tutorials, indie games, etc.)
* Doing character voices for a D&D session, audiobook, etc.
* Customization of voice assistants, such as to use a native accent/dialect
* Movies, podcasts, audiobooks, news broadcasts, etc. made available in a huge range of languages
* If integrated with something like airpods, babelfish-like automatic isolation and translation of any speech around you
* Privacy from being able to communicate online or record videos without revealing your real voice, which I think is why many (myself included) currently resort to text-only
* New forms of interactive media - customised movies, audio dramas where the listener plays a role, videogame NPCs that react with more than just prerecorded lines, etc.
* And of course: memes, satire, and parody
I appreciate HN's general view on technologies like encrypted messaging - not falling into "we need to ban this now because pedophiles could use it" hysteria. But for anything involving machine learning, I'm concerned how often the hacker mentality seems to go out the window and we instead get people advocating for it to be made illegal to host the code, for instance.
Mordisquitos 152 days ago [-]
Of the 11 positive applications that you listed, only the 1st, 3rd, 11th and arguably the 4th would benefit from voice cloning, which is what's being promoted here. The rest are solved merely by (improved) TTS and do not require the cloning of any actual human voice.
Also, notice how the legitimate use-cases 1, 3 and 4 imply the user consenting to clone their own voice, which is fine. However, the only use-case which would require cloning a specific human voice belonging to a third party, use-case 11, is "memes, satire, and parody"... and not much imagination is needed to see how steep and buttery that Teflon slippery slope is.
Ukv 152 days ago [-]
> Of the 11 positive applications that you listed, only the 1st, 3rd, 11th and arguably the 4th would benefit from voice cloning, which is what's being promoted here. The rest are solved merely by (improved) TTS and do not require the cloning of any actual human voice.
2, 5, 6, 9: It's true that in theory all you need is some way to capture the characteristics of a desired voice, but voice-cloning methods are the way to do this currently. If you want a voice assistant with a native accent, you fine-tune on the voice of a native speaker - opposed to turning a bunch of dials manually.
7, 8, 10: Here I think there is benefit specifically from sounding like a particular person. The dynamically generated lines of movie characters/videogame NPCs should be consistent with the actor's pre-recorded lines, for instance, and hearing someone in their own voice is more natural for communication and makes conversation easier to follow.
Pedantically, what's promoted here is a tool which features voice cloning prominently but not exclusively - other workflows demonstrated (like generating subtitles) seem mostly unobjectionable.
> Also, notice how the legitimate use-cases 1, 3 and 4 imply the user consenting to clone their own voice, which is fine
I think all, outside of potentially 8 and 11, could be done with full consent of the voice being cloned - an agreement with the movie actor to use their voice for dubbing to other languages, for example. That's already a significant number of use-cases for this tool.
> use-case 11, is "memes, satire, and parody"... and not much imagination is needed to see how steep and buttery that Teflon slippery slope is.
IMO prohibition around satire/parody would be the slippery slope, particularly with the potential for selective enforcement.
rockemsockem 149 days ago [-]
This is a GitHub repo, not an article on the effects of TTS. Policy discussions at the level of the parent comment feel off topic.
wingworks 153 days ago [-]
Just a heads up, this is a trial; you have to pay to use it after 30 minutes.
Easier (and cheaper?) to just use ElevenLabs.
vulcanidic 153 days ago [-]
It’s a bit of a hassle, but after closing the Windows command window you can restart the program and use it indefinitely. The results you worked on will still remain in the workspace folder.
ldoughty 152 days ago [-]
Yeah, felt like it positions itself as an open source project here and on GitHub, but buries the cost in other pages... Doesn't even say the subscription cost anywhere I could find (in English). Not a huge fan of this advertising model.
jamesy0ung 152 days ago [-]
I haven’t looked at the code, but can you just patch out the 30 minute limit?
batch12 152 days ago [-]
Looks to me like the app code is compiled into pyd files. One could try and decompile. Interestingly, it's licensed as MIT.
jncfhnb 153 days ago [-]
Is there speech to speech? I have been hoping for a model I can use to do voice acting with inflection
amrrs 153 days ago [-]
Do you mean Inflection's Pi?
bryanrasmussen 153 days ago [-]
I think they mean speech "in the style of" the same as repaint this picture in the style of Van Gogh, so they will do the audio and put the correct inflection on things but then rerender it with the voice of Katharine Hepburn for example.
on edit: example of course showing the difficulty as so much of Hepburn was her inflection.
jncfhnb 152 days ago [-]
More so I wish to voice act a line and then have the bot mimic it with a different voice but with the same contextual voicing.
“I’m going to kill you” could be delivered (laughing jokingly / seething with rage / ominously and creepily). I’d like a bot that can mimic the delivery in a different voice.
safeimp 153 days ago [-]
Project looks interesting. Are there short term plans to support MacOS?
If not, any recommendations for alternative projects?
sroussey 152 days ago [-]
The description, since many commenters are not clicking through but are asking questions this answers:
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downloading, vocal isolation(UVR5), Text-to-Speech (Edge-TTS), and multi-language translation. Perfect for content creators and developers.
OceanBreeze77 152 days ago [-]
Are banks moving away from voice verification as a means to identity checks? It seems like it's getting easier and easier to clone voices.
harryf 153 days ago [-]
Have you considered supporting whisper-at - https://github.com/YuanGongND/whisper-at ? Being able to identify sounds on a timeline can be useful, e.g. a politician's speech and how the audience is reacting to it (clapping, applauding, etc.)
XorNot 153 days ago [-]
The real utility of something like this is in reducing the creative costs of voice acting, i.e. something like this is a massive boon for mod-makers, where fully voicing anything is a huge undertaking - while my friends and family could probably provide their voices if I asked, getting a decent recording and performance out of them is just not going to be possible.
But if I can get the performance I want and shift it to another voice, then fully voicing free works becomes very accessible (even better would be generative AI which could take a sample of what you want and re-render it into something which sounds like a more professional performance - voice in-fill I suppose).
grahamgooch 153 days ago [-]
Great stuff well done. What is your latency for real time Audio?
patrickhogan1 152 days ago [-]
This is cool. I want to use this combined with NotebookLM to create a podcast with my mom and my dad’s voice covering a concept like gradient descent, Explain Like I’m 5 (ELI5).
I wonder if certain familiar voices like that of your parents would lead to higher understanding and retention.
whaaaaat 153 days ago [-]
> Imagine creating a podcast where Mark Zuckerberg interviews Elon Musk – using their actual voices?
I'm imagining it. It sucks to imagine.
I'm imagining it being used to scam people. I'm imagining it to leech off of performers who have worked very hard to build a recognizable voice (and it is a lot of work to speak like a performer). I'm imagining how this will be used in revenge porn. I'm imagining how this will be used to circumvent access to voice controlled things.
This is bad. You should feel bad.
And I know you are thinking, "Wait, but I worked really hard on this!" Sorry, I appreciate that it might be technically impressive, but you've basically come out with "we've invented a device that mixes bleach and ammonia automatically in your bedroom! It's so efficient at mixing those two, we can fill a space with toxic gas in under 10 seconds! Imagine a world where every bedroom could become a toxic site with only the push of a button."
That this is posted here, proudly, is quite frankly astoundingly embarrassing for you.
Ukv 152 days ago [-]
I'd claim the way most people imagine it being used for scamming, cold-calls impersonating someone the victim knows, doesn't really end up working out in practice because scam operations dial numbers at a huge scale expecting most not to pick up a "scam likely" call (or be away, or a dead number, etc.). Having to find a voice clip prior to each unanswered call would tank the quantity they're able to make.
For spear-phishing (impersonate CEO, tell assistant to transfer money) it's more feasible, but I hope it forces acceptance that "somebody sounds like X over the phone" is not and has never been a good verification method - people have been falling for scams like those fake ransom calls[0] for decades.
Not that there aren't potential harms, but I think they're outweighed by positive applications. Those uncomfortable with their natural voice, such as transgender people, can communicate closer to how they wish to be perceived - or someone whose voice has been impaired (whether just a temporary cold or a permanent disorder/illness/accident) can use it from previous recordings. Privacy benefits from being able to communicate online or record videos without revealing your real voice, which I think is why many (myself included) currently resort to text-only. There's huge potential in the translation and vocal isolation aspects aiding communication - feels to me as though we're heading towards creating our own babelfish. There's also a bunch of creative applications - doing character voices for a D&D session or audiobook, memes/satire, and likely new forms of interactive media (customised movies, audio dramas where the listener plays a role, videogame NPCs that react with more than just prerecorded lines, etc.)
I think most people in America are more wary of foreign sounding voices. If the person on the other end sounds like a good ol boy, they get more trust.
Scammers don't have to sound like a specific person to be helped by software like this.
Ukv 152 days ago [-]
That aspect feels to me like "I used to racially profile people on the street to judge risk, but winter clothing now obscures skin color at a distance". There are heuristics that give non-zero information but are harmful to use, with the cost borne by some marginalized group, and I don't see it as a negative for use of such heuristics to be made less feasible. Reducing people's use of accent as a factor would be a positive for the ~1.5B Indians that aren't scammers, for instance.
I think there's also an autonomy argument to be made, if the alternative is to the effect of ensuring that people cannot use tools hide their accent (and particularly if, as above, the intent is so they can be discriminated against based on it). Even though it isn't something we've really been able to do before, I think it's generally a person's own right to modify their voice.
farzd 153 days ago [-]
You do realise this is not the first AI release to clone voices?
yyuugg 153 days ago [-]
I don't think the parent said they were. "I'm the Nth person to do a shitty thing!" doesn't absolve them of doing a shitty thing. Just because there are other thieves doesn't make theft ok.
cess11 153 days ago [-]
Sure, and PoisonIvy wasn't the first RAT. So what? Does it get more ethical to assist fraudsters and so on once more people are doing it?
trallnag 153 days ago [-]
[flagged]
Hard_Space 152 days ago [-]
This doesn't appear to have any training facility, so its misuse would seem to be limited to the pre-trained voices supplied - for the casual user (and the ease-of-use seems to be the central issue in these comments).
throwaway314155 152 days ago [-]
My experience with voice cloning is that training is typically not required for it to work. You just embed a bit of audio of the desired voice to be cloned using the backing VAE and the model can do the rest.
The syncing of the original English is way off. I don't really know how they got that to be so broken.
ilrwbwrkhv 153 days ago [-]
There are a bunch of YC startups building new models and such in this space. I fear they are going to get decimated really soon as the quality of local llamas keeps improving.
morkalork 152 days ago [-]
Just need to use this with some recordings of Majel Barrett, make a voice interface for Claude's computer use agent and we'll be all set.
owlbynight 152 days ago [-]
Honestly, I'm not super worried about AI — at least this iteration of it — because of the uncanny valley effect. I would expect the VO industry to outlaw it purely because if people start to wonder if they're listening to an AI voice, that's a non-starter and they will stop paying attention. Even with the best AI, there are artifacts that make it easy to identify.
The primary goal of the voice actor is to achieve a personal connection, and I don't see how AI is a real threat to that end. I feel the same about other mediums as well. This will likely be used for scams, but I doubt it will ever draw as many eyes, or ears, as something a real human can produce. Thus, it won't be a valuable tool to marketers and will be largely unprofitable.
joshdavham 153 days ago [-]
Looks cool! Also, is there a reason you went with a Web-UI instead of making a native desktop app?
152 days ago [-]
tgv 152 days ago [-]
I'm with the nay-sayers. Your product doesn't bring any good to this world, but it does make it easier to harm people. It's a disgrace.
Ylpertnodi 152 days ago [-]
"If, by whiskey...."
newusertoday 153 days ago [-]
Are there any TTS models which are decent but can work on devices without a GPU and with relatively low RAM (4GB)?
pmarreck 152 days ago [-]
> Linux and Mac OS are not supported
Well, that's a big old fail. Just a reminder: The given (and proper) home of open source is on an open source OS.
totallymike 152 days ago [-]
This is gross. The person who made it and pitched it this way disgusts me.
Reaganpacco 151 days ago [-]
[dead]
wisdomalfred 147 days ago [-]
[dead]
aboardRat4 153 days ago [-]
Without Linux support it is going to have a very limited audience.
okwhateverdude 153 days ago [-]
There is nothing in here that precludes you from running this on any OS that supports python + CUDA. They use miniconda for installation of python and python packages, but this could just as easily be a venv + system CUDA install or even better: a container. This is only one tiny Dockerfile away from running anywhere.
153 days ago [-]
pc-zor_504 153 days ago [-]
[flagged]
Rendello 153 days ago [-]
You don't want an Instagram hack app. You want to go home and rethink your life.
https://www.voanews.com/a/illness-took-away-her-voice-ai-cre...
Edward James Olmos, if you're reading this, I'm willing to pay a license fee - but then I expect actual recordings and not just AI bullshit. I'm not pirating your voice; you're refusing to let me hire it.
EXACTLY. Clone the wrong person's voice and it's game over.
https://github.com/abus-aikorea/voice-pro?tab=readme-ov-file...
clear as day, do not trust this code
Thanks for raising this aspect.
Btw https://github.com/haimgel/display-switch helps a lot.
P.S. Are there any tools for synthetic voice creation? Maybe melding two or more voices together, or just exploring latent space? Would be fun for character creation to create completely new voices.
Game studios will spin up a bunch of unique virtual voices for all the dialogue of extras. It'll probably be longer before we see replacements of main characters though. There's been some research in speech-to-speech transference as well - this means that company employee A records the character B's line with the appropriate emotional nuance (angry, sad, etc.) and the emotional aspect is copied on top of the generated TTS.
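For what it's worth, a minimal sketch of that speech-to-speech idea is already possible with off-the-shelf voice conversion models - here assuming Coqui TTS's FreeVC model, with placeholder file paths (the "source" provides the performance, the "target" provides the voice):

    # Voice conversion: keep the source recording's words, timing and emotion,
    # but render them in the target speaker's voice.
    from TTS.api import TTS

    vc = TTS("voice_conversion_models/multilingual/vctk/freevc24")

    vc.voice_conversion_to_file(
        source_wav="actor_performance.wav",   # employee A's emotional read (placeholder)
        target_wav="character_voice.wav",     # sample of character B's voice (placeholder)
        file_path="converted_line.wav",
    )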
https://www.ocregister.com/2005/12/12/governors-full-stateme...
Reading over the governor's statement explaining his reasons for denying clemency, my brain couldn't help but read it in an Arnold voice. Sometimes, to amuse friends, I would read portions of it aloud while doing the voice.
Maybe it's a bit tasteless, like the anime-girl Demon Core memes, but there's just something about hearing the legal and administrative justification for proceeding with an execution in the voice of the Terminator.
I'm the same way with famous YouTubers. If I see "Guru Larry" Bundy Jr. or Clint "LGR" Basinger leave a comment on someone else's video, my brain reads it in their voice.
Please no spoilers!
https://github.com/abus-aikorea/voice-pro?tab=readme-ov-file...
hard pass and anyone who reads this and continues is bonkers
https://github.com/oobabooga/text-generation-webui https://github.com/AUTOMATIC1111/stable-diffusion-webui
If you have concerns or doubts about telemetry or spyware, there are countless software options available for detection. Give it a try.
So yes, the app can certainly harm the OS, and the venv would not provide any protection against this.
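To make that concrete: a venv only isolates which Python packages get installed and imported; code running inside it still has your user account's full permissions. A trivial illustration:

    # Running "inside a venv" does not sandbox the interpreter.
    import pathlib

    home = pathlib.Path.home()
    # The script can freely list (or modify, or delete) files in your home directory:
    print([p.name for p in home.iterdir()][:5])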
I'm all for innovation, but I don't really see the use case of cloning random voices to make podcasts? Listening to Zuck interview Elon? ok...?
I use Coqui TTS[0] as part of my home automation. I wrote a small python script that lets me upload a voice clip for it to clone after I got the idea from HeyWillow[1], and a small shim that lets me send the output to a Home Assistant media player instead of using their standard output device. I run the TTS container on a VM with a Tesla P4 (~£100 to buy) and get about 1x-2x real-time (processing takes roughly as long as it would take to say it) using the large model.
Just for a giggle, I uploaded a few 3-5 second clips of myself speaking and cloned my voice, then executed a command to our living room media player to call my wife into the room; from another room, she was 100% convinced it was me speaking words I'd never spoken.
I tried playing with a variety of sentences for a few hours and overall, it sounded almost exactly like me, to me, with the exception of some "attitude" and "intonation" I know I wouldn't use in my speech. I didn't notice much of an improvement using much longer clips; the short ones were "good enough".
Tangentially, it really bugs me that most phone providers in the UK insist you record a "personal greeting" now before they'll let you check your voice mail box, I just record silence, because the last thing I want/need is a voicemail greeting in my voice confirming to some randomer I didn't want calling me, who I am and that my number is active, even more so knowing how I can clone any voice to a reasonably good accuracy with just a few seconds of audio.
[0] https://github.com/coqui-ai/TTS [1] https://heywillow.io/
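In case it's useful to anyone, the Home Assistant "shim" part can be as small as one REST call - this is only a sketch, the entity ID, token and URLs are placeholders, and it assumes the generated wav is reachable over HTTP from Home Assistant:

    # Ask Home Assistant to play a generated clip on a specific media player.
    import requests

    HA_URL = "http://homeassistant.local:8123"                    # placeholder
    HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"                     # placeholder
    AUDIO_URL = "http://tts-host.local/output/announcement.wav"   # placeholder

    resp = requests.post(
        f"{HA_URL}/api/services/media_player/play_media",
        headers={"Authorization": f"Bearer {HA_TOKEN}"},
        json={
            "entity_id": "media_player.living_room",   # placeholder entity
            "media_content_id": AUDIO_URL,
            "media_content_type": "music",
        },
        timeout=10,
    )
    resp.raise_for_status()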
For example, my family's passphrase is- just kidding.
And that is hard. For most people, extremely hard. For people who lived most of their lives before the era of cheap and fast worldwide communication, it’s even worse. For people with declining mental abilities, it’s worse yet. Saying “oh you can expose the scam just by saying you’ll call them back” is looking at the wrong thing.
So yeah, you’re making a great point that meshes well with my own and definitely does not deserve to be wrapped in snideness.
https://github.com/abus-aikorea/voice-pro/tree/main/app
macOS devs blindly trust it like it's the App Store.
But I agree that everyone should review the Homebrew install script for any package they're installing if they're concerned about security.
Is there anything new in this?
I simply asked "is there anything new in this?" because I was interested to know if, you know, there was anything new in this.
I think you mean "steal the labor of an actor"?
As far as I know, most countries are lagging behind when it comes to updating legislation to set binding rules around that.
This assumes existence of a license agreement or likeness/right of publicity law that prevents unauthorized use. But this is far from the case.
Companies have shown willingness to use actors’ voices to create synthetic voices without permission, compensation, or regard for their livelihoods. [1][2][3]
[1] https://animehunch.com/popular-japanese-voice-actors-band-to...
[2] https://www.theatlantic.com/technology/archive/2024/05/eleve...
[3] https://www.yahoo.com/entertainment/morgan-freeman-calls-una...
You can regulate large companies, you can regulate published software sold for profit, but it's impossible to regulate free and open source tools.
You essentially have to regulate access to computing power if you want to prevent bad actors doing bad things using these sort of tools.
Regulation means putting legal limitations on things. If it were impossible to regulate free and open source tools, then it would be impossible to regulate murder and lots of other things - but it turns out it isn't impossible. Sure, murder happens, but people get caught for it and punished.
Sorry, but this argument is much like the early internet triumphalism - back when people said the internet was impossible to regulate. Turns out lots of countries now regulate it.
I'm also not sure what's so regulated about the internet besides net neutrality in certain countries. Of course a government can put limits on the network, like banning services, but that's comparatively easy since services are a clear target. With content travelling over the network it's much harder to say whether it's legit or not.
> lots of countries
What about the countries that don't regulate it, where people will keep pumping out better, leaner and faster models? Spreading software is trivial; all you achieve is that the public won't be aware of what's possible.
The more I think about it, if anything should be regulated it's a requirement to provide a third-party (probably government-backed) ID verification system, so it would be possible for my mom to know it's me calling her. Basically, kill caller ID spoofing.
Generally, things end up regulated on the internet that supposedly were never going to be regulated because they were on the internet. Example: sales taxes. Perhaps you are old enough to remember when sales tax collection was never going to be enforceable on internet transactions - those idiot lawyers don't get it, it's on the internet, the sale didn't happen in that country or in that state, sales taxes will never happen on the internet, hah hah. It's unenforceable, it is logically undoable, there are so many edge cases - ugh, the law just does not understand technology!
Oops: sales taxes are now collected on internet purchases.
GDPR is another example of internet regulation that, years before it happened, most of HN was completely convinced would be impossible!!
If this thing becomes too big a problem for societies, regulations will be made, with varying levels of effectiveness I'm sure.
And then in twenty years' time we will be saying: what, you can't regulate genital-eating viral synths because a guy can make those in his garage and spread them via nasal spray? That technology is unstoppable and unregulatable - not like some open source deepfake library!!
Obligatory/relevant xkcd: https://xkcd.com/538/
The closest thing I can think of is maybe the regulation of DRM ripping tools, but they're still out there in the wild and determined actors can easily get ahold of them. So I'm not at all confident that regulation will have any measurable meaningful effect.
That something is not currently regulated does not mean it can never be regulated. Further, it does not seem likely that they would regulate open source tooling itself, but rather certain uses - and if the open source tooling allowed those uses, then what would happen is:
GitHub and other big code hosts would refuse to host it as containing things that are not legally allowed; so, for example, if it were regulated in the U.S., GitHub stops allowing it and everyone moves to some European git provider.
At the same time, bigger companies will stop using the library because of liability.
Europe then regulates, and it can't be in European git repos either. At some point many devs abandon the particular library because it's not worth it (I get it, this one actually runs on the love of doing the illegal thing, so they won't abandon it - but despite the power of love, most things in this world do not actually run on it).
Can determined actors get ahold of them and do the things the law forbids them to do? Sure! That's called crime. Then law enforcement catches determined actors and puts them in prison; that's called the real world!
Will criminals stop? Nope, because there is benefit to what they're doing. Maybe some will stop because they'll think "screw it, I can make more money working for the man." Some will be caught sooner or later. And maybe in version two of the regulations there will be AI enhancements - "this crime was committed with AI, allowing us to take all your belongings, add 10 years to your sentence and deprive you of the right to ever own a computing device again"... etc. etc. Some people will stop and others will get more violent and aggressive about their criminal business.
I don't know exactly what "measurable meaningful effect" means - for some people it will be measurable and meaningful, for some not, and for some of society the regulation would in many ways be worse than what it is fighting against. I'm not saying regulation will solve problems 100%; I'm just saying this whole "they can't regulate us because TECH!!!" attitude that developers seem to regularly adopt with anything they set their eye on is a pipe dream.
The "determined actor" can get bombs, tanks, fissure material. There noone says "WHELP they can get it anyway so why bother regulating it LMAO" - somehow this is different in anything not physical?
BS. Can you imagine legislation for it? Yes - thus it can be done.
As an early example, the CRA (Cyber Resilience Act) already contains provisions about open source stewards and security. So far they are legal persons, aka foundations, but could easily relate to any contributor or maintainer.
Seriously, what can anybody do about random hacker Joe publishing under the name XoX? Even if they burn GitHub and friends to the ground, if something is useful it will be really really hard to get rid of it. Remember youtube-dl? It's now https://github.com/yt-dlp/yt-dlp
If they make anything that cripples open source development they will feel it quite soon when they realize that it also cripples their world as much of the tooling and infrastructure also depends on it.
Killing open source is like killing the internet itself.
Your example with yt-dl doesn't matter.
Open source/free software inherently relies on copyright and all state legal infrastructure. Once you operate outside, it's no longer open source/free software.
Can you host software in a way that's really hard to block? Sure. There is onion routing and plethora of other options.
But that's no longer open source/free software. You are in a realm of dark web and marketplaces.
I do maintain a semi-popular open source project that I took over after about a year of inactivity, and I seriously considered quitting because of the CRA. It's quite easy to cripple/kill something when it basically runs on people volunteering their free time.
There is a massive problem with this on YouTube. Pretty much every category on YouTube now has a host of these bots trolling content and playing the YouTube strike system like a banjo. There are channels dedicated to showing you how to set up these content mills. This tool can make you good money.
For some reason, "because we can" validates that we should. Any jackass has the power of a research team of PhDs. It's kinda weird.
Governments successfully collectively controlling dangerous things so they don’t fall into the hands of rogue bad actors fundamentally opposes the extreme individualist every-man-for-himself perspective in every conceivable way. It’s the absolute opposite of “it’s everybody’s responsibility to protect themselves because everybody else is only going to look out for themselves.”
And when individuals have that much leverage, collective action is the only conceivable way to oppose it. Some of those things might be cultural, like mores, some might be laws, some might be more martial. I don’t see how extreme individualism even theoretically could be more powerful.
On a forum that frequently discusses technology with enthusiasm you'd think there'd be more enthusiasm and more constructive criticism instead of blanket write-offs.
As for positive applications, some I see:
* Allowing those with speech impairments to communicate using their natural voice again
* Allowing those uncomfortable with their natural voice, such as transgender people, to communicate closer to how they wish to be perceived
* Translation of a user's voice, maintaining emotion and intonation, for natural cross-language communication on calls
* Professional-quality audio from cheap microphone setups (for video tutorials, indie games, etc.)
* Doing character voices for a D&D session, audiobook, etc.
* Customization of voice assistants, such as to use a native accent/dialect
* Movies, podcasts, audiobooks, news broadcasts, etc. made available in a huge range of languages
* If integrated with something like airpods, babelfish-like automatic isolation and translation of any speech around you
* Privacy from being able to communicate online or record videos without revealing your real voice, which I think is why many (myself included) currently resort to text-only
* New forms of interactive media - customised movies, audio dramas where the listener plays a role, videogame NPCs that react with more than just prerecorded lines, etc.
* And of course: memes, satire, and parody
I appreciate HN's general view on technologies like encrypted messaging - not falling into "we need to ban this now because pedophiles could use it" hysteria. But for anything involving machine learning, I'm concerned how often the hacker mentality seems to go out the window and we instead get people advocating for it to be made illegal to host the code, for instance.
Also, notice how the legitimate use-cases 1, 3 and 4 imply the user consenting to clone their own voice, which is fine. However, the only use-case which would require cloning a specific human voice belonging to a third party, use-case 11, is "memes, satire, and parody"... and not much imagination is needed to see how steep and buttery that Teflon slippery slope is.
2, 5, 6, 9: It's true that in theory all you need is some way to capture the characteristics of a desired voice, but voice-cloning methods are the way to do this currently. If you want a voice assistant with a native accent, you fine-tune on the voice of a native speaker - opposed to turning a bunch of dials manually.
7, 8, 10: Here I think there is benefit specifically from sounding like a particular person. The dynamically generated lines of movie characters/videogame NPCs should be consistent with the actor's pre-recorded lines, for instance, and hearing someone in their own voice is more natural for communication and makes conversation easier to follow.
Pedantically, what's promoted here is a tool which features voice cloning prominently but not exclusively - other workflows demonstrated (like generating subtitles) seem mostly unobjectionable.
> Also, notice how the legitimate use-cases 1, 3 and 4 imply the user consenting to clone their own voice, which is fine
I think all, outside of potentially 8 and 11, could be done with full consent of the voice being cloned - an agreement with the movie actor to use their voice for dubbing to other languages, for example. That's already a significant number of use-cases for this tool.
> use-case 11, is "memes, satire, and parody"... and not much imagination is needed to see how steep and buttery that Teflon slippery slope is.
IMO prohibition around satire/parody would be the slippery slope, particularly with the potential for selective enforcement.
Easier (and cheaper?) to just use ElevenLabs.
On edit: the example of course shows the difficulty, as so much of Hepburn was her inflection.
“I’m going to kill you” could be delivered (laughing jokingly / seething with rage / ominously and creepily). I’d like a bot that can mimic the delivery in a different voice.
If not, any recommendations for alternative projects?
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube downloading, vocal isolation (UVR5), Text-to-Speech (Edge-TTS), and multi-language translation. Perfect for content creators and developers.
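To give a feel for the kind of pipeline those listed components imply, here's a minimal sketch that transcribes a clip with Faster-Whisper and re-voices the text with Edge-TTS - the library choices, voice name and file paths are my own assumptions, not this repo's actual code:

    # Transcribe with Faster-Whisper, then synthesise the text with Edge-TTS.
    # pip install faster-whisper edge-tts
    import asyncio

    import edge_tts
    from faster_whisper import WhisperModel

    def transcribe(path: str) -> str:
        # Small CPU-friendly model; larger models trade speed for accuracy.
        model = WhisperModel("small", device="cpu", compute_type="int8")
        segments, _info = model.transcribe(path)
        return " ".join(segment.text.strip() for segment in segments)

    async def speak(text: str, out_path: str) -> None:
        communicate = edge_tts.Communicate(text, "en-GB-RyanNeural")  # example voice
        await communicate.save(out_path)

    if __name__ == "__main__":
        text = transcribe("input_clip.wav")        # placeholder input
        asyncio.run(speak(text, "re_voiced.mp3"))  # placeholder output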
But if I can get the performance I want and shift it to another voice, then fully voicing free works becomes very accessible (even better would be generative AI which could take a sample of what you want and re-render it into something which sounds like a more professional performance - voice in-fill I suppose).
I wonder if certain familiar voices like that of your parents would lead to higher understanding and retention.
I'm imagining it. It sucks to imagine.
I'm imagining it being used to scam people. I'm imagining it to leech off of performers who have worked very hard to build a recognizable voice (and it is a lot of work to speak like a performer). I'm imagining how this will be used in revenge porn. I'm imagining how this will be used to circumvent access to voice controlled things.
This is bad. You should feel bad.
And I know you are thinking, "Wait, but I worked really hard on this!" Sorry, I appreciate that it might be technically impressive, but you've basically come out with "we've invented a device that mixes bleach and ammonia automatically in your bedroom! It's so efficient at mixing those two, we can fill a space with chlorine gas in under 10 seconds! Imagine a world where every bedroom could become a toxic site with only the push of a button."
That this is posted here, proudly, is quite frankly astoundingly embarrassing for you.
Is it not the same with this project?