Say2Play™ - The Weapon of Voice™ Forum
September 05, 2010, 09:12:42 pm *
Author Topic: Suggestion: Voice Recognition improvements  (Read 723 times)
Haggis13
« on: July 15, 2009, 05:06:40 pm »

Let me start off by saying I absolutely enjoyed trying Say2Play with the 7-Day Trial. Let me follow that up by saying you will most definitely NOT see me buying this product, for the following two reasons:

  • I'm British. Well, actually, I'm Dutch, but I'm a university student of English English, so my pronunciation is comparable to British RP. Opening Bag 4 is a hassle, as I have to pronounce the R at the end of the word in order for the VR software to pick up my command properly. Closing any panel usually only yields the proper result because its command is the same as the command for opening that panel, except with "close" in front of it, and both opening and closing a panel require the same keyboard input. Apparently, I have severe difficulties pronouncing the particular American vowel unless I really focus on its pronunciation (which does not come naturally at all, I might add). Concluding my session with a hideous "largawt" instead of the proper "logout" was nigh blasphemous.
  • The elongated glottal stops required between separate words are very unnatural and require too much thought and focus on the pronunciation. "Close skills" should never have to be uttered with two /s/ sounds separated by a glottal stop.

Incidentally, I am also a university student of Linguistics, who might just have a solution to the aforementioned problems. It's sort of like two birds, one stone and it's -- as far as I'm aware -- a very manageable solution. Hear me out, if you will:

As my commands are only recognised when pronounced in American English, there is obviously some underlying form of phoneme recognition present. For that, there also has to be phoneme segmentation, so I'm assuming that's in there as well. If there is phoneme segmentation, there is very likely also a quick analysis of F0, F1, and F2, in order to determine phoneme boundaries. This analysis is then probably compared with a predefined set of possible phonemes, words, or even full phrases, since English spelling is by no means a reliable indication of pronunciation, and yet I can add new phrases and even words.

With this in mind, it shouldn't be all that hard to adjust the system to an English English phoneme set. But why stop there? At one point, some speaker of Kentish, Manx, Spanish English, Afro-American English, or Texan comes along, and runs into the same problem.

Solution: Voice recording. Let's say that, whenever a user wants to, he can not only test the list of commands (as is currently possible), but his tests also influence the predefined set of phonemes, words, and phrases. As phoneme recognition, phoneme segmentation, and F0 to F2 analyses are present anyway, why do we not use them to create new templates to describe the user's pronunciation (or perhaps even only where they notably diverge from what's already there) and replace the old templates? This will solve both problems described above, as (1) British or American (or any other dialect) makes no difference to recognition anymore, because the British can simply save their pronunciation, whereas the Americans can keep theirs, too, and (2) separate words can simply be saved as phrases. Why, after all, would voice recognition software that does not intend to reproduce spoken words as text have to be able to separate spoken words? Treat "Bag 5" as a single word, /bAgfaiv/ instead of /bAg ? faiv/, and there shouldn't be any problems with the command being 'misunderstood' by the software, while preserving a more natural phrase (rather than a combination of separated words).
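
The recording idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how Say2Play works internally: it assumes each utterance has already been reduced to a sequence of numeric feature vectors (for example formant-like measurements per frame), and it swaps in dynamic time warping (DTW) as one plausible way to compare a spoken phrase against stored whole-phrase templates. The names `dtw_distance`, `CommandTemplates`, `enroll`, and `recognise` are hypothetical, and the feature numbers are made up.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


class CommandTemplates:
    """Hypothetical store: one whole-phrase template per command."""

    def __init__(self, defaults):
        self.templates = dict(defaults)

    def enroll(self, command, recording):
        # Replace the shipped template with the user's own pronunciation.
        self.templates[command] = recording

    def recognise(self, utterance):
        # The whole phrase is matched as one unit; no word boundaries are needed.
        return min(self.templates,
                   key=lambda c: dtw_distance(utterance, self.templates[c]))


# Toy feature sequences (made-up numbers, not real acoustic measurements).
shipped = {
    "logout": [(9.0, 9.0), (8.0, 8.0)],
    "bag five": [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)],
}
my_logout = [(2.0, 4.0), (3.0, 5.0), (4.0, 4.0)]  # the user's own "logout"

store = CommandTemplates(shipped)
before = store.recognise(my_logout)   # closest shipped template wins, right or wrong
store.enroll("logout", my_logout)     # user saves their own pronunciation
after = store.recognise([(2.1, 4.0), (3.0, 5.1), (3.0, 5.0), (4.0, 4.1)])
print(before, after)
```

With these toy numbers the user's "logout" is first matched to the wrong command, and a slightly slower repetition is matched correctly once the user's own recording has replaced the shipped template. Because each command is stored as one sequence, a phrase like "Bag 5" needs no pause between its words.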

I'm a definite buyer, by the way, provided I'm not forced into American English pronunciation. It's not that I don't like you; it's just that I like John Cleese better.
« Last Edit: July 15, 2009, 05:08:40 pm by Haggis13 »
Eftwyrd
« Reply #1 on: October 17, 2009, 10:33:34 am »

First, I want to say I love the product. I am also a native English speaker, but I am English and speak the Queen's English, so I have the same issue the original poster is referring to.

Some words work 100%; others I can't get to work at all, no matter how American I try to pronounce them.

Attack won't work, hawk won't work, viper won't work; I could go on, but you get the idea.

Monkey and cheetah, though, both work 100% every time. It is incredibly frustrating.

I actually found a much older programme called Game Commander 2. It allows you to record your voice, and this seems to work 100% for me. Unfortunately, the software seems to be abandoned and I am unable to find anywhere to purchase it, and the demo version is not only time-restricted but also allows only 5 or 6 commands, making it useless for my purposes.


Not sure what the answer is, but the speech recognition built into Windows recognises my voice fine, so somewhere out there there must be different voice sets for different regions.


Hopefully there is a fix for this or something, as my wife and I would both love to buy this, but we can't put on an American accent all the time to use it.
disembowler
« Reply #2 on: June 15, 2010, 01:07:33 pm »

I have to agree with my fellow posters. I too am from the UK, and whilst I truly love elements of the program, I find the recognition element somewhat frustrating. I play World of Warcraft, and those who are familiar with 'WoW' will know that there are literally thousands of potential key combinations. Having to trial-and-error keywords in the hope that the pronunciation is close enough to the predefined American English equivalents is very frustrating.

To give one example, there are approximately 120 predefined emotes available in the WoW template. I cycled through the list of emotes 4 times, and the maximum number of emotes that correctly triggered on any cycle when speaking 'naturally' was 11 (roughly 9%). I then got a colleague from work who is from Anaheim, CA to try it out, and his success rate was about 85%, triggering repeatedly.

I note that both previous posts are a number of months old. Do you have any intention of trying to tackle this one flaw? (From the perspective of us non-American-English-speaking folks.)

Keep up the good work