Inferret is a company that is pushing the boundaries of speech recognition and natural language processing. Inferret is currently gearing up to offer speech recognition as a software service to application developers, with a core focus on iPhone and web application developers. While not a VoIP company per se, the areas they are working in border the telephony world in many fascinating ways and offer a plethora of new opportunities for the future. Denphone's Simon Gibson caught up recently with Inferret co-founder and CEO Ed Whittaker for a brief chat about Inferret and the world of speech recognition.
Simon: First of all, could you tell us a little about Inferret?
Inferret was incorporated in August of 2007 as a spinout from the Tokyo Institute of Technology, focusing on natural language processing and speech recognition. At that time I was still a Visiting Researcher at the Tokyo Institute of Technology, and it wasn't until April of this year that I left the university to focus full time on Inferret and our new speech recognition service offering for developers. All our applications reside in the cloud, so users of our products do not need to know much about how they work, either to use them successfully or to add the extra functionality to their existing applications.
Simon: Can you tell me more about the current product you are working on?
At the same time as developing our cloud-based speech recognition platform for developers, we are also developing our own speech recognition applications for the iPhone, mainly as a showcase of our core technologies. Our first application allows users to easily search train timetables covering the whole of Japan simply by saying the names of the departure and arrival stations in natural language, just as they would when asking a person. The software searches for the two most likely matching stations and returns the timetable information for that pair. This app was released just over a month ago (September 2009) and was the first iPhone app to allow voice searches of Japanese train timetables in this manner. We currently have several thousand active users, with many tens of thousands of accesses since then. The overall response from our users has been very positive. Currently we are only offering the Japan train timetable but hope to extend this to other regions and languages in the near future. In fact, our new speech recognition platform will take everything one step further and allow outside developers to do all this themselves.
(Simon: I tested this app and it works well. I said Hiroshima and then Osaka, and it returned the next train between Hiroshima and Osaka. When I tried to say it with a very strong New Zealand accent (Lynn of Tawa, anyone?) it wasn't able to recognise the station names but then I imagine only other Kiwis and perhaps some Australians would have been able to understand what I was saying.)
Simon: What is the status of speech recognition in Japan?
I think compared to the United States and the UK, where speech recognition is already quite mainstream, Japan still has some catching up to do. That said, I believe speech recognition apps will really pick up in Japan in 2010. This is especially true as they are starting to be much more usable and reliable and people here are starting to understand the convenience they offer.
Simon: What is your background, and how did that drive you to where you are today?
My research background has focused on natural language processing and speech recognition and the search capabilities that the combinations of these technologies allow. When we started out we thought we could be the next Google - which in retrospect wasn't a terribly good business plan. We dabbled in many areas around the fields of search, speech recognition and natural language processing, and even social networking. One thing that turned out to be very important for us was the process of talking to customers, getting feedback from them and finding out what they really wanted and were interested in. This has now given us a pretty good idea of what customers want, and what kind of systems we could develop and we now feel we're very much on the right path.
Simon: What other applications have you developed?
Another app we have put together is an improvement on the traditional text-based train timetable search application, although really it is applicable to any text-based vertical search engine. The current app takes 2 station names as input which are then processed using our proprietary algorithms and databases to find the 2 most likely matching stations. This is especially useful for foreigners in Japan - so if you don't know the exact spelling of a station's name you can give it a rough guess and it will still get the correct answer.
A good example of this is a station like the one closest to Denphone - Azabujuban. Easy to write in English, but if you have to break it down into Japanese, is it Azabu or Azabuu, Juban or Juuban, one word or two, and is there a hyphen between them? Our system still works even if the input contains several mistakes.
(Simon: I tried this with the following:
azabujvsb -> odska
and the third result was the one I was after - Azabujuban to Osaka. I then tried:
abujukbah -> jotugakga
which quite amazingly returned the stations I was looking for: Azabujuban to Jiyugaoka.)
The functionality is similar to the spell check in Microsoft Word or OpenOffice but also exploits knowledge about the domain and task (in this case station names in Japan). We can also make platform-specific adjustments. For example, if you are using the iPhone keyboard there are some common mistakes which our application can learn to correct.
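To make the idea of domain-restricted spell correction concrete, here is a minimal sketch in Python. It is not Inferret's proprietary algorithm: it simply ranks a small, hypothetical list of station names by string similarity using the standard library's difflib module. The names STATIONS, closest_stations and match_route are invented for this illustration.

```python
import difflib

# Hypothetical, tiny station list; a real service would hold every station in Japan.
STATIONS = ["azabujuban", "osaka", "hiroshima", "jiyugaoka",
            "tokyo", "shinjuku", "shibuya"]

def closest_stations(typed, n=2):
    """Return up to n known station names ranked by similarity to the typed text."""
    # cutoff=0.0 keeps even weak matches, so the caller always gets a ranked list.
    return difflib.get_close_matches(typed.lower(), STATIONS, n=n, cutoff=0.0)

def match_route(query):
    """Split 'from -> to' and resolve each side to its best-matching station."""
    departure, arrival = (part.strip() for part in query.split("->"))
    return closest_stations(departure, 1)[0], closest_stations(arrival, 1)[0]

if __name__ == "__main__":
    print(match_route("azabujvsb -> odska"))
```

Because the candidates are limited to known station names, even a badly mistyped query can only resolve to a real station, which is the domain knowledge a general-purpose spell checker lacks.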
Both speech recognition and natural language processing have to deal with noisy input, albeit different kinds of noise. The pattern recognition approach, which all our applications rely on, uses previously observed patterns of characters and users' mistakes for training. It essentially learns and becomes very robust to such noisy inputs. Clients currently have a great need for this kind of text spell-check and pattern matching functionality, but the way of the future is definitely moving towards complementing such applications with speech recognition as well.
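As a rough illustration of learning from users' mistakes, the sketch below counts character substitutions seen in past (typed, intended) pairs and then makes those substitutions cheaper in an edit-distance calculation. This is an assumed, simplified stand-in for the pattern recognition approach described above, not the actual method; the training pairs, function names and cost formula are all hypothetical.

```python
from collections import Counter

def learn_substitutions(pairs):
    """Count character substitutions in equal-length (typed, intended) pairs."""
    counts = Counter()
    for typed, intended in pairs:
        if len(typed) != len(intended):
            continue  # this sketch only learns from same-length pairs
        for t, i in zip(typed, intended):
            if t != i:
                counts[(t, i)] += 1
    return counts

def weighted_distance(typed, candidate, counts):
    """Edit distance in which frequently observed substitutions cost less."""
    def sub_cost(t, c):
        if t == c:
            return 0.0
        # Assumed cost formula: the more often a mistake was seen, the cheaper it is.
        return 1.0 / (1 + counts[(t, c)])

    prev = [float(j) for j in range(len(candidate) + 1)]
    for i, t in enumerate(typed, 1):
        cur = [float(i)]
        for j, c in enumerate(candidate, 1):
            cur.append(min(prev[j] + 1.0,                  # delete a typed character
                           cur[j - 1] + 1.0,               # insert a missing character
                           prev[j - 1] + sub_cost(t, c)))  # substitute (or match)
        prev = cur
    return prev[-1]

# Hypothetical training data: users who hit "v" when they meant "b".
observed = [("juvan", "juban"), ("shivuya", "shibuya")]
counts = learn_substitutions(observed)
print(weighted_distance("kove", "kobe", counts) < weighted_distance("kove", "kome", counts))
```

With no training data both candidates would be equally distant from the typed text; after seeing the "v" for "b" mistake a few times, the candidate that explains the input through that known mistake is preferred.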
Of course, clients also want to know how such technologies will translate into savings or increased revenue, and the answer is quite simple - providing better user experiences leads to greater levels of user retention and word of mouth referrals. One only has to think about Google - they became the number one search engine by offering the best user experience through reliable search results. Ok, they had a great business model as well which certainly helped. But these kinds of advanced search options allow companies to increase user satisfaction while also being a factor in brand differentiation.
Simon: What have your customers' reactions been like?
They have been very positive. We show a demo to a customer and then we see a glimmer in their eyes as they realise how they can apply our services and technologies to their own unique problems. It quickly becomes clear to the customer how they can differentiate their services and leverage our technology to solve the particular problems they are facing.
Simon: I would have thought there would be more resistance to a new service such as yours?
I think the big difference compared to 5 years ago is that these services actually work really well. We have a lot more data today, much more powerful computers and mobile phones, as well as the speeds that 3G internet offers. In the case of speech recognition we also have much better microphones on the handset side. All these advances are exemplified by the iPhone, which allows our systems to be multi-modal, complementing typing and pointing with speech-to-text in a much more intuitive and reliable manner. On a device such as the iPhone, users often find it much faster to input data by speech rather than by typing. So I would say all of these factors have actually helped reduce people's resistance to our services.
Simon: What are the potential telephony applications for your services?
As I mentioned we are starting to offer speech recognition as a service. This means giving developers all the tools they need, including client libraries, to quickly and easily add speech recognition functionality to their existing applications. If, for example, you are developing telephony applications such as interactive voice response (IVR) systems then there is certainly a wide range of ways that our services can be applied. We are currently working on a solution for Asterisk to interface directly with our cloud-based speech recognition platform. I hope this is something that Denphone will also take an interest in. Most of the hard computational work is done on the server side - we want to provide as many means as possible to access the server from different platforms and using different programming languages. The idea is to make it really easy for any developer to get up and running right away. The service will work similarly to many existing web services, with a limited number of daily requests provided for development purposes, moving onto a paid service once the projects go live.
Simon: Just before we wrap up, any words of advice for someone looking to start up a company in Japan?
That depends which business you are thinking of, but for a tech startup my main advice would be to make sure you get enough investment to take you through 2 to 3 years of little to no revenue. Make sure you know as much as possible about your investors before you start - and the fewer of them the better, by the way! You should obviously have a business plan, but use the feedback from your prospective customers to keep refining it. Unless you are lucky you will need about a 2-year window to really get things going. And even then of course nothing is guaranteed - you may get a contract, do all the work, and the client may not end up being able to pay. You need to prepare for all such eventualities by keeping your pipeline of projects as varied and full as possible.
Simon: Thank you for taking the time to talk to us today.
Ed Whittaker has a Master of Engineering in Electronic Engineering from the University of Nottingham and a Ph.D. in Speech Recognition from Cambridge University Engineering Department. Since graduating from Cambridge, he has worked for Compaq, Cambridge Research Labs (Mass.) and Philips Research Laboratory in Germany. He joined the Tokyo Institute of Technology in 2003. In his spare time he enjoys keeping up with technology and gadgets.
Inferret is looking for potential investors as well as third-party developers interested in integrating Inferret's speech recognition platform into their own applications. For inquiries please contact Inferret - info @ inferret.jp






