Part 2: Capturing PDFs, translating Google and Star Trek
Can you talk about the new PDF searches and how that's allowing folks to capture the "Invisible Web"?
Sure. We started crawling PDF files in addition to HTML files, and I think we now have the biggest collection of searchable PDF files on the Web. We weren't the first to do this, but I think we were the first to make it part of a regular commercial search engine, as opposed to a pilot project.
Our goal is exactly what you said: we want to make sure that all information is available to people, not just information that happens to be on HTML web pages. PDF is the first step towards that. There's a lot of the Web that you could never search before, and the reason we crawl PDF is that we want people to be able to start searching for that kind of information.
Do the PDF files come up untitled?
Right now they do; it is possible to title a PDF document but very few people take advantage of that option.
And are you working to improve the international search technology?
Yes, I'm glad you asked that. That's a very high priority for us. The traffic we have outside of the US is increasing awfully fast. We think that's great, and our interface is already translated into 26 languages so people can use Google in their own language.
We recently launched a volunteer program to have people help us translate into the more obscure languages. We offered a service where we said, "We'll provide the English text for everything we can display on Google, and if you know these languages, help us translate into them."
So we've found other people who really want Afrikaans, for instance, and they're willing to make the effort to translate so that other Afrikaans speakers can benefit from that.
What things keep you up at night as the CTO of Google?
One of the things I worry about is growth and keeping up the hardware. Our system was designed from the ground up to be scalable. But there are large spikes where a large amount of traffic comes in and it's hard to determine when those might happen.
We have grown quite a bit internationally from word of mouth, and it's almost like we reach the "tipping point" (see Malcolm Gladwell's book, "The Tipping Point") and then we have thousands of new people using Google. It's very challenging to successfully handle the growth.
What's the puzzle you're trying to solve as far as improving the search?
One thing that I think is crucial, and that no one is anywhere near solving, is natural language understanding. I would like to see the search engines become so good, I mean, like the computers in Star Trek, where you talk to them and they understand what you're asking.
And not only do they find the answer for you, but they summarize it for you in a nice pleasant voice. We're nowhere near that now. But I do think it will happen sometime in the future and I think it will be very revolutionary when it does.
Thanks for your great insight and here's to Star Trek and the revolution.