## Distributed Learning: A new model

The Geomblog

Communication is now the key to modelling distributed/multicore computations. Jim Demmel has been writing papers and giving talks on this theme for a while now, and as processors get faster, and the cloud becomes a standard computing platform, communication between nodes is turning out to be the major bottleneck.

So suppose you want to learn in this setting? Suppose you have data sitting on different nodes (a data center, a heterogeneous sensor network, and so on) and you’d like to learn something on the union of the data sets. You can’t afford to ship everything to a single server for processing: the data might be too large to store, and the time to ship might be prohibitive.

So can you learn over the (implicit) union of all the data, with as little discussion among nodes as possible? This was the topic of my Shonan talk, as well as two papers that I’ve been working on with my student Avishek Saha, in collaboration with Jeff Phillips and Hal Daume. The first one will be presented at AISTATS this week, and the second was just posted to the arxiv.

We started out with the simplest of learning problems: classification. Suppose you have data sitting on two nodes (A and B), and you wish to learn a hypothesis over the union of A and B. What you’d like is a way for the nodes to communicate as little as possible with each other while still generating a hypothesis close to the optimal solution.

It’s not hard to see that you could compute an $\epsilon$-sample on A, and ship it over to B. By the usual properties of an $\epsilon$-sample, you guarantee that any classifier on B’s data combined with the sample will also classify A correctly to within some $\epsilon$-error. It’s also not too hard to show a lower bound that matches this upper bound. The amount of communication is nearly linear in $1/\epsilon$.
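To make the one-way protocol concrete, here is a minimal sketch for a toy case that is my own assumption, not the setting of the paper: points on a line, separable by a threshold. In that case an evenly spaced subsample of A's sorted points plays the role of the $\epsilon$-sample. The helper names (`epsilon_sample_1d`, `fit_threshold`, `one_way_protocol`) are hypothetical.

```python
import math
import random


def true_label(x, boundary):
    """Ground-truth label for a 1D point: +1 at or above the boundary."""
    return 1 if x >= boundary else -1


def epsilon_sample_1d(points, eps):
    """Keep every s-th point in sorted order, with s = floor(eps * n).

    At most s - 1 unsent points lie between two consecutive kept points,
    and roughly 1/eps points are shipped.
    """
    pts = sorted(points)
    step = max(1, math.floor(eps * len(pts)))
    return pts[::step]


def fit_threshold(labelled):
    """Return a threshold h consistent with separable (x, label) pairs.

    The classifier predicts +1 when x >= h.
    """
    neg = [x for x, y in labelled if y < 0]
    pos = [x for x, y in labelled if y > 0]
    if not neg:
        return min(pos)
    if not pos:
        return max(neg) + 1.0
    return (max(neg) + min(pos)) / 2.0


def one_way_protocol(a_points, b_points, boundary, eps):
    """A ships a labelled sample to B; B fits on its own data plus the sample."""
    sample = epsilon_sample_1d(a_points, eps)
    combined = [(x, true_label(x, boundary)) for x in sample + list(b_points)]
    return fit_threshold(combined), len(sample)


def errors(h, points, boundary):
    """Count points whose prediction under h disagrees with the true label."""
    return sum((1 if x >= h else -1) != true_label(x, boundary) for x in points)


rng = random.Random(0)
A = [rng.random() for _ in range(1000)]
B = [rng.random() for _ in range(1000)]
h, shipped = one_way_protocol(A, B, boundary=0.5, eps=0.05)
print(shipped, errors(h, A, 0.5), errors(h, B, 0.5))
```

In this toy version B's classifier is exact on B's own points, and the only points of A it can get wrong are unsent ones falling in a single gap between consecutive shipped points, which is fewer than $\epsilon |A|$ of them.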

But can you do better? In fact yes, if you let the nodes talk to each other, rather than only allowing one-way communication. One way of gaining intuition for this is that $A$ can generate classifiers, and send them over to $B$, and $B$ can tell $A$ to turn the classifier left or right. Effectively, $B$ acts as an oracle for binary search. The hard part is showing that this is actually a decimation (in that a constant fraction of points are eliminated from consideration as support points in each step), and once we do that, we can show an exponential improvement over one-way communication. There’s a trivial way to extend this to more than 2 players, with a $k^2$ blow up in communication for $k$ players.
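The oracle intuition can be sketched in a few lines. This is a toy abstraction, not the protocol from the paper: the classifier is collapsed to a single angle, and B's data is represented only by a hypothetical interval of angles `[b_lo, b_hi]` that B consults when answering. The names `make_b_oracle` and `two_way_search` are made up for illustration.

```python
import math


def make_b_oracle(b_lo, b_hi):
    """B's side: report whether a proposed angle fits B's data.

    [b_lo, b_hi] is a hypothetical stand-in for the set of angles
    consistent with B's points.
    """
    def oracle(theta):
        if theta < b_lo:
            return "increase"
        if theta > b_hi:
            return "decrease"
        return "ok"
    return oracle


def two_way_search(a_lo, a_hi, oracle, max_rounds=64):
    """A bisects its own feasible interval, guided by B's one-word replies.

    Returns (theta, rounds). Each round costs one proposal and one reply.
    """
    lo, hi = a_lo, a_hi
    for rounds in range(1, max_rounds + 1):
        theta = (lo + hi) / 2.0
        reply = oracle(theta)
        if reply == "ok":
            return theta, rounds
        if reply == "increase":
            lo = theta
        else:
            hi = theta
    raise RuntimeError("no common classifier found within max_rounds")


theta, rounds = two_way_search(0.0, math.pi, make_b_oracle(1.0, 1.1))
print(theta, rounds)
```

Each round halves A's interval, so the number of rounds grows only logarithmically in the ratio between A's starting interval and the width of the region both nodes agree on.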

This binary search intuition only works for points in 2D, because then the search space of classifiers is on the circle, which lends itself naturally to a binary search. In higher dimensions, we have to use what is essentially a generalization of binary search – the multiplicative weight update method. I’ll have more to say about this in a later post, but you can think of the MWU as a “confused zombie” binary search, in that you only sort of know “which way to go” when doing the search, and even then points that you dismissed earlier might rise from the dead.
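For readers who haven't met it, here is the generic textbook form of a multiplicative weight update, as a minimal sketch. It is not the specific variant used in the paper; the loss values and the step size `eta` are illustrative assumptions.

```python
def mwu_update(weights, losses, eta=0.5):
    """One multiplicative-weights step: shrink each weight by (1 - eta * loss).

    Losses are assumed to lie in [0, 1] and eta in (0, 1/2].
    Returns a new list of weights normalised to sum to 1.
    """
    scaled = [w * (1.0 - eta * l) for w, l in zip(weights, losses)]
    total = sum(scaled)
    return [w / total for w in scaled]


def run_mwu(loss_rounds, eta=0.5):
    """Start from uniform weights and apply one update per round of losses."""
    n = len(loss_rounds[0])
    weights = [1.0 / n] * n
    for losses in loss_rounds:
        weights = mwu_update(weights, losses, eta)
    return weights


# Candidate 0 looks bad early on, then turns out to be the best one.
rounds = [[1, 0, 0]] * 2 + [[0, 1, 1]] * 10
final = run_mwu(rounds)
print([round(w, 4) for w in final])
```

Candidate 0 is nearly written off after the first two rounds and still ends up with almost all of the weight, which is the “rise from the dead” behaviour: nothing is ever eliminated outright, only down-weighted.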

It takes a little more work to bring the overhead for $k$ players down to a factor of $k$. This comes by selecting one node as a coordinator, and implementing one of the distributed continuous sampling techniques to pass data to the coordinator.

You can read the paper for more details on the method. One thing to note is that the MWU can be “imported” from other methods that use it, which means that we get distributed algorithms for many optimization problems for free. This is great because a number of ML problems essentially reduce to some kind of optimization.

A second design template is multipass streaming: it’s fairly easy to see that any multipass sublinear streaming algorithm can be placed in the $k$-player distributed setting, and so if you want a distributed algorithm, design a multipass streaming algorithm first.
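The reduction is simple enough to sketch: treat the nodes' data, visited in a fixed order, as one long stream, and ship the algorithm's working state from node to node. The example below is an illustration under my own assumptions (a two-pass variance computation as the streaming algorithm, and the hypothetical helper `run_multipass_on_nodes`), not code from the paper.

```python
def run_multipass_on_nodes(nodes, passes, init_state, step):
    """Simulate a multipass streaming algorithm over k nodes.

    nodes: list of local data lists, one per node.
    step(state, item, pass_index) -> new state.
    Returns (final_state, handoffs), where handoffs counts how many
    times the streaming state is shipped from one node to the next.
    """
    state = init_state
    handoffs = 0
    for p in range(passes):
        for i, local in enumerate(nodes):
            if p > 0 or i > 0:
                handoffs += 1  # the state travels to this node
            for item in local:
                state = step(state, item, p)
    return state, handoffs


def variance_step(state, x, p):
    """Two-pass variance: pass 0 accumulates count and sum, pass 1 squared deviations."""
    n, total, sq = state
    if p == 0:
        return (n + 1, total + x, sq)
    mean = total / n
    return (n, total, sq + (x - mean) ** 2)


nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
(n, total, sq), handoffs = run_multipass_on_nodes(nodes, 2, (0, 0.0, 0.0), variance_step)
print(sq / n, handoffs)
```

The total communication is the number of hand-offs (passes times $k$, minus one) multiplied by the size of the state, so a streaming algorithm with sublinear memory and few passes gives a protocol with low communication.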

One weakness of our algorithms was that we didn’t work in the “agnostic” case, where the optimal solution itself might not be a perfect classifier (or where the data isn’t separable, to view it differently). This can be fixed: in an arxiv upload made simultaneously with ours, Blum, Balcan, Fine and Mansour solve this problem very neatly, in addition to proving a number of PAC-learning results in this model.

It’s nice to see different groups exploring this view of distributed learning. It shows that the model itself has legs. There are a number of problems that remain to be explored, and I’m hoping we can crack some of them. In all of this, the key is to get from a ‘near linear in error’ bound to a ‘logarithmic in error’ bound via replacing sampling by active sampling (or binary search).

## Online Education Venture Lures Cash Infusion and Deals With 5 Top Universities

ACM TechNews

SAN FRANCISCO — An interactive online learning system created by two Stanford computer scientists plans to announce Wednesday that it has secured $16 million in venture capital and partnerships with five major universities.

The scientists, Andrew Ng and Daphne Koller, taught free Web-based courses through Stanford last year that reached more than 100,000 students. Now they have formed a company, Coursera, as a Web portal to distribute a broad array of interactive courses in the humanities, social sciences, physical sciences and engineering.

Besides Stanford and the University of California, Berkeley, where the venture has already been offering courses, the university partners include the University of Michigan, the University of Pennsylvania and Princeton.

Although computer-assisted learning was pioneered at Stanford during the 1960s, and for-profit online schools like the University of Phoenix have been around for several decades, a new wave of interest in online education is taking shape.

“When we offer a professor the opportunity to reach 100,000 students, they find it remarkably appealing,” Dr. Koller said.

Last fall a course in artificial intelligence taught by Sebastian Thrun, then at Stanford, and Google’s director of research, Peter Norvig, attracted more than 160,000 students from 190 countries. The free course touched off an intense debate behind the scenes at Stanford, where annual tuition is $40,050. Ultimately, the 22,000 students who finished the course received “certificates of completion” rather than Stanford credit. And Dr. Thrun, who also directs Google’s X research lab, left his tenured position at Stanford and founded a private online school, Udacity.

Coursera (pronounced COR-sayr-uh), based in Mountain View, Calif., intends to announce that it has received financial backing from two of Silicon Valley’s premier venture capital firms, Kleiner Perkins Caufield & Byers and New Enterprise Associates. The founders said they were not ready to announce a strategy for profitability, but noted that the investment gave them time to develop new ways to generate revenue.

One of their main backers, the venture capitalist John Doerr, a Kleiner investment partner, said via e-mail that he saw a clear business model: “Yes. Even with free courses. From a community of millions of learners some should ‘opt in’ for valuable, premium services. Those revenues should fund investment in tools, technology and royalties to faculty and universities.”

Both founders said they were motivated by the potential of Internet technologies to reach hundreds of thousands of students rather than hundreds.

“We decided the best way to change education was to use the technology we have developed during the past three years,” said Dr. Ng, who is an expert in machine learning. Previously he said he had been involved with Stanford’s effort to put academic lectures online for viewing. But he noted that there was evidence that the newer interactive systems provided much more effective learning experiences.

He and Dr. Koller dismissed the idea that companies would “disintermediate” universities by spotting the brightest talents among students and hiring them directly.

Coursera and Udacity are not alone in the rush to offer mostly free online educational alternatives. Start-up companies like Minerva and Udemy, and, separately, the Massachusetts Institute of Technology, have recently announced similar platforms.

In December, M.I.T. said it was forming MITx under the leadership of L. Rafael Reif, the university’s provost, and the computer scientist Anant Agarwal. The program began offering its first course, on circuits and electronics, in March. As at Stanford, students receive a certificate of completion but not university credit.

Unlike previous video lectures, which offered a “static” learning model, the Coursera system breaks lectures into segments as short as 10 minutes and offers quick online quizzes as part of each segment.

Where essays are required, especially in the humanities and social sciences, the system relies on the students themselves to grade their fellow students’ work, in effect turning them into teaching assistants. Dr. Koller said that this would actually improve the learning experience.

The Coursera system also offers an online feature that allows students to get support from a global student community. Dr. Ng said an early test of the system found that questions were typically answered within 22 minutes.

He acknowledged that there was still no technological fix for cheating, and said the courses relied on an honor system.

Dr. Koller said the educational approach was similar to that of the “flipped classroom,” pioneered by the Khan Academy, a creation of the educator Salman Khan. Students watch lectures at home and then work on problem-solving or “homework” in the classroom, either one-on-one with the teacher or in small groups.

Dr. Ng said he had already vastly extended his reach by using the Internet as a teaching platform. He cited one student who had been in danger of losing his job at a large telecommunications firm; after he took the online course, he improved so much he was given responsibility for a significant development project. And a programmer at the Fukushima nuclear power plant in Japan was able to immediately apply machine-learning algorithms to the crisis that followed the earthquake and tsunami last year.

A version of this article appeared in print on April 18, 2012, on page B4 of the New York edition with the headline: Online Education Venture Lures Cash Infusion and Deals With 5 Top Universities.