Intel Domain Leader: Shai Fine
We plan to develop an open-source library for large-scale distributed training of deep networks that is:
1) optimized for IA (Xeon, Xeon Phi), and
2) based on an open-source data-analytics cluster-computing framework (Spark, Hadoop).
The research project will:
(1) Employ advanced ML concepts such as distributed learning and improved deep learning architectures.
(2) Include new, advanced, and demanding deep-learning-based use cases.
The projects:
- Optimal Deep Learning and the Information Bottleneck Principle
- SimNets: A Generalization of Convolutional Networks
- Rigorous Algorithms for Distributed Deep Learning
- Mega-Class Efficient Deep Learning
- Outlier-Robust Distributed Learning + Learning Deep Forward Models for Reinforcement Learning
- Unsupervised and Semi-supervised Ensemble Learning
- Distributed Deep Learning on Xeon Phi
- Distributed Methods for Non-Convex and Deep Learning
- Scene Understanding: from Image to Text and from Image and a Question to an Answer
- Applications of Deep Learning to Medical Imaging
- Image Restoration using Deep Learning

Optimal Deep Learning and the Information Bottleneck Principle
Academia Researcher(s): Prof. Naftali Tishby, Hebrew University
Research Project Summary:
Deep Neural Networks (DNNs) and Deep Learning (DL) algorithms are attracting unprecedented attention, as they currently perform better than most other ML methods on a variety of real-world applications, from speech and vision to NLP and computational biology. Yet there is little theoretical understanding of DNNs. In particular, we are crucially missing theoretically motivated design principles (architecture: number of layers, connectivity, desired features, etc.), useful bounds on information/sample and computational complexities, and provably efficient DL algorithms. Moreover, there is a complete lack of interpretability of DNN models: what do they learn? What do the layers/units represent? What characterizes good problem domains for DNNs? Why do convolutional NNs work so well, and what generalizes this principle? As all engineers know, there is nothing more practical than a good theory, and we completely lack one in this important domain.
The Information Bottleneck (IB) method was introduced long ago as an information-theoretic principle for extracting the relevant information in one random variable (X) with respect to another variable (Y), given their joint distribution P(X,Y) or a sample from this distribution. The method has been applied successfully to various supervised and unsupervised ML problems, from text categorization to neuroscience and cognition, but its main appeal is its principled theoretical foundation. It provides a natural extension of the concept of minimal sufficient statistics, and an algorithm for calculating them from empirical data. Optimal neural networks (or other supervised ML methods) should – in principle – act as an information bottleneck, namely, extract optimal minimal features (minimal sufficient statistics) of the input variables (X) that enable optimal prediction of the output variable (Y).
We have recently shown an exact correspondence between DNN and the IB problem, which explains the emergence of neural layers, their number and optimal architecture, as phase transitions in the Information Bottleneck optimal curve. In this work we would like to further exploit this correspondence and suggest new theoretical bounds and better deep learning algorithms.
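To make the IB principle concrete, here is a minimal sketch of the classical self-consistent iterations for the discrete IB problem (the function name and toy setup are our own illustration, not part of the project): given a joint distribution P(X,Y) and trade-off parameter beta, a soft encoder P(T|X) is repeatedly updated via P(t|x) ∝ P(t) exp(-beta · KL(P(Y|x) || P(Y|t))).

```python
import numpy as np

def ib_iterate(pxy, n_clusters=2, beta=5.0, n_iter=200, seed=0):
    """Self-consistent iterations for the discrete Information Bottleneck.

    pxy : joint distribution P(X, Y), shape (nx, ny), entries sum to 1.
    Returns the soft encoder P(T|X), shape (nx, n_clusters).
    """
    rng = np.random.default_rng(seed)
    nx, ny = pxy.shape
    px = pxy.sum(axis=1)                       # P(X)
    py_x = pxy / px[:, None]                   # P(Y|X)

    # random soft initialisation of the encoder P(T|X)
    ptx = rng.random((nx, n_clusters))
    ptx /= ptx.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        pt = px @ ptx                          # cluster marginal P(T)
        pxt = ptx * px[:, None] / pt[None, :]  # Bayes: P(X|T)
        py_t = pxt.T @ py_x                    # decoder P(Y|T)
        # KL(P(Y|x) || P(Y|t)) for every pair (x, t); small floor avoids log(0)
        log_ratio = np.log(py_x[:, None, :] + 1e-12) - np.log(py_t[None, :, :] + 1e-12)
        kl = (py_x[:, None, :] * log_ratio).sum(axis=2)
        # self-consistent update: P(t|x) proportional to P(t) * exp(-beta * KL)
        ptx = pt[None, :] * np.exp(-beta * kl)
        ptx /= ptx.sum(axis=1, keepdims=True)
    return ptx
```

With a large beta, inputs x with similar conditional distributions P(Y|x) are mapped to the same cluster T, so T approximates a minimal sufficient statistic of X for predicting Y.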
SimNets: A Generalization of Convolutional Networks
Academia Researcher(s): Prof. Amnon Shashua, Hebrew University
Participating Student(s): Nadav Cohen
Research Project Summary:
We propose a deep layered architecture that generalizes classical convolutional neural networks (ConvNets). The architecture, called SimNets, is driven by two operators: one is a similarity function whose family contains the inner-product operator on which ConvNets are based; the other is a new soft max-min-mean operator called MEX that realizes classical operators like ReLU and max pooling, but has additional capabilities that take SimNets far beyond ConvNets. Two interesting properties emerge from the architecture: (i) the fundamental constructions analogous to the multilayer perceptron, LeNet, and Network in Network are all kernel machines of different types, and (ii) networks may be initialized through a natural unsupervised scheme that carries with it the potential for automatically learning architectural parameters. Experiments demonstrate the capability of SimNets to achieve accuracy comparable to ConvNets with networks that are almost an order of magnitude smaller.
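The soft max-min-mean behavior of MEX has a standard log-mean-exp form. The sketch below is our own illustration of such an operator (any learned offsets used in the full SimNets formulation are omitted); a single parameter beta interpolates between max pooling, average pooling, and min pooling:

```python
import numpy as np

def mex(x, beta):
    """Soft max/min/mean operator: (1/beta) * log(mean(exp(beta * x))).

    beta -> +inf recovers max, beta -> -inf recovers min, and the limit
    beta -> 0 recovers the arithmetic mean (beta == 0 handled explicitly).
    """
    x = np.asarray(x, dtype=float)
    if beta == 0:
        return x.mean()
    # log-sum-exp trick for numerical stability at large |beta|
    m = (beta * x).max()
    return (m + np.log(np.mean(np.exp(beta * x - m)))) / beta
```

A single parameterized operator that smoothly covers max, mean, and min pooling is what lets the architecture treat the pooling type itself as something that can be learned.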
The expected significance of this program is very high. The use of deep layered learning in visual understanding, speech recognition, and natural language processing is gaining momentum and achieving great success in beating public benchmarks against classical machine learning techniques like kernel machines and boosting. However, the empirical success of ConvNets is mainly fueled by the ever-growing scale of available computing power and training data, with algorithmic advancements making a secondary contribution. Secondly, deep learning is exclusively supervised, meaning that large labeled training sets are required; this is unlike human visual processes, which can to a large extent extract meaning from unsupervised data. Although there have been attempts to use unsupervised learning to initialize ConvNets, it has since been observed that these schemes have little to no advantage over carefully selected random initializations that do not use data at all. One of the main advantages of SimNets is initialization using unsupervised learning; this has the potential of requiring much less supervised data for training the network than traditional ConvNets.
Publications

Rigorous Algorithms for Distributed Deep Learning
Academia Researcher(s): Prof. Shai Shalev-Shwartz, Hebrew University
Research Project Summary:
The project deals with new distributed algorithms for training deep networks. Our approach is based on a theoretical analysis of the pitfalls of existing methods, which leads to the design and analysis of alternative approaches. In particular, we underscore several problems of state-of-the-art approaches, such as the high variance of the update step, the low probability of updating based on rare events, and the vanishing gradient problem. We propose new practical algorithms that explicitly tackle these problems and show their advantage in the context of distributed training.
Publications

Mega-Class Efficient Deep Learning
Academia Researcher(s): Prof. Koby Crammer, Technion
Research Project Summary:
Our goal is to develop new models, and algorithms that learn such models from annotated data. We aim to build systems that can tag an input with one (single-label) or more (multi-label) categories out of hundreds of thousands of possible classes. For example, a document may be about a presidential visit to a soccer match, and thus annotated as being about both sports and politics. Typical real-world annotation problems may involve such large numbers of labels, such as topics in Wikipedia or image annotation on the web. We plan to find a mapping of all possible labels into some joint space.
We expect our research to advance several areas: core machine learning in classification, both single- and multi-label; document categorization and image annotation; and learning with missing information (semi-supervised learning, which a well-learned map makes possible).
Successful research will benefit Intel and the industry, as it will make it possible to build systems that annotate the very large amounts of data collected in many real-world applications. Such huge-scale automatic annotation capabilities can be a first step toward business-analytics systems that process big data.
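As a hypothetical illustration of the joint-space idea (the function and all names below are ours, not the project's actual algorithm), classification over hundreds of thousands of labels can be reduced to a nearest-neighbor search among label embeddings in a shared space, which scales far better than one classifier per label:

```python
import numpy as np

def nearest_labels(x, W, label_emb, k=1):
    """Rank labels by cosine similarity to the mapped input in a joint space.

    x         : input feature vector, shape (d,)
    W         : learned linear map from inputs into the joint space, (d, m)
    label_emb : one embedding per label, shape (n_labels, m)
    Returns the indices of the k highest-scoring labels.
    """
    z = x @ W                                   # map input into joint space
    z = z / (np.linalg.norm(z) + 1e-12)         # unit-normalize
    E = label_emb / (np.linalg.norm(label_emb, axis=1, keepdims=True) + 1e-12)
    scores = E @ z                              # cosine similarity per label
    return np.argsort(-scores)[:k]
```

Returning the top-k labels instead of a single argmax directly supports the multi-label setting described above.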
Koby Crammer – Publications

Outlier-Robust Distributed Learning + Learning Deep Forward Models for Reinforcement Learning
Academia Researcher(s): Prof. Shie Mannor, Technion
Participating Student(s): Oran Reichman
Research Project Summary:
We consider distributed machine learning in the presence of outliers. Many of the learning algorithms (such as neural networks and support vector machines) used for classification, regression, and structure learning are extremely brittle to data points that are substantially different from the bulk of the data. Even a few samples deliberately crafted to be “hard” can confuse such learning algorithms. In this research project we plan to develop a framework for parallelizing machine learning algorithms in the presence of outliers.
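One standard building block for such a framework, shown here only as an illustrative sketch (not necessarily the project's method), is robust aggregation of per-worker updates: a coordinate-wise trimmed mean that a few outlier or adversarial workers cannot drag arbitrarily far, unlike a plain average.

```python
import numpy as np

def trimmed_mean(grads, trim=1):
    """Coordinate-wise trimmed mean of worker gradients.

    grads : array-like of shape (n_workers, dim).
    trim  : number of largest and smallest values dropped per coordinate
            before averaging; requires n_workers > 2 * trim.
    """
    g = np.sort(np.asarray(grads, dtype=float), axis=0)
    return g[trim:len(g) - trim].mean(axis=0)
```

With a plain mean, a single corrupted worker shifts the aggregate without bound; with trimming, up to `trim` arbitrary workers per coordinate are discarded automatically.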
Shie Mannor – Publications

Unsupervised and Semi-supervised Ensemble Learning
Academia Researcher(s): Prof. Boaz Nadler, Weizmann Institute
Participating Student(s):
Ariel Jaffe
Omer Dror
Research Project Summary:
In a variety of applications, (big) data practitioners and end users have large unlabeled test sets and little or even no labeled data. With the availability of free machine learning packages, they can easily compute the predictions of many different classifiers on their data. How should these end users, who are typically not knowledgeable in machine learning, decide which classifier is best suited to their data?
The goal of this proposal is to develop novel unsupervised and semi-supervised ensemble learning methods suited to such scenarios. Our methods will make it possible, in a principled manner, to estimate the accuracies of the different classifiers, to point the practitioner to the classifier most suitable for his/her dataset, and to construct more accurate ensemble learners. In contrast to classical supervised learning, the main novelty is that we propose to do these tasks with little or even no labeled data. The main idea is to build upon recent work by the PI and collaborators that proposed a simple spectral approach to tackle such problems. Specifically, we plan to generalize our prior work in three important directions: (i) add a model of instance difficulty; (ii) develop semi-supervised ensemble methods; (iii) develop methods that detect strongly correlated yet inaccurate classifiers, and consequently construct improved ensemble learners. These extensions should significantly advance the state of the art in this emerging field. Moreover, we anticipate that our algorithms will be implemented in machine learning packages and used by practitioners worldwide.
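As a toy sketch of the spectral idea (our simplified illustration, assuming binary classifiers whose errors are conditionally independent given the true label): the off-diagonal of the covariance matrix of the classifiers' predictions is approximately rank one, and its leading eigenvector ranks the classifiers by accuracy without any labels, yielding voting weights for free.

```python
import numpy as np

def spectral_ensemble(preds):
    """Unsupervised weighted majority vote from classifier predictions.

    preds : array of shape (n_classifiers, n_samples) with entries in {-1, +1}.
    Under conditional independence, Cov(preds) ~ v v^T off the diagonal,
    where v_i grows with classifier i's balanced accuracy. The leading
    eigenvector therefore serves as accuracy-based voting weights.
    """
    Q = np.cov(preds)
    np.fill_diagonal(Q, 0.0)        # diagonal is not part of the rank-1 fit
    w, V = np.linalg.eigh(Q)
    v = V[:, np.argmax(w)]          # leading eigenvector
    if v.sum() < 0:                 # fix sign: most classifiers beat chance
        v = -v
    weights = np.clip(v, 0.0, None)
    return np.sign(weights @ preds) # weighted majority vote
```

Note this crude version zeroes the diagonal rather than fitting it; the published approach and the extensions described above (instance difficulty, correlated classifiers) refine exactly this step.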
Boaz Nadler – Publications

Distributed Methods for Non-Convex and Deep Learning
Academia Researcher(s):
Prof. Ohad Shamir, Weizmann Institute
Prof. Natan (Nati) Srebro, TTI
Participating Student(s):
Yossi Arjevani
Itai Safran
Behnam Neyshabur
Jialei Wang
Liat Peterfreund
Research Project Summary:
The main objective of this project is to understand and develop distributed methods for non-convex and deep learning problems. This lies at the intersection of two of the most significant trends in current machine learning research: deep learning systems, which have recently led to breakthrough performance improvements across several difficult AI tasks, and scalable methods that can be distributed and parallelized across many computing cores. This is a highly ambitious goal, since we do not fully understand how to perform distributed learning even on problems much “nicer” than deep learning, and we understand even less about how to do deep learning in a principled manner. We plan to attack this by building on our previous work in distributed convex learning and in deep learning, working through a spectrum of intermediate learning settings that may be easier to study (some of them interesting, with applicative potential in themselves), eventually leading to methods and algorithms for distributed and deep learning. Given the high importance of these topics in both academia and industry, this project has the potential for significant impact.
Ohad Shamir – Publications

Nati Srebro – Publications

Distributed Deep Learning on Xeon Phi
Academia Researcher(s): Prof. Mark Silberstein, Technion
Participating Student(s): Jonathan Ezroni
Research Project Summary:
We plan to develop a generic distributed CNN training framework on Intel Xeon Phi accelerators. Our ultimate goal is to enable training very large CNNs on a multi-node cluster of Xeon Phi accelerators interconnected via an InfiniBand network. This work has two primary thrusts:
(1) acceleration and parallelization of the highly popular Caffe CNN training framework on Xeon Phi, and
(2) development of a native distributed system on Xeon Phi processors, with optimization of the network and system stack to achieve high performance.
Accelerating Caffe on Xeon Phi will make it possible to extend the boundaries of what can be done today with CNNs. If successful, it will bring a significant boost to CNN training speed and will promote the use of the Xeon Phi processor in an application domain that currently experiences unprecedented growth and levels of interest. Further, building an all-native distributed system on Xeon Phi will help gain insights into the development of a complete native application on massively parallel processors, which is a critical step toward successful adoption of future generations of Intel's Knights Landing processor.
Scene Understanding: from Image to Text and from Image and a Question to an Answer
Academia Researcher(s): Prof. Lior Wolf, Tel Aviv University
Participating Student(s):
Ben Klein
Guy Lev
Sivan Keret
Yossi Biton
Michael Rotmann
Research Project Summary:
We address the scene understanding task: given an image, we generate a textual description of it. Since the textual description can be non-specific, we add the ability to ask the system questions about the image; the system then produces specific answers. This was long conceived of as a huge leap forward in AI, until it recently came within reach. We plan to keep using deep learning tools such as Convolutional Neural Networks, Recurrent Neural Networks, and neural word embeddings, as well as computer vision tools such as the Fisher Vector and new variants of it, and statistical tools such as Canonical Correlation Analysis.
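Canonical Correlation Analysis, one of the statistical tools named above, maps the image view and the text view into a shared space where matched pairs are maximally correlated; retrieval and matching then become nearest-neighbor search in that space. Below is a minimal sketch of classical linear CCA via whitening and an SVD (our own illustration; a production system would use regularized or deep variants):

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Classical CCA: project two views into a shared k-dim correlated space.

    X : (n, dx) view one (e.g. image features), Y : (n, dy) view two (text).
    Returns projections Wx (dx, k), Wy (dy, k) and the top-k canonical
    correlations. `reg` stabilizes the whitening of near-singular covariances.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # symmetric inverse square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Sx, Sy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Sx @ Cxy @ Sy)
    return Sx @ U[:, :k], Sy @ Vt.T[:, :k], s[:k]
```

The singular values are the canonical correlations: values near 1 indicate directions in which the two views carry the same information.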
Lior Wolf – Publications

Applications of Deep Learning to Medical Imaging
Academia Researcher(s): Prof. Hayit Greenspan, Tel Aviv University
Participating Student(s):
Idit Diamant
Avi BenCohen
Ofer Geva
Research Project Summary:
The project deals with applications of DL to the domain of medical imaging. Several tasks will be explored, including medical classification tasks, in which the combination of non-medical with medical-image fine-tuning will be explored, and medical detection tasks, in which pathologies will be learned voxel by voxel using novel DL techniques. The medical domains include chest pathologies in X-ray imagery and liver lesions in CT imagery. The overall goal is big data combining a large set of imagery with descriptive text.
Hayit Greenspan – Publications

Image Restoration using Deep Learning
Academia Researcher(s):
Prof. Michael Zibulevsky, Technion
Prof. Miki Elad, Technion
Research Project Summary:
In our research we plan to apply deep neural networks to image restoration in Compressed Sensing and Computed Tomography. As a result we expect to increase image restoration quality for a given number of measurements, or alternatively to reduce the number of measurements required for a given image quality. This, for example, leads to a significant reduction of the radiation dose in computed tomography, which has a significant impact on patient cancer risk.
We also propose two new concepts in neural network architecture, double-sparse weights and super-neurons, which should increase the computational and energy efficiency and the robustness of a network, and which can be used in a variety of applications beyond image restoration.
Michael Zibulevsky – Publications
