Projects under the chair professors of CCBR IIT MADRAS:

  • Prof H N Mahabala Distinguished Chair in Computational Brain Research (Prof Partha Mitra)
  • N.R. Narayanamurthy Distinguished Chair in Computational Brain Research (Professor Mriganka Sur)
  • Prof CR Muthukrishnan Distinguished Chair in Computational Brain Research (Prof. Anand Raghunathan)

Prof H N Mahabala Distinguished Chair in Computational Brain Research

Personnel and collaborators:

Prof Partha Mitra
Prof Sukhendu Das
Prof Balaraman Ravindran
A/Prof Mohansankar Sivaprakasam
A/Prof Sutanu Chakraborti
Dr Jaikishan Jayakumar
Mr Keerthi Ram
Mr Shivaramakrishnan Kumar
Ms Ashika Naidu
Mr K.R. Vijay Babu
Mr. Balaji
Mr Giriraj Pahariya
Ms Madhumita Harish
Mr Samik Bannerjee
Mr Manav Choudary

Prof Partha Mitra’s research involves the study of complex neurobiological systems using a combination of experimental and computational approaches. The main focus of his lab is to generate a mesoscale connectivity map of the brain. The lab is currently studying the organization of the brain in different species such as mouse (the Mouse Brain Architecture project at CSHL), marmoset, zebra finch, macaque monkey and human. These methods are being developed at IIT Madras in collaboration with Prof Mitra’s laboratories at CSHL and RIKEN. In addition, we are developing a neuroinformatics pipeline that can be used for the analysis of large volumes of neurobiological data.

The specific projects currently underway under the HN Mahabala chair professorship in computational neuroscience are detailed below:

1) Image processing for aligning the coronal slices anatomically as well as to a reference atlas

Figure 1 shows a brain section injected with two viral tracers in the different layers of the motor cortex. The annotation contours represent the contours derived from aligning the section to a reference atlas.

We have developed image processing techniques and an architecture that can automatically align neurohistochemically prepared brain slices so that they can be viewed in series or individually. This technology can be extended to multimodal alignment of brain data that includes MRI, staining for Nissl substance and other such modalities. In addition, we have developed the capability to align these brain sections to a reference atlas such as the Allen Institute reference brain atlas. This work is being carried out in collaboration with Prof Mike Miller from Johns Hopkins University, USA.

2) Image processing for identifying tracer injection location, connectivity paths and linked cell structures

Figure 2 shows part of a brain section which has been digitally filtered. This part of the brain (primary motor cortex) has been injected with two viral tracers. The location of neurons (circular objects) indicates the injection location.

We have developed novel image processing techniques, in collaboration with Cold Spring Harbor Laboratory, to identify the locations of viral tracers that we have injected into mouse brains to trace the connectivity paths of a particular area. These image processing techniques, combined with the atlas mapping, give us the capability to analyze large datasets for injection locations and thus help in accurately quantifying the data within the mouse brain. The figure shows one such example, where the location of the injection can be identified as the center of the neurons (circular objects). The same technique can also be used to classify projections (the lines emanating from the injections), enabling us to quantify our data and infer the connectivity of that particular area.

3) A High Throughput Laboratory Information Management System for analyzing mesoscale resolution images of the brain

As part of the Mouse Brain Architecture (MBA) Project at Cold Spring Harbor Laboratory, Prof Mitra’s lab has systematically prepared serial cryosections of mouse brains and imaged the sections at gigapixel resolution. The aim is to capture the anatomical trajectories of long-range neuronal projections and structural anatomy for each individual brain. The images collected in the project need to be processed and analyzed to extract information about mesoscale neuronal connectivity. Such a resource is being developed at IIT Madras through a combination of efficient network transfer, grid-accelerated computing, web-based visualization and interactive annotation.

(3) Automated segmentation of Nissl-stained somata from whole-brain histological image data

Modern neuroanatomical research relies on whole-brain imaging using light microscopic techniques. An essential step of a neuroanatomical study is “mapping”, i.e. identifying the brain compartment in which a labelled cell or process is located. Classically, this mapping was done by visual examination of one or more histochemical stains, of which the Nissl stain is the most widely utilized variety, dating from the times of Brodmann (1908). Because of the large data volumes (10^12 pixels from a single mouse brain), it is important to automate this step using machine learning techniques.

We are in the process of developing an automated algorithm that can identify and segment Nissl-stained sections into component objects (somata of neurons, glial cells and other microscopic objects in the image), which can be further grouped to obtain information about the brain region involved. As a first step, we have applied a region growing algorithm (watershed transformation) for the pixel classification. We are currently exploring the possibility of using a machine learning architecture such as SegNet for this process.
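As a rough illustration of what region growing means for pixel classification, the sketch below labels connected groups of dark pixels in a tiny grayscale grid. It is a simplified stand-in, not the watershed transformation used in the project: the function name `label_regions`, the fixed intensity threshold and the toy image are all assumptions made for this example.

```python
from collections import deque

def label_regions(image, threshold):
    """Grow 4-connected regions of pixels darker than `threshold`.

    `image` is a list of rows of grayscale values (dark = stained).
    Returns (labels, count); labels[r][c] is 0 for background.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold or labels[r][c]:
                continue  # background pixel or already assigned
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first growth from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] < threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Toy image: two dark blobs on a bright background (hypothetical values).
toy = [
    [200, 200, 200, 200, 200],
    [200,  40,  50, 200, 200],
    [200,  45, 200, 200,  60],
    [200, 200, 200, 200,  55],
]
labels, count = label_regions(toy, threshold=128)
print(count)  # number of candidate objects found
```

A plain threshold merges touching cells into one object; separating such cells is the kind of case that a watershed transformation or a learned segmentation model is meant to handle.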

To evaluate the performance of the algorithms that we are developing, we have built an annotation interface where any user or volunteer can annotate individual Nissl-stained brain sections.

(4) Automated Detection of Green Fluorescent probe stained nuclei in light microscopy

Green fluorescent probes (GFP) are used widely to study the structural and functional anatomy of the brain. Whole-brain data sets at light microscope resolution are now available for study, but the size of these data sets (~1TB) means that it is necessary to develop automated methods for detecting and counting objects of interest within the image. We have developed a two-step iterative algorithm that allows for the automated detection of GFP-labelled interneurons in the brain. This algorithm is currently being applied to a large dataset in the Mouse Brain Architecture portal and is being used to analyze the distribution of GABAergic neuronal types in wild-type (+/+) and autism spectrum model (16pdf/+) mice.

5) ConnExt: Tool for Brain Region Connectivity Extraction from Neuroscience research articles

Research on brain region connectivity is of immense interest to neuroscientists. Experiments done in the wet lab are often reported in scientific literature by researchers from all over the world. In this work, we have attempted to compile the evidence of connectivity between identified brain regions in the literature and engineer a tool that neuroscientists can use to validate their experiments against what the community has reported. The tool serves as a search engine for finding brain regions and their connections in a large repository of around 55,000 full-text neuroscience articles. The tool broadly uses Natural Language Processing and Machine Learning techniques.

Several connectivity extractor algorithms have been designed and packaged as part of this tool. The algorithms range from simple regular expression pattern matching, to checking for the presence of connectivity words, to techniques using supervised learning with different feature representations.

The supervised algorithms are based on methods by Ashika et al [1] which use surface level and link parse based syntactic feature representation, trained on the White Text corpus [2], which is a gold standard benchmark dataset, consisting of 1377 abstracts from the Journal of Comparative Neurology. A brief description of the algorithms used in the tool is given below.

• RegExp patterns: A set of popular connectivity patterns compiled by Gokdeniz et al [3] is applied to the corpus, with the hypothesis that a sentence matching any of the regular expressions contains a connected brain region pair.

• Conn Word presence: The presence of connectivity indicators such as the words Afferent, Connect, Innervate, Originate, Pathway, Project, Receive and Input is taken to denote connectivity between the brain region mentions in the sentence.

• BoW clustering: A supervised learning method using Bag-of-Word feature representation.

• Conn Word clustering: A supervised learning method using binary valued feature representation based on popular connectivity words.

• LPBridge clustering: A supervised learning method using features derived from link parse based bridges. A bridge is the shortest path between two brain regions within the link parse structure of the sentence.
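The Conn Word presence rule above is simple enough to sketch directly. The example below is an illustration only, not the ConnExt implementation: the stemmed word pattern, the helper names and the list of region names are assumptions, and real brain region mentions would come from a named-entity recognizer rather than substring matching.

```python
import re

# Stems of the connectivity words listed above, so that inflected forms
# such as "projects" or "innervating" also match (an assumption here).
CONNECTIVITY_PATTERN = re.compile(
    r"\b(?:afferent|connect|innervat|originat|pathway|project|receiv|input)\w*",
    re.IGNORECASE,
)

def mentioned_regions(sentence, known_regions):
    """Return the known region names that occur in the sentence."""
    lowered = sentence.lower()
    return [region for region in known_regions if region.lower() in lowered]

def conn_word_presence(sentence, known_regions):
    """Predict a connection when two regions and a connectivity word co-occur."""
    regions = mentioned_regions(sentence, known_regions)
    has_word = CONNECTIVITY_PATTERN.search(sentence) is not None
    return len(regions) >= 2 and has_word

# Hypothetical region dictionary and sentences for illustration.
regions = ["lateral geniculate nucleus", "primary visual cortex",
           "hippocampus", "amygdala"]
print(conn_word_presence(
    "The lateral geniculate nucleus projects to the primary visual cortex.",
    regions))
print(conn_word_presence(
    "Both the hippocampus and the amygdala were stained.", regions))
```

A rule like this cannot tell a verb from a noun ("this project examined...") and ignores negation, which is one reason the supervised methods with richer features are also provided.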

Experiments on the White Text corpus revealed that the RegExp patterns were able to recall 50% of the sentences with 70% precision, whereas the BoW clustering achieved a recall of 91%, but with a limited precision of 29%. The Conn Word clustering algorithm had a precision of 54% and a recall of 58%, whereas the LPBridge clustering achieved the best balance between precision and recall at 41% and 85% respectively.

The tool provides brain region based search facility on the 55,000 article repository to find sentences matching the queried region and also fetches connected brain region names as stated in the articles. One gets to apply different algorithms to search the repository and in case of the clustering algorithms, the confidence of the retrieval is also mentioned with every search result. This facility is expected to be useful for researchers in the neuroscience community in that it can enable automatic compilation of brain connectivity results and present them in a searchable form for quick validation of wet lab experiments. Currently, the scope of ConnExt tool is restricted to handling sentences that have at most two brain regions. In future, we plan to extend the tool to support multiple brain regions.

N.R. Narayanamurthy Distinguished Chair in Computational Brain Research (Professor Mriganka Sur)

Machine learning provides powerful approaches for analysing neural signals and brain computations. Concepts from neuroscience in turn stand to powerfully impact machine learning. The goal of our CCBR collaborations is to: (a) understand brain processing by using machine learning approaches to analyze large-scale neuronal data sets obtained from the cerebral cortex, (b) construct models of single neurons and neuronal networks that may provide the foundation for next-generation machine architectures, and (c) generate new data sets based on computational insights and experimental advances. Ongoing projects are described below.

1. A High Resolution-based Signal Processing Algorithm for Spike Estimation from Imaging Data

Jilt Sebastian, Mari Ganesh Kumar, Mriganka Sur and Hema A Murthy

Spike time estimation from calcium (Ca2+) fluorescence signals recorded with large-scale imaging methods such as two-photon microscopy is a fundamental and challenging problem in neuroscience. Several models and algorithms have been proposed for this task over the past decade; nevertheless, it is still hard to achieve accurate spike timing estimates from Ca2+ fluorescence signals. While most existing methods rely on the physiology of neurons for modeling the spiking process, this work uses signal processing to exploit the nature of the responses of indicators to spikes. Ca2+ indicators respond to a spike with a sudden rise that is followed by an exponential decay. Exploiting this property and the high-resolution property of minimum-phase group delay (GD) functions, a technique for estimating the location of spikes, GDspike, is proposed. The Ca2+ signal is interpreted as an impulse train (the spikes) convolved with the indicator response (an exponentially decaying function, where the decay rate varies with the indicator). The performance of the proposed algorithm is evaluated on nine datasets which include various indicators, sampling rates and mouse brain regions. The GDspike approach is compared with the state-of-the-art signal processing algorithm MLspike, the widely used Vogelstein deconvolution algorithm (fast oopsi), and the Spike Triggered Mixture (STM) model, which is a state-of-the-art data-driven approach for spike estimation. The algorithm is evaluated using three different metrics: the F-measure, the area under the ROC curve (AUC), and the correlation. The performance of GDspike is superior to that of the standard Vogelstein deconvolution algorithm and is comparable to that of the MLspike algorithm. GDspike, being non-model based, can be used to post-process the output of MLspike, which further enhances performance.
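The signal model described above, a spike train passed through an exponentially decaying indicator response, can be sketched in a few lines. The code below is not GDspike (it does not use group delay functions); it only simulates a noise-free trace and inverts it with a naive first-order deconvolution, under the assumption that the per-sample decay factor is known. All function names and values are hypothetical.

```python
def simulate_calcium(spikes, decay):
    """Forward model: c[t] = decay * c[t-1] + spikes[t]."""
    trace, level = [], 0.0
    for s in spikes:
        level = decay * level + s  # sudden rise on a spike, then decay
        trace.append(level)
    return trace

def naive_deconvolve(trace, decay):
    """Invert the forward model: s[t] = c[t] - decay * c[t-1]."""
    previous, estimate = 0.0, []
    for value in trace:
        estimate.append(value - decay * previous)
        previous = value
    return estimate

def detect_spikes(trace, decay, threshold=0.5):
    """Return sample indices whose deconvolved value exceeds the threshold."""
    estimate = naive_deconvolve(trace, decay)
    return [t for t, x in enumerate(estimate) if x > threshold]

spikes = [0, 0, 1, 0, 0, 0, 1, 1, 0, 0]
trace = simulate_calcium(spikes, decay=0.8)
print(detect_spikes(trace, decay=0.8))  # recovers the spike positions
```

On real recordings the trace is noisy, the decay rate is unknown and closely spaced spikes overlap, so this direct inversion amplifies noise; that difficulty is what motivates methods such as those compared above.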

2. Mouse Visual Cortex Segmentation Using Statistical Models

R. Aadhirai, Hardik Suthar, Mari Ganesh Kumar, Jilt Sebastian, Ming Hu, Hema A Murthy and Mriganka Sur

The visual cortex has a prominent role in the processing of visual information by the brain. Functional characterization of this area aids further studies on cortical computations. The existing state-of-the-art technique identifies areas in mouse visual cortex using retinotopic maps, which are responses mapped to stimulus location. Instead of using retinotopic maps, we have explored statistical and machine learning methods for classifying cortical areas from their response to different kinds of stimuli. Since each area has its own characteristic responses to a particular stimulus, this response could be learned using machine learning algorithms. Building such a model to identify the cortical areas will also be helpful in identifying the properties of these areas in processing visual information.

3. Modeling Neuronal Responses Using Machine Learning

Sidharth Bafna, Rajeev Vijay Rikhye, Mari Ganesh Kumar M, Hema A. Murthy and Mriganka Sur

The primary visual cortex (V1), located in the occipital lobe, is one of the best studied cortical areas. The objective of this work is to analyze and model V1 neuronal responses using machine learning. Previously, deep networks have been used to capture the visual processing capabilities of V1 using real-world data. The neuronal signatures in V1 are well studied for simple stimuli such as gratings but not for complex stimuli such as natural movies. In this work, we analyzed V1 recordings in response to natural movies, using data from several mice. A Convolutional Neural Network (CNN) was trained to predict the class of input stimulus (natural movies) using spike train information. We also used different models to verify the uniqueness of responses for different classes of input stimuli. To model the compression from stimulus to responses, we simulated the stimulus-response mappings using a multi-layer perceptron. Finally, the trained models were evaluated on unseen datasets, where the uniqueness of the simulated responses was examined using different models. The trained models performed well for simple data sets but did not scale to complex data sets.

4. Receptive Field Identification of Visual Cortex Neurons

Poonam Thapar, Mari Ganesh Kumar M, Jilt Sebastian, Jacque Ip, Sami El-Boustani, Hema A. Murthy and Mriganka Sur

Effective modeling of information mapping from the retina to neurons in the visual cortex and their receptive field identification are crucial steps for understanding higher-level visual processing. This work aimed to solve receptive field identification using real data obtained from the visual cortex of mice. The receptive fields of neurons corresponding to the input stimulus were identified using a novel algorithm. A sparse noise visual stimulus consisting of box positions in a rectangular array of 20 positions was fed as input, and cortical responses to this stimulus were recorded simultaneously. Neuronal responses for every position of the stimulus (box) on the screen were averaged across the trials. The soma receptive field location of a single neuron was similar across the frames, but needed to be aligned owing to experimental artefacts such as movements. Normalized cross-correlation across trials was employed to perform the template matching. Once the frames were aligned by this method, a region of interest (ROI) was selected. The average response of the single neuron was calculated across all the frames in the selected region of interest, and used to determine the receptive field as the location where similar responses were obtained across different trials.
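The alignment step relies on normalized cross-correlation (NCC) for template matching. The sketch below shows the idea in one dimension for brevity, whereas the imaging frames are two-dimensional; the function names and sample values are assumptions for illustration.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                    * sum((y - mean_b) ** 2 for y in b))
    return num / den if den else 0.0  # flat windows carry no information

def best_offset(template, signal):
    """Slide the template along the signal; return (offset, score) of the best match."""
    width = len(template)
    best = (0, -1.0)
    for offset in range(len(signal) - width + 1):
        score = ncc(template, signal[offset:offset + width])
        if score > best[1]:
            best = (offset, score)
    return best

# The same response profile, shifted by two samples and with a baseline
# offset such as might arise from movement or a change in brightness.
template = [0, 1, 3, 1, 0]
signal = [5, 5, 5, 6, 8, 6, 5, 5]
offset, score = best_offset(template, signal)
print(offset, round(score, 3))
```

Because NCC subtracts the mean and divides by the energy of each window, the match is insensitive to baseline and gain changes between trials, which is why it suits alignment of repeated recordings.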

5. Automated Identification of Marmoset Calls

Sakshi Verma, K. L. Prateek, Karthik Pandia, Nauman Dawalatabad, Rogier Landman, Jitendra Sharma, Mriganka Sur and Hema A. Murthy

Marmosets are highly vocal primates. Hence the calls made by them are extensively analyzed for understanding their behavior. The first step for such studies is to identify and annotate the calls in a marmoset conversation audio. A time-consuming way to annotate the calls is to use human annotation. This process can be automated by using machines to annotate the calls, similar to the task of annotating human speech. The automatic annotation method can be either supervised or unsupervised. Preliminary work has attempted to annotate marmoset calls using both the methods above. The first step in automatic identification of calls is to segment the calls. The segmentation can also be supervised or unsupervised. We used a signal processing based segmentation technique to obtain the call segments. These segments were used for both supervised and unsupervised call identification methods. The primary aim of the experiment was to evaluate the call detection performance. Hence, the segments obtained from the unsupervised segmentation technique that matched the ground truth were chosen for evaluation. The (semi) supervised method was a two-stage method. In the first stage, a few templates for all the calls were hand-picked. Dynamic time warping (DTW) was used to identify the calls in the segmented audio. To improve the detection results from the first stage, the calls were reidentified using hidden Markov models (HMMs), which were trained using the templates that gave the best DTW scores. On example conversation files, call detection accuracies of 77.6% and 81.4% were observed in the first and second stages respectively. In unsupervised call identification, the segments were clustered automatically. Initially, all the calls were identified as unique calls. An HMM-based bottom-up clustering technique was applied to merge the calls, iteratively, to obtain a final set of unique calls. An average cluster purity of 78.7% was observed on two pairs of conversation data.
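The first stage of the (semi) supervised method matches each segment against hand-picked templates using dynamic time warping. A minimal sketch of that matching step follows; it uses one-dimensional feature sequences and an absolute-difference cost, both of which are assumptions (the actual features and local distance are not stated), and the template names are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # step both
    return cost[n][m]

def identify_call(segment, templates):
    """Return the name of the template with the smallest DTW distance."""
    return min(templates, key=lambda name: dtw_distance(segment, templates[name]))

# Hypothetical templates, e.g. coarse pitch contours of two call types.
templates = {
    "call_A": [1, 2, 3, 2, 1],
    "call_B": [3, 3, 1, 1, 3, 3],
}
segment = [1, 1, 2, 3, 3, 2, 1]  # a time-stretched version of call_A
print(identify_call(segment, templates))
```

DTW tolerates differences in call duration, which is useful when only a few templates are available; the second stage described above then replaces the fixed templates with trained HMMs.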

6. Electroencephalography (EEG) Lab

Hema A. Murthy

A 128-channel EEG system (EGI) has been set up in Professor Hema Murthy’s lab. Most work on EEG signal analysis has primarily dealt with anomalies, namely seizures, autism, analysis of P300 waves of alcoholics vs nonalcoholics, and so on. Relatively little effort has been expended on studying the EEG of normal subjects. Organisation of speech and music into phrases is primarily a cognitive function. One objective of this work is to collect EEG and speech (music) data in parallel, and study the relationship between phraseology in speech/music data and EEG. This will help us understand attention and perhaps help design cognitive signal processing-inspired machine learning models. In the longer term, the following research questions will be addressed: Can we develop quantitative measures for prosody? Can we determine the correlation between acoustics and EEG that enables understanding of prosody? Can we develop new models for Automatic Speech Recognition (ASR)/Text to Speech Synthesis (TTS)/Music Information Retrieval (MIR)? Can we build simple brain-computer interfaces (BCI) for the speech challenged/motor disabled to control activity without assistance?

7. The Influence of Astrocytes on the Width of the Orientation Hypercolumn: A computational perspective

Ryan T Philips, Mriganka Sur and V. Srinivasa Chakravarthy

Orientation preference maps (OPMs) are present in the cat, ferret and primates but are absent in rodents. In this study we investigate the possible link between astrocytic arbors and the presence of OPMs. We simulate the development of orientation maps with varying hypercolumn widths using a variant of the Laterally Interconnected Synergetically Self-Organizing Map (LISSOM) model, the Gain Control Adaptive Lateral (GCAL) model, with an additional layer simulating the astrocytic activation. The synaptic activity of V1 neurons is given as input to the astrocyte layer. The activity of this astrocyte layer is then used to modulate the bidirectional plasticity of the lateral excitatory connections in the V1 layer. By simply varying the radius of the astrocytes, the extent of lateral excitatory neuronal connections can be manipulated. An increase in the radius of lateral excitatory connections subsequently increases the size of a single hypercolumn in the OPM. When these lateral excitatory connections become small enough, the OPM disappears and a salt-and-pepper organization emerges.

8. Fiber laser-based two- and three-photon systems for deep-tissue wide-field live brain imaging

Anil Prabhakar and Mriganka Sur

Large-scale acquisition of data from neuronal populations, in animals performing behavioral tasks, is fundamental for understanding the information processing and computations in the brain that underlie cognitive functions. Indeed, such large-scale data needs to be acquired from multiple brain regions simultaneously and across multiple depths spanning the cerebral cortex and even including subcortical structures. Multiphoton imaging of calcium responses of neurons is a major technology that enables large-scale, wide-field, deep-tissue imaging across the brain in behaving animals, particularly mice. Multiphoton image acquisition through significant depths of brain tissue is hindered by strong scattering and absorption due to various tissue components. We aim to develop two-photon and three-photon imaging systems using fiber lasers in a range of wavelengths. The Prabhakar group at IITM has expertise in engineering picosecond fiber lasers. The Sur group at MIT has developed three-photon imaging systems using femtosecond lasers as the excitation source, and is developing wide-field two-photon imaging systems. As a part of the collaboration with IIT Madras, we aim to develop picosecond fiber lasers at 1000-2000 nm, which can be directly used for two- and three-photon excitation of different fluorophores.

Prof CR Muthukrishnan Distinguished Chair in Computational Brain Research (Prof. Anand Raghunathan)

Personnel and collaborators:

Prof Anand Raghunathan (IITM and Purdue)
Prof Balaraman Ravindran (IITM)
Prof V. Kamakoti (IITM)
Prof. Nitin Chandrachoodan (IITM)
Dr. Neel Gala (IITM)
Dr. Swagath Venkataramani (Purdue and IBM)
Mr. Vinod Ganesan (IITM)
Mr. R. Athindran (IITM)
Mr. Arnab Roy (IITM)
Mr. Giridhur Sriraman (IITM)
Mr. Deepak Ravikumar (Purdue)
Ms. Reena Elangovan (Purdue)
Ms. Sanchari Sen (Purdue)

The overarching goal of Prof. Anand Raghunathan's research is to improve the efficiency of modern computing platforms by understanding and adopting information processing principles from the brain. This is of particular interest for the growing computing workloads presented by machine learning and data analytics. While artificial intelligence has matched or surpassed natural intelligence in many perceptual tasks, including image recognition, speech recognition and natural language processing, there still exists a large energy gap (three to four orders of magnitude) between computers running machine learning algorithms and their biological counterparts. Thus, a long-term goal of this research is to "perform brain-like functions with closer to brain-like efficiency". The scope of this research spans various layers of the computing stack, from software to hardware architecture and circuits.

1. Improving the efficiency of neural networks through dynamic variable effort deep networks (Sanjay Ganapathy, Swagath Venkataramani, Balaraman Ravindran, Anand Raghunathan)

Deep Neural Networks (DNNs) have advanced the state-of-the-art in a variety of machine learning tasks and are deployed in increasing numbers of products and services. However, the computational requirements of training and evaluating large-scale DNNs are growing at a much faster pace than the capabilities of the underlying hardware platforms that they are executed upon. We propose Dynamic Variable Effort Deep Neural Networks (DyVEDeep) to reduce the computational requirements of DNNs during inference. Previous efforts propose specialized hardware implementations for DNNs, statically prune the network, or compress the weights. Complementary to these approaches, DyVEDeep is a dynamic approach that exploits the heterogeneity in the inputs to DNNs to improve their compute efficiency with comparable classification accuracy. DyVEDeep augments DNNs with dynamic effort mechanisms that, in the course of processing an input, identify how critical a group of computations is to classifying the input. DyVEDeep dynamically focuses its compute effort only on the critical computations, while skipping or approximating the rest. We propose 3 effort knobs that operate at different levels of granularity, viz. the neuron, feature and layer levels. We build DyVEDeep versions for 5 popular image recognition benchmarks: one for CIFAR-10 and four for ImageNet (AlexNet, OverFeat, VGG-16 and weight-compressed AlexNet). Across all benchmarks, DyVEDeep achieves 2.1x-2.6x reduction in the number of scalar operations, which translates to 1.8x-2.3x performance improvement over a Caffe-based implementation, with < 0.5% loss in accuracy.
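To make the idea of a neuron-level effort knob concrete, the sketch below shows one way such a mechanism could work: a ReLU neuron inspects its partial sum after a few inputs and stops early if the output is predicted to be zero. This is an assumption made for illustration, since the description above does not specify how the knobs are implemented; the function name, the checkpoint and the threshold are hypothetical.

```python
def relu_neuron_with_early_exit(inputs, weights, bias, check_after, low_threshold):
    """Evaluate a ReLU neuron, stopping early when the output is predicted to be 0.

    After `check_after` multiply-accumulates, a partial sum below
    `low_threshold` is taken as a prediction that the ReLU will clamp
    the final value to zero, so the remaining work is skipped.
    Returns (output, number_of_multiply_accumulates_performed).
    """
    total = bias
    for count, (x, w) in enumerate(zip(inputs, weights), start=1):
        total += x * w
        if count == check_after and total < low_threshold:
            return 0.0, count  # approximate: skip the remaining inputs
    return max(0.0, total), len(inputs)

inputs = [1.0, 1.0, 1.0, 1.0]
# Strongly negative partial sum: the neuron exits after two operations.
print(relu_neuron_with_early_exit(inputs, [-2.0, -2.0, 0.5, 0.5], 0.0, 2, -1.0))
# Positive partial sum: the neuron is evaluated in full.
print(relu_neuron_with_early_exit(inputs, [1.0, 1.0, 1.0, 1.0], 0.0, 2, -1.0))
```

The prediction can be wrong when later inputs would have pulled the sum back above zero, so the threshold trades saved operations against accuracy; tuning that trade-off per network is what keeps the accuracy loss small.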

2. Fovea-inspired object detection in video (R. Athindran, Giridhur Sriraman, Balaraman Ravindran, Anand Raghunathan)

Object tracking in video is an important machine learning task with several applications involving video summarization, search and autonomous driving. The computational requirements of object tracking are dominated by the object detection step, which processes each frame using a deep neural network and overlooks the considerable temporal correlation between successive frames. We are developing a technique that extracts "foveal" regions from a frame based on object locations in the previous frame, and stitches the extracted regions into a frame of smaller size that can be processed by a lower-complexity deep network. A full input frame is processed periodically to detect new objects that were not present in previous frames. Preliminary results on video datasets suggest that the proposed technique is able to reduce the computational requirements of object detection by ~30%, while maintaining competitive accuracy.
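The frame-planning logic described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the margin, the full-frame period, the box format `(x0, y0, x1, y1)` and all function names are hypothetical, and the step that stitches the regions into a smaller frame is omitted.

```python
def expand_box(box, margin, frame_w, frame_h):
    """Grow a box by `margin` pixels on each side, clipped to the frame."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(frame_w, x1 + margin), min(frame_h, y1 + margin))

def plan_frame(frame_index, prev_boxes, period, margin, frame_w, frame_h):
    """Decide which regions of the current frame to send to the detector."""
    if frame_index % period == 0 or not prev_boxes:
        # Periodic full-frame pass, so that newly appearing objects are found.
        return "full", [(0, 0, frame_w, frame_h)]
    regions = [expand_box(b, margin, frame_w, frame_h) for b in prev_boxes]
    return "foveal", regions

def pixel_fraction(regions, frame_w, frame_h):
    """Fraction of frame pixels covered (overlaps are counted twice)."""
    area = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in regions)
    return area / (frame_w * frame_h)

prev_boxes = [(100, 100, 200, 180), (600, 440, 640, 480)]
mode, regions = plan_frame(3, prev_boxes, period=10, margin=8,
                           frame_w=640, frame_h=480)
print(mode, regions, pixel_fraction(regions, 640, 480))
```

The margin absorbs object motion between consecutive frames; a larger margin or a shorter full-frame period improves robustness at the cost of the compute savings.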

3. An event-driven processor for spiking neural networks (Arnab Roy, Swagath Venkataramani, Neel Gala, Sanchari Sen, Kamakoti Veezhinathan, Anand Raghunathan)

Spiking neural networks (SNNs) are a class of neural network models that offer increased realism in the simulation of biological neurons. From the machine learning perspective, SNNs have the potential to reduce hardware complexity since neuron outputs are only single bits (in hardware, this eliminates the need for multipliers to perform synaptic weighting), and since they utilize different, and potentially more efficient, information representations based on time. However, evaluating large-scale SNNs (e.g., of the scale of the visual cortex) on power-constrained systems requires significant improvements in computing efficiency. A unique attribute of SNNs is their event-driven nature: information is encoded as a series of spikes, and work is dynamically generated as spikes propagate through the network. Therefore, parallel software implementations of SNNs on multi-cores and graphics processing units (GPUs) are severely limited by communication and synchronization overheads. Recent years have seen great interest in deep learning accelerators; however, these architectures are not well suited to the dynamic, irregular parallelism in SNNs. We propose PEASE, a Programmable Event-driven processor Architecture for SNN Evaluation. PEASE comprises Spike Processing Units (SPUs) that are dynamically scheduled to execute computations triggered by a spike. Instructions to the SPUs are dynamically generated by Spike Schedulers (SSs) that utilize event queues to track unprocessed spikes and identify neurons that need to be evaluated. The memory hierarchy in PEASE is fully software managed, and the processing elements are interconnected using a two-tiered bus-ring topology matching the communication characteristics of SNNs. We propose a method to map any given SNN to PEASE such that the workload is balanced across SPUs and SPU clusters, while pipelining across layers of the network to improve performance.

We implemented PEASE at the RTL level and synthesized it to IBM 45 nm technology. Across 6 SNN benchmarks, our 64-SPU configuration of PEASE achieves 7.1×-17.5× and 2.6×-5.8× speedups, respectively, over software implementations on an Intel Xeon E5-2680 CPU and NVIDIA Tesla K40C GPU. The energy reductions over the CPU and GPU are 71×-179× and 198×-467×, respectively.
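The event-driven style of evaluation can be illustrated in software with a single event queue. The sketch below is a toy model, not the PEASE architecture: it uses integrate-and-fire neurons without leak or timing, and the network, names and weights are assumptions chosen for the example.

```python
from collections import deque

def run_event_driven(synapses, thresholds, input_spikes, max_events=1000):
    """Propagate spikes through a network using an event queue.

    synapses:     dict mapping a neuron to a list of (target, weight)
    thresholds:   dict mapping each non-input neuron to its firing threshold
    input_spikes: iterable of input neurons that spike initially
    Returns the neurons that fired, in firing order.
    """
    potential = {neuron: 0.0 for neuron in thresholds}
    queue = deque(input_spikes)  # unprocessed spikes
    fired, events = [], 0
    while queue and events < max_events:
        source = queue.popleft()
        events += 1
        # Work is generated only for the targets of a neuron that spiked.
        for target, weight in synapses.get(source, ()):
            potential[target] += weight  # a spike is one bit: add, no multiply
            if potential[target] >= thresholds[target]:
                potential[target] = 0.0  # reset after firing
                fired.append(target)
                queue.append(target)     # the new spike becomes an event
    return fired

synapses = {
    "in0": [("h0", 0.6)],
    "in1": [("h0", 0.6), ("h1", 0.3)],
    "h0": [("out", 1.0)],
    "h1": [("out", 1.0)],
}
thresholds = {"h0": 1.0, "h1": 1.0, "out": 1.0}
print(run_event_driven(synapses, thresholds, ["in0", "in1"]))
```

The amount of work depends on which neurons spike rather than on the network size, and it unfolds in a data-dependent order. That irregularity is hard to parallelize on CPUs and GPUs, which is the motivation given above for dedicated spike schedulers.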

4. Exploiting sparsity to accelerate deep neural networks on general-purpose processors (Vinod Ganesan, Neel Gala, Sanchari Sen, Kamakoti Veezhinathan, Anand Raghunathan)

The computational demands posed by DNNs have commonly been addressed through the design of hardware accelerators. To achieve good performance, these accelerators employ large numbers of processing elements and considerable on-chip memory, leading to an area and cost that are prohibitive in applications such as low-cost wearable devices and IoT sensors. Therefore, accelerating DNNs on these resource-constrained systems requires new approaches. We propose lightweight extensions to improve the performance of DNNs on General-Purpose Processors (GPPs) by exploiting sparsity in different DNN data structures, viz., features, weights and backpropagated errors. Sparsity in DNNs can be either static or dynamic, depending on whether the zero values remain constant or vary across different inputs to the network. Sparsity in weights, introduced by pruning connections in the network after training, is static in nature. In contrast, feature and error sparsities, caused by the thresholding nature of the ReLU (Rectified Linear Unit) activation functions, are dynamic in nature. Across 6 state-of-the-art image-recognition DNNs, dynamic sparsity results in 40-70% of the computations being rendered redundant, presenting a significant opportunity for improving performance.

We are exploring two complementary directions to exploit sparsity. First, we proposed Sparsity-aware Core Extensions, a set of ISA and micro-architectural extensions that enable the dynamic detection of zero operands and the skipping of future instructions that use them. Our design ensures that the instructions to be skipped are prevented from even being fetched, as squashing instructions after they have partially executed incurs a penalty. The extensions incur just 1.04% area overhead over an ARM Cortex A35 embedded processor. Across 6 state-of-the-art DNNs for both training and inference, the proposed Sparsity-aware Core Extensions achieve 19%-31% reduction in execution time. Second, our ongoing work explores utilizing sparsity to improve the efficiency of the memory hierarchy, by preventing zero values from being loaded into or occupying parts of the cache. This involves augmenting the cache with a special hardware structure that simply keeps track of the addresses that contain zero values in a compact form, rather than storing the zeros explicitly in the cache.
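The opportunity that sparsity creates can be seen in a software analogue of zero-operand skipping. The sketch below is illustrative only and does not model the proposed ISA or micro-architectural extensions; the function names and values are assumptions.

```python
def relu(values):
    """ReLU thresholding: negative pre-activations become exact zeros."""
    return [v if v > 0 else 0.0 for v in values]

def sparse_dot(features, weights):
    """Dot product that skips multiply-accumulates with a zero operand.

    Returns (result, macs_executed, macs_skipped).
    """
    total, executed, skipped = 0.0, 0, 0
    for f, w in zip(features, weights):
        if f == 0 or w == 0:
            skipped += 1  # the product would be zero, so no work is needed
            continue
        total += f * w
        executed += 1
    return total, executed, skipped

pre_activations = [-1.0, 2.0, -0.5, 0.0, 3.0, -2.0]
features = relu(pre_activations)  # dynamic sparsity: depends on the input
weights = [0.5, 0.5, 1.0, 1.0, 2.0, 2.0]
print(sparse_dot(features, weights))
```

In software the zero test itself costs a branch per element, so the savings are modest. Detecting zeros in hardware and never fetching the dependent instructions, as described above, removes that overhead.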