San Jose Professor Making Indian Classical Music Accessible With Artificial Intelligence
India-West Staff Reporter
SAN JOSE, CA – Music has a profound impact on human lives. Many people, like Dr. Vishnu S. Pendyala, a faculty member in the Department of Applied Data Science at San Jose State University, aspire to learn music, particularly Indian classical music, but are kept away mainly because mastering the skill is expensive, he says. Perfecting vocal music takes years of rigorous practice under the tutelage of an established musician. Since childhood, Pendyala has made multiple attempts at learning Carnatic vocal music, but in vain.
Many years later, technological advances helped him orient his research in deep learning, an active area of Artificial Intelligence (AI), towards his passion for music. He and his students began running experiments in music using cutting-edge techniques.
Audio signals that appeal to the ear and create melody usually conform to a structural framework. Audio is one of the four data types that deep learning works best on, the other three being image, video, and text. Pendyala and his student teams experimented on audio signals, focusing on the latent melodic frameworks within them.
His undergraduate students, Rohan Surana and Aakash Varshney, worked with him to use a deep learning framework called CycleGAN to convert Indian classical melodic frameworks in the South Indian Carnatic style to those in the North Indian Hindustani style and vice versa. The work was published earlier this year in the Springer proceedings of the Second International Conference on Advances in Computer Engineering and Communication Systems.
The system they developed takes unpaired samples of both styles of music, Hindustani and Carnatic, as input and learns to convert a new sample in one style to the other. CycleGAN is one kind of Generative Adversarial Network, abbreviated as GAN. GANs are the same technology used to generate fictitious but real-looking images.
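For readers curious how a system can learn from unpaired samples, the idea that gives CycleGAN its name is cycle consistency: a sample converted to the other style and back again should come out close to the original. The short Python sketch below illustrates only that check. It is not the team's code; the feature values and the three converter functions are hypothetical stand-ins for the neural networks a real system would train.

```python
def cycle_consistency_loss(x, forward, backward):
    """Mean absolute error between x and its round trip backward(forward(x))."""
    x_cycled = backward(forward(x))
    return sum(abs(a - b) for a, b in zip(x, x_cycled)) / len(x)


# Hypothetical toy converters; a real CycleGAN learns these as neural networks.
def carnatic_to_hindustani(features):
    return [2.0 * v + 1.0 for v in features]


def hindustani_to_carnatic_good(features):
    # Exactly undoes the forward conversion.
    return [(v - 1.0) / 2.0 for v in features]


def hindustani_to_carnatic_poor(features):
    # Undoes the scaling but forgets the shift, so the round trip drifts.
    return [v / 2.0 for v in features]


# Made-up audio feature values, for illustration only.
sample = [0.0, 0.5, 1.0, 1.5]

good_loss = cycle_consistency_loss(
    sample, carnatic_to_hindustani, hindustani_to_carnatic_good
)
poor_loss = cycle_consistency_loss(
    sample, carnatic_to_hindustani, hindustani_to_carnatic_poor
)
print("round trip with a matching converter:", good_loss)
print("round trip with a mismatched converter:", poor_loss)
```

During training, a penalty of this kind is combined with the adversarial feedback of a GAN, so the converters are pushed both to sound like the target style and to preserve the original sample.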
Generative models have become widely popular in recent times for a wide variety of tasks, and the area is still evolving. A GAN has two components – a generator and a discriminator. In the context of fictitious image generation, the generator is a software program that creates the fictitious images by taking feedback from its adversary, the discriminator. The generator behaves like a child or a student, and the discriminator functions as the parent or the teacher who makes the child learn.
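The student-and-teacher feedback can be made concrete with the loss functions commonly used to train a GAN. The Python sketch below is a minimal illustration, not the researchers' implementation: it assumes the discriminator outputs a probability that a sample is real, uses the widely used non-saturating form of the generator loss, and the scores fed in are made-up numbers.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Low when the discriminator scores real samples near 1 and fakes near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Low when the discriminator is fooled into scoring fakes near 1."""
    return -math.log(d_fake)


# Hypothetical discriminator scores (probability that a sample is real).
early = {"d_real": 0.9, "d_fake": 0.1}  # fakes are easy to spot
later = {"d_real": 0.6, "d_fake": 0.5}  # fakes have become harder to spot

for stage, scores in (("early", early), ("later", later)):
    d_loss = discriminator_loss(scores["d_real"], scores["d_fake"])
    g_loss = generator_loss(scores["d_fake"])
    print(stage, "discriminator loss:", round(d_loss, 3),
          "generator loss:", round(g_loss, 3))
```

As the generator improves, its own loss falls while the discriminator's rises, which is the tug-of-war that drives both components to get better.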
A more useful application of deep learning in music that Pendyala wanted to experiment with was an inexpensive music tutor that can help the world learn Indian classical music. His graduate students, Nupur Yadav, Chetan Kulkarni, and Lokesh Vadlamudi, worked with him to develop a deep learning system that recognizes the melodic frameworks in the vocal renderings of amateurs and plays a perfected snippet of the same melodic framework, so that amateurs can improve their rendering by listening to it.
The system was deployed using software technologies such as containerization, orchestration, and cloud computing, as a proof of concept for making it accessible to the masses. The system comes close to building a music tutor and is titled accordingly in the paper, which has been published in the Scopus-indexed Elsevier journal Systems and Soft Computing and is accessible at https://www.sciencedirect.com/science/article/pii/S2772941922000084?via%3Dihub