A Perspective for OEMs
As technology evolves and greater computing power becomes available in more efficient and accessible forms, several areas present themselves as potential ground for adopting Gen AI and LLM-based approaches. However, two important aspects must be considered when adopting LLMs.
The first is choosing the right LLM. This is a serious exercise that requires a thorough understanding of both the candidate model and the areas where it will be used, and the underlying research must be detailed. Our work in this area shows clear differences in inference timings and accuracies at the subtask level when different LLMs are compared. Table 1 illustrates some of our findings in this direction.
"Subjective Score" is the human evaluation score obtained by comparing output from ChatGPT and the listed LLMs on various generic NLP tasks.
Our teams created comparison charts for various NLP tasks with popular open-source LLMs that can be deployed on moderate GPU configurations. We found that:
The smaller model (Flan-T5) performs at acceptable levels on non-generative tasks such as information extraction, text classification, and language translation, while
LLaMA2-7B performed well on most NLP tasks, with the exceptions of arithmetic reasoning and translation.
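A task-level comparison like the one above can be driven by a small benchmarking harness that times each model on each task and collects outputs for later human grading. The sketch below is illustrative only: the model callables and prompts are hypothetical stand-ins, and in practice each callable would wrap a real LLM inference call (for example via Hugging Face `transformers`).

```python
import time

def run_benchmark(models, tasks):
    """Time each (model, task) pair and collect outputs for human scoring.

    `models` maps a model name to a callable taking a prompt string;
    `tasks` maps a task name to a prompt. Both are stand-ins here.
    """
    results = []
    for model_name, infer in models.items():
        for task_name, prompt in tasks.items():
            start = time.perf_counter()
            output = infer(prompt)
            elapsed = time.perf_counter() - start
            results.append({
                "model": model_name,
                "task": task_name,
                "latency_s": round(elapsed, 4),
                "output": output,  # later graded by a human for the subjective score
            })
    return results

# Hypothetical stand-ins for real LLM inference calls.
models = {
    "flan-t5-small": lambda p: f"[flan-t5] {p}",
    "llama2-7b": lambda p: f"[llama2] {p}",
}
tasks = {
    "classification": "Classify the sentiment: 'The brakes feel responsive.'",
    "translation": "Translate to German: 'engine control unit'",
}
rows = run_benchmark(models, tasks)
```

Keeping latency and output side by side in one record makes it straightforward to build the kind of per-subtask comparison table shown above once human scores are attached.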
The second area of importance is ensuring that LLMs are trained properly for improved accuracy. It is worth remembering that LLMs are particularly effective at capturing the complex relationships between words and phrases in natural language, and can achieve strong results across a variety of NLP tasks. Training LLMs is data-intensive, requiring significant computing resources and specialized algorithms to process large amounts of data.
Table 2 (below) illustrates the evaluation of LLM performance using foundation models. Our teams created a comparison chart for summarization and Q&A tasks with popular LLMs that can be deployed on moderate GPU configurations. The focus was on striking a balance between inference time and accuracy, and we found the 8-bit quantized LLaMA 2 to be more memory efficient while retaining acceptable accuracy.
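The memory saving from 8-bit quantization can be estimated from the parameter count alone. The sketch below uses rough numbers (a 7-billion-parameter model, counting weights only and ignoring activation and KV-cache memory) to show why an 8-bit model fits comfortably on a moderate GPU where an fp16 copy may not.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """Approximate weight memory in GiB (weights only, no activations/KV cache)."""
    return n_params * bytes_per_param / 1024**3

N = 7_000_000_000  # roughly the size of LLaMA2-7B

fp16 = weight_memory_gib(N, 2)  # 16-bit floats: 2 bytes per parameter
int8 = weight_memory_gib(N, 1)  # 8-bit quantized: 1 byte per parameter

print(f"fp16: {fp16:.1f} GiB, int8: {int8:.1f} GiB")  # int8 halves the weight footprint
```

In practice the total footprint is somewhat higher than this weight-only estimate, but the 2x ratio between fp16 and int8 weights is what makes quantization attractive on moderate GPU configurations.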
"Subjective Score" is the human evaluation score obtained by comparing output from ChatGPT and the listed LLMs on various generic NLP tasks.
As indicated earlier, improving the accuracy of LLMs can significantly improve the performance of natural language processing applications. More accurate machine translation, for example, can have a significant impact on cross-border communication and global business, while more accurate search and text classification systems help streamline and revitalize information retrieval and knowledge discovery.
As LLMs continue to evolve rapidly, another area where we can effectively leverage their capabilities is vision computing for Autonomous Driving / Advanced Driver Assistance Systems (AD/ADAS). This includes using the Segment Anything Model (SAM) for image segmentation, or leveraging Contrastive Language-Image Pre-training (CLIP) or Florence for image classification, segmentation, and generation. The scale of these models is clear from their parameter counts, ranging from about 100 million for SAM to about 1 billion for Florence. Figure 2 illustrates the foundation models for vision computing.
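CLIP-style models classify an image by embedding it and a set of candidate text labels into a shared vector space, then picking the label whose embedding is most similar to the image embedding. The sketch below shows only this matching step with hypothetical toy vectors; in a real AD/ADAS pipeline the embeddings would come from a pretrained CLIP image encoder and text encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def zero_shot_classify(image_emb, label_embs):
    """Return the label whose text embedding is closest to the image embedding."""
    return max(label_embs, key=lambda lbl: cosine(image_emb, label_embs[lbl]))

# Hypothetical toy embeddings; real ones come from CLIP's encoders.
label_embs = {
    "pedestrian": [0.9, 0.1, 0.0],
    "traffic light": [0.1, 0.9, 0.2],
    "vehicle": [0.0, 0.2, 0.9],
}
image_emb = [0.85, 0.15, 0.05]  # an image embedding close to "pedestrian"

print(zero_shot_classify(image_emb, label_embs))  # → pedestrian
```

Because the label set is just a list of text prompts, new object categories can be added without retraining, which is one reason these foundation models are attractive next to classical dense CNN pipelines.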
Figure 2: Foundation Model for Vision Computing: AD/ADAS
All of these models significantly outperform classical dense CNN/GAN models on most of the tasks needed to enable the next level of AD/ADAS performance.