- This article provides a comprehensive guide on the essential principles, methodologies, and best practices for implementing generative AI solutions in large-scale enterprise environments.
- It covers key components of Gen AI architecture, such as vector databases, embeddings, and prompt engineering, offering practical insights into their real-world applications.
- The article explores prompt engineering techniques in detail, discussing how to optimize prompts for effective generative AI solutions.
- It introduces Retrieval Augmented Generation (RAG), explaining how to decouple data ingestion from data retrieval to enhance system performance.
- A practical example using Python code is included, demonstrating how to implement RAG with LangChain, Chroma Database, and OpenAI API integration, providing hands-on guidance for developers.
Last year, we saw OpenAI revolutionize the technology landscape by introducing ChatGPT to consumers globally. This tool quickly acquired a large user base within a short period, surpassing even popular social media platforms. Powered by Generative AI, a form of deep learning technology, ChatGPT impacts consumers and is also being adopted by many enterprises to target potential business use cases that were previously considered impossible challenges.
Overview of Generative AI in Enterprise –
A recent BCG survey of 1,406 CXOs globally revealed that Generative AI is among the top three technologies (after Cybersecurity and Cloud Computing) that 89% of them are considering investing in for 2024. Enterprises of all sizes are either building Gen-AI products in-house or buying Gen-AI products from external providers to add to their enterprise assets.
With the massive growth of Gen-AI adoption in enterprise settings, it is crucial to have a well-architected reference architecture that helps engineering teams and architects identify roadmaps and building blocks for secure and compliant Gen-AI solutions. These solutions not only drive innovation but also elevate stakeholder satisfaction.
Before we dive deeper, we need to understand what Generative AI is, and for that we first need to understand the landscape it operates in. The landscape starts with Artificial Intelligence (AI), the discipline of building computer systems that try to emulate human behavior and perform tasks without explicit programming. Machine Learning (ML) is a part of AI that operates on large datasets of historical data and makes predictions based on the patterns it has identified in that data. For example, based on past data, ML can predict when people prefer staying in hotels versus rental homes through Airbnb during specific seasons. Deep Learning is a type of ML that contributes to the cognitive capabilities of computers by using deep artificial neural networks, loosely modeled on the human brain. It involves layers of data processing where each layer refines the output from the previous one, ultimately generating predictive content. Generative AI is the subset of Deep Learning techniques that uses various machine learning algorithms and artificial neural networks to generate new content, such as text, audio, video, or images, without human intervention, based on the knowledge it has acquired during training.
Importance of Secure and Compliant Gen-AI solutions –
As Gen-AI emerges as a mainstream technology, more and more enterprises across all industries are rushing to adopt it without paying enough attention to Responsible AI, Explainable AI, and the compliance and security side of their solutions. As a result, we are seeing customer privacy issues and biases in generated content. This rapid increase in Gen-AI adoption calls for a slow and steady approach, because with great power comes greater responsibility. Before we further explore this area I would like to share a couple of examples to show why
Organizations must architect Gen-AI based systems responsibly and with compliance in mind, or they risk losing public trust in their brand. They need to follow a thoughtful and comprehensive approach while constructing, implementing, and regularly enhancing Gen-AI systems, as well as governing their operation and the content they produce.
Common Applications and Benefits of Generative AI in Enterprise settings
Technology-focused organizations can harness the real power of Gen-AI in software development by enhancing productivity and code quality. Gen-AI powered autocompletion and code recommendation features help developers and engineers write code more efficiently, while code documentation and code generation from natural language comments in any language can streamline the development process. Tech leads can save significant development effort by using Gen-AI for repetitive manual peer reviews, bug fixing, and code quality improvement. This leads to faster development and release cycles and higher-quality software. Conversational AI for software engineering also enables natural language interactions, which improve collaboration and communication among team members. Product managers and owners can use Generative AI to manage product life cycles, ideation, and product roadmap planning, as well as to create user stories and write high-quality acceptance criteria.
Content summarization is another area where Generative AI is the dominating AI technology in use. It can automatically summarize meaningful product reviews, articles, long-form reports, meeting transcripts, and emails, saving time and effort of the analysts. Generative AI also helps in making informed decisions and identifying trends by building a knowledge graph based on the extracted key insights from unstructured text and data.
In customer support, Generative AI powers virtual chatbots that provide personalized assistance to customers, which enhances the overall user experience. For example, in a patient-facing healthcare application, chatbots can be more patient-oriented by providing empathetic answers, which helps the organization improve customer satisfaction. Enterprise intelligent search engines leverage Generative AI to deliver relevant information quickly and accurately. Recommendation systems powered by Generative AI analyze user behavior to offer customized suggestions that improve customer engagement and satisfaction. Generative AI also enables end-to-end contact center experiences, automating workflows and reducing operational costs. Live agents can use the summarization capability to understand processes or procedures quickly and guide their customers faster.
Generative AI has also made significant advancements in content assistance. It can help generate product descriptions, keywords and metadata for e-commerce platforms, create engaging marketing content, and assist with content writing tasks. It can also produce images for marketing and branding purposes by using natural language processing (NLP) to understand and interpret user requirements.
In the area of knowledge research and data mining, Generative AI is used for domain-specific research, customer sentiment analysis, trend analysis, and generating cross-functional insights. It also plays a crucial role in fraud detection, leveraging its ability to analyze vast amounts of data and detect patterns which indicate fraudulent activity.
So we can see that Generative AI is revolutionizing industries by enabling intelligent automation and enhancing decision-making processes. Its diverse applications across software development, summarization, conversational AI, content assistance, and knowledge research show its true potential in the enterprise landscape. A business that adopts Generative AI quickly is on the path to gaining a competitive edge and driving innovation in its industry.
As we have seen, Generative AI brings significant business value to organizations by improving the customer experience of their products and the productivity of their workforce. Enterprises that are on the path of adopting Gen-AI solutions are finding real potential for creating new business processes that drive innovation. The Co-Pilot features of Gen-AI products, or Agents, can follow a chain-of-thought process and use external knowledge, such as results from APIs or services, to complete decision-making tasks. There are numerous applications across industries.
The below diagram shows some of the capabilities that can be possible using Gen-AI at scale.
An enterprise architecture for Generative AI has many different building blocks. In this section we will briefly touch on some of the core components: the Vector Database, Prompt Engineering, and the Large Language Model (LLM). In the AI and Machine Learning world, data is represented in a multidimensional numeric format called an Embedding or Vector. The Vector Database is crucial for storing and retrieving the vectors that represent various aspects of the data, enabling efficient processing and analysis. Prompt Engineering focuses on designing effective prompts to guide the AI model's output, ensuring relevant and accurate responses from the LLM. Large Language Models serve as the backbone of Generative AI: they use various architectures (Transformer, GAN, etc.) and pre-training on vast datasets to generate complex and coherent digital content in the form of text, audio, or video. These components work together to scale the performance and functionality of Generative AI solutions in enterprise settings. We will explore them further in the following sections.
Vector Database –
If you have a Data Science or Machine Learning background or previously worked with ML systems, you most likely know about embeddings or vectors. In simple terms, embeddings are used to determine the similarity or closeness between different entities or data, whether they are texts, words, graphics, digital assets, or any pieces of information. In order to make the machine understand the various contents it is converted into the numerical format. This numerical representation is calculated by another deep learning model which determines the dimensions of that content.
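To make the idea of "closeness" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three-dimensional vectors are hypothetical toy values chosen for illustration; real embeddings have hundreds or thousands of dimensions, and the function name is not taken from any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" for three pieces of content
hotel = [0.9, 0.1, 0.0]
rental_home = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(hotel, rental_home))  # related content scores near 1
print(cosine_similarity(hotel, invoice))      # unrelated content scores near 0
```

Entities whose vectors point in a similar direction score higher, which is the property a semantic search relies on.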
The following shows a typical embedding response generated by the “text-embedding-ada-002-v2” model for the input text “Solutioning with Generative AI”; the embedding has 1536 dimensions.

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.01426721, -0.01622797, -0.015700348, 0.015172725, -0.012727121,
        0.01788214, -0.05147889, 0.022473885, 0.02689451, 0.016898194,
        0.0067129326, 0.008470487, 0.0025008614, 0.025825003,
        ... <so many> ...
        0.032398902, -0.01439555, -0.031229576, -0.018823305, 0.009953735,
        -0.017967701, -0.00446697, -0.020748416
      ]
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 6,
    "total_tokens": 6
  }
}
```
Traditional databases encounter challenges when storing high-dimensional vector data alongside other data types, though there are some exceptions which we will discuss next. These databases also struggle with scalability. In addition, their indexes typically return results only when the input query matches the stored text exactly. To overcome these challenges, a new database concept has emerged that is capable of efficiently storing high-dimensional vector data. It uses algorithms such as k-Nearest Neighbor (k-NN) or Approximate Nearest Neighbor (ANN) to index and retrieve related data, optimizing for the shortest distances between vectors. These vanilla vector databases maintain indexes of the relevant and connected data as it is stored, and so scale effectively as demand from the application grows.
Vector databases and embeddings play a crucial role in designing and developing Enterprise Generative AI applications. For example, in Q&A use cases over existing private data, or when building chatbots, the vector database provides contextual memory support to LLMs. Vector databases are also used for building Enterprise search or recommendation systems because they come with powerful semantic search capabilities.
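As an illustration of the nearest-neighbor idea, the following is a minimal brute-force k-NN sketch that scans every stored vector and keeps the k with the shortest Euclidean distance to the query. Production vector databases avoid this full scan by using approximate indexes; the document keys and vectors here are hypothetical.

```python
import heapq
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_search(index, query, k=2):
    """Return the keys of the k stored vectors closest to the query, nearest first."""
    scored = ((euclidean_distance(vector, query), key) for key, vector in index.items())
    return [key for _, key in heapq.nsmallest(k, scored)]

# Hypothetical in-memory "index" mapping document ids to toy embeddings
index = {
    "doc-hotels": [0.9, 0.1, 0.0],
    "doc-rentals": [0.8, 0.2, 0.1],
    "doc-invoices": [0.0, 0.1, 0.9],
}

nearest = knn_search(index, query=[0.9, 0.12, 0.02], k=2)
print(nearest)
```

The exact scan costs time proportional to the number of stored vectors, which is why ANN indexes trade a little accuracy for much faster lookups at scale.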
There are two primary types of vector database implementations available for the engineering team while building their next AI applications: pure vanilla vector databases and integrated vector databases within a NoSQL or relational database.
Pure Vanilla Vector Database: A pure vector database is specifically designed to efficiently store and manage vector embeddings, along with a small amount of metadata. It operates independently from the data source that generates the embeddings, which means you can use any type of deep learning model to generate embeddings with different dimensions and still store them efficiently in the database without any changes or tweaks to the vectors. Open source products such as Weaviate, Milvus, and Chroma are pure vector databases. The SaaS-based vector database Pinecone is also a popular choice among developers building AI applications such as Enterprise search, recommendation systems, or fraud detection systems.
Integrated Vector Database: On the other hand, an integrated vector database within a high-performing NoSQL or relational database offers additional functionality. This integrated approach allows embeddings to be stored, indexed, and queried alongside the original data. Because the vector storage and semantic search capability live within the existing database infrastructure, there is no need to duplicate data in a separate pure vector database. This integration also facilitates multi-modal data operations and helps with data consistency, scalability, and performance. However, this type of database generally supports only vectors of the same dimension, generated by the same type of model. For example, the pgvector extension turns a PostgreSQL database into a vector database, but a vector column is declared with a fixed dimension, so you can't store vectors of varying sizes such as 512 and 1536 together in it. Redis Enterprise comes with vector search enabled, which turns the Redis NoSQL database into a vector-capable database. Recent versions of MongoDB also support vector search.
Prompt Engineering –
Prompt Engineering is the art of crafting concise text or phrases following specific guidelines and principles. These prompts serve as instructions for Large Language Models (LLMs) to guide the LLM to generate accurate and relevant output. The process is important because poorly constructed prompts can lead to LLMs producing hallucinated or irrelevant responses. Therefore, it is essential to carefully design the prompts to guide the model effectively.
The purpose of prompt engineering is to ensure that the input given to the LLM is clear, relevant, and contextually appropriate. By following the principles of prompt engineering, developers can maximize the LLM’s potential and improve its performance. For example, if the intention is to generate a summary of a long text, the prompt should be formulated to instruct the LLM to condense the information into a concise and coherent summary.
Also, prompt engineering helps to enable the LLM to demonstrate various capabilities based on the input phrases’ intent. These capabilities include summarizing extensive texts, clarifying topics, transforming input texts, or expanding on provided information. By providing well-structured prompts, developers can enhance the LLM’s ability to understand and respond to complex queries and requests accurately.
A typical well-constructed prompt has the following building blocks, which give the model enough context and time to think so that it can generate quality output –
Building Block | Description |
Instruction & Tasks | Provide clear instructions and specify the tasks the LLM is supposed to complete. |
Context & Examples | Provide the input context and external information so that the model can perform the tasks. |
Role (Optional) | If the LLM needs to follow a specific role to complete a task, it needs to be mentioned. |
Tone (Optional) | Mention the style of writing, e.g. you can ask the LLM to generate the response in professional English. |
Boundaries (Optional) | Remind the model of the guardrails and the constraints to check while generating the output. |
Output Format (Optional) | If we want the LLM to generate the output in a specific format, e.g. JSON or XML, the prompt should mention it. |
In summary, prompt engineering plays a vital role to ensure that LLMs generate meaningful and contextually appropriate output for the tasks it is supposed to do. By following the principles of prompt engineering, developers can improve the effectiveness and efficiency of LLMs in a wide range of applications, from summarizing text to providing detailed explanations and insights.
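The building blocks described above can be sketched as a small helper that assembles a prompt from its parts. This is only an illustration under stated assumptions: the function name, the section labels, and the example values are hypothetical, and the exact wording that works best depends on the model being used.

```python
def build_prompt(instruction, context, role=None, tone=None,
                 boundaries=None, output_format=None):
    """Assemble a prompt from the required and optional building blocks."""
    sections = []
    if role:
        sections.append(f"Role: {role}")
    # Instruction and context are the two required blocks
    sections.append(f"Instruction: {instruction}")
    sections.append(f"Context:\n{context}")
    if tone:
        sections.append(f"Tone: {tone}")
    if boundaries:
        sections.append(f"Boundaries: {boundaries}")
    if output_format:
        sections.append(f"Output format: {output_format}")
    return "\n\n".join(sections)

prompt = build_prompt(
    instruction="Summarize the meeting notes in three bullet points.",
    context="Notes: the team agreed to ship the beta next sprint.",
    role="You are an executive assistant.",
    tone="Professional English",
    boundaries="Use only facts stated in the context.",
    output_format="JSON with a 'summary' list",
)
print(prompt)
```

Keeping each block as a separate argument makes it easy to test which optional parts actually improve the responses for a given use case.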
There are various Prompt Engineering techniques or patterns that can be used while developing a Gen-AI solution. These patterns reduce the development effort for the engineering team and improve reliability and performance –
- Zero-shot prompting – Zero-shot prompting refers to prompts that ask the model to perform a task without providing any examples. The model generates the content based on its previous training. It is used for simple, straightforward NLP tasks, e.g. sending an automated email reply or simple text summarization.
- Few-Shot prompting – In the few-shot prompt pattern, several examples are provided in the input context along with a clear instruction, so that the model can learn from the examples and generate responses modeled on the samples provided. This prompt pattern is used when the task is complex and a zero-shot prompt fails to produce the required results.
- Chain-Of-Thought – Chain-of-thought (CoT) prompt pattern is suitable in use cases where we need the LLM to demonstrate the complex reasoning capabilities. In this approach the model shows its step-by-step thought process before providing the final answer. This approach can be combined with few-shot prompting, where a few examples are provided to guide the model, in order to achieve better results on complicated tasks that require reasoning before responding.
- ReAct – In this pattern, the LLM is given access to external tools or systems, and uses its reasoning capabilities to decide which tools to call to fetch the data it needs for the task. ReAct is used when we need the LLM to produce a sequential thought process, retrieve the data that process calls for from an external source, and then generate a final response that is more reliable and factual. The ReAct pattern is applied in conjunction with the Chain-Of-Thought prompt pattern where LLMs are needed for more decision-making tasks.
- Tree of thoughts prompting – In the tree of thoughts pattern, the LLM uses a humanlike approach to solve a complex task through reasoning. It evaluates different branches of the thought process and then compares the results to pick the optimal solution.
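The difference between the zero-shot, few-shot, and chain-of-thought patterns listed above can be sketched with a single prompt builder. The function name, the "Input/Output" labels, and the step-by-step cue are illustrative assumptions, not a fixed standard.

```python
def few_shot_prompt(instruction, examples, query, chain_of_thought=False):
    """Build a prompt; an empty `examples` list yields a zero-shot prompt."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    lines.append(f"Input: {query}")
    if chain_of_thought:
        # Ask the model to show its reasoning before the final answer
        lines.append("Think through the problem step by step, then give the final answer.")
    lines.append("Output:")
    return "\n".join(lines)

instruction = "Classify the sentiment of the review as positive or negative."
examples = [
    ("The checkout was fast and easy.", "positive"),
    ("The package arrived broken.", "negative"),
]

zero_shot = few_shot_prompt(instruction, [], "Support never replied.")
few_shot = few_shot_prompt(instruction, examples, "Support never replied.")
cot = few_shot_prompt(instruction, examples, "Support never replied.", chain_of_thought=True)
print(cot)
```

The same query goes through all three variants, so the patterns can be compared side by side on a real model.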
LLM Ops –
LLMOps, as the name suggests, refers to the operational platform where the Large Language Model (also called a Foundation Model) is hosted and its inference is exposed through an API, so that applications can interact with the AI, or cognitive, part of the overall workflow. LLMOps is another core building block of any Gen-AI application. It is the collaborative environment where data scientists, the engineering team, and the product team build, train, and deploy machine learning models and maintain the data pipeline, so that the model becomes available for integration with other application layers.
There are three approaches to setting up an LLMOps platform for an enterprise:
- Closed Model Gallery: In the closed model gallery, the LLM offerings are tightly governed by large AI providers such as Microsoft, Google, OpenAI, Anthropic, or Stability AI. These providers are responsible for their own model training and maintenance. They manage the infrastructure and architecture of the models, as well as the scalability requirements of running the entire LLMOps system. The models are available through APIs: the application team creates API keys and integrates the models into its applications for inference. The benefit of this kind of Gen-AI Ops is that enterprises need not worry about maintaining infrastructure, scaling the platform when demand increases, upgrading the models, or evaluating model behavior. However, in the closed model approach enterprises are completely dependent on these providers and have no control over the type and quality of the data used to train or update the LLMs. The models may also be rate limited when the infrastructure sees a huge surge in demand.
- Open Source Models Gallery: In this approach you build your own model gallery using Large Language Models managed by the open source community through Hugging Face or Kaggle. Enterprises are responsible for managing the entire AI infrastructure, either on premises or in the cloud. They need to provision the open source models, and once deployed, the models' inference is exposed through APIs for other enterprise components to integrate into their own applications. The model's internal architecture, parameter sizes, deployment methodologies, and pre-training dataset are made publicly available for customization by the open source community, so enterprises have full control over access, moderation layers, and authorization. At the same time, the total cost of ownership increases.
- Hybrid approach: The hybrid approach is quite popular nowadays, and major cloud companies such as AWS, Azure, and GCP dominate this space by providing serverless galleries where any organization can either deploy open source models from the available repository or use the closed models of these companies. Amazon Bedrock and Google Vertex AI are popular hybrid Gen-AI platforms where you can either do BYOM (Bring Your Own Model) or use a closed model such as Amazon Titan through the Bedrock console or Google Gemini through Vertex AI. The hybrid approach gives enterprises the flexibility to control access while using high-quality open source models cost-effectively by running them on shared infrastructure.
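Whichever of the three setups is chosen, the application usually talks to the model through an API. One way to keep application code independent of that choice is a thin interface, sketched below. Every class and method name here is hypothetical, and the stubs return canned strings instead of calling a real provider.

```python
from typing import Protocol

class TextModel(Protocol):
    """Anything that can turn a prompt into generated text."""
    def generate(self, prompt: str) -> str: ...

class ClosedModelStub:
    """Stands in for a vendor-hosted model reached with an API key."""
    def __init__(self, api_key: str):
        self.api_key = api_key

    def generate(self, prompt: str) -> str:
        # A real client would send an authenticated request to the vendor here
        return f"[closed model] {prompt}"

class SelfHostedModelStub:
    """Stands in for an open source model served from the enterprise's own infrastructure."""
    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def generate(self, prompt: str) -> str:
        # A real client would call the internal inference endpoint here
        return f"[self-hosted model] {prompt}"

def summarize(model: TextModel, text: str) -> str:
    """Application code depends only on the interface, not on the hosting model."""
    return model.generate(f"Summarize: {text}")

print(summarize(ClosedModelStub(api_key="placeholder-key"), "quarterly report"))
print(summarize(SelfHostedModelStub(endpoint="http://localhost:8000"), "quarterly report"))
```

With this shape, moving from a closed model to a self-hosted or hybrid setup changes only which client is constructed, not the code that uses it.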
RAG is a popular framework for building Generative AI applications in the Enterprise world. Most of the use cases we explored above have one thing in common: the large language model needs access to external data, such as the organization's private business data, articles on business processes and procedures, or, for software development, the source code. As you know, Large Language Models are trained on publicly available data scraped from the internet. So if a question is asked about an organization's private data, the model won't be able to answer it and will hallucinate. Hallucination happens when a Large Language Model doesn't know the answer to a query, or when the input context and the instruction are not clear. In that scenario it tends to generate invalid and irrelevant responses.
RAG, as the name suggests, tries to solve this issue by helping the LLM access external knowledge and data. The components powering the RAG framework are –
Retrieval – The main objective of this activity is to fetch the most relevant and similar content, or chunks, from the vector database based on the input query.
Augmented – In this activity a well-constructed prompt is created so that when the call is made to the LLM, it knows exactly what output it needs to generate and what the input context is.
Generation – This is where the LLM comes into play. When the model is provided with good and sufficient context (from the “Retrieval” step) and has clear steps outlined (from the “Augmented” step), it generates a high-value response for the user.
We have decoupled the data ingestion component from the retrieval part in order to make the architecture more scalable. However, for use cases with a low volume of data, the data ingestion and the retrieval can be combined.
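The three activities above can be sketched end to end in a few lines. This is a toy illustration under loud assumptions: retrieval uses naive word overlap as a stand-in for a vector similarity search, and `fake_llm` is a hypothetical placeholder that returns a canned string instead of calling a model.

```python
def retrieve(query, documents, k=2):
    """Retrieval: rank documents by word overlap with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def augment(query, chunks):
    """Augmented: build a prompt that carries the retrieved context and the question."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

def generate(prompt, llm):
    """Generation: hand the augmented prompt to the model."""
    return llm(prompt)

def fake_llm(prompt):
    # Hypothetical placeholder; a real system would call an LLM here
    return "stubbed answer"

documents = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refunds require the original receipt.",
]
query = "How are refunds processed?"
chunks = retrieve(query, documents)
prompt = augment(query, chunks)
print(generate(prompt, fake_llm))
```

Each step is a separate function, which mirrors how the retrieval, prompt construction, and model call can be tuned or replaced independently.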
Data Ingestion workflow-
In this workflow, content from various data sources, such as PDF reports, HTML articles, or conversation transcripts, is chunked using an appropriate chunking strategy, e.g. fixed-size chunking or context-aware chunking. Once chunked, the split content is used to generate embeddings by invoking the LLMOps platform your enterprise has set up, which can be a closed model accessed through an API or an open source model running on your own infrastructure. Once the embeddings are generated, they are stored in a vector database to be consumed by the application running in the retrieval workflow.
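As a rough illustration of the fixed-size strategy mentioned above, the sketch below splits text into character windows with a small overlap between neighbors. The function name and the default sizes are assumptions chosen for the example, not recommended settings.

```python
def fixed_size_chunks(text, chunk_size=200, overlap=20):
    """Split text into windows of `chunk_size` characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks

sample = "abcdefghijklmnopqrstuvwxy"
print(fixed_size_chunks(sample, chunk_size=10, overlap=2))
```

The overlap repeats a little text at each boundary so that content cut by one window still appears intact in the next, at the cost of storing slightly more data.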
Data Retrieval workflow-
In the data retrieval workflow, the user query is first checked for profanity and passed through other moderation checks to ensure it is free of toxic or biased content. The moderation layer also checks that the query doesn't contain any sensitive or private data. Once the query passes the moderation layer, it is converted into an embedding by invoking the embedding model. This embedding is then used to run a similarity search in the vector database to identify similar content. Both the original text and the converted embedding are used for finding similar documents in the vector database.
The top-k results are used to construct a well-defined prompt using prompt engineering, and this prompt is fed to a different LLM (generally the instruct model) to generate a meaningful response for the user. The generated response is again passed through the moderation layer to ensure it doesn't contain hallucinated content or biased answers and is free from hateful or private data. Once the moderation checks are satisfied, the response is shared with the user.
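The moderation gate on both the query and the response can be sketched as follows. This is a deliberately simplistic stand-in: the blocklist, the email pattern, and the refusal messages are hypothetical, and a real moderation layer would typically rely on a dedicated moderation model or service rather than keyword rules.

```python
import re

BLOCKED_TERMS = {"idiot", "stupid"}  # hypothetical blocklist for illustration
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # crude private-data check

def moderate(text):
    """Return the list of issues found; an empty list means the text passed."""
    issues = []
    words = set(re.findall(r"[a-z']+", text.lower()))
    if words & BLOCKED_TERMS:
        issues.append("profanity")
    if EMAIL_PATTERN.search(text):
        issues.append("private_data")
    return issues

def answer_with_moderation(query, answer_fn):
    """Check the query, produce an answer, then check the answer before returning it."""
    if moderate(query):
        return "Sorry, this request cannot be processed."
    response = answer_fn(query)
    if moderate(response):
        return "Sorry, a safe response could not be generated."
    return response

print(answer_with_moderation("What is our refund policy?", lambda q: "Refunds take 5 days."))
```

Here `answer_fn` stands for the rest of the retrieval workflow (embedding, similarity search, prompt construction, and the LLM call).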
RAG Challenges and Solutions –
The RAG framework stands out as the most cost-effective way to quickly build Gen-AI capabilities and integrate them into the enterprise architecture. It is integrated with a data pipeline, so there is no need to train the models on external content that changes frequently. For use cases where the external knowledge or content is dynamic, RAG is extremely effective for ingesting the data and augmenting the model with it. Training a model on frequently changing data is extremely expensive and should be avoided. These are the top reasons why RAG has become so popular among the development community. The two popular Gen-AI Python frameworks, LlamaIndex and LangChain, provide out-of-the-box features for Gen-AI development using RAG approaches.
However, the RAG framework comes with its own set of challenges and issues that should be addressed early in the development phase so that the responses we get will be of high quality.
- Chunking Issue: Chunking plays the biggest role in how effectively a RAG system responds. When large documents are chunked, fixed-size chunking is generally used, where documents are split at a fixed word or character limit. This creates issues when a meaningful sentence is cut in the wrong place and we end up with two chunks containing fragments that carry two different meanings. When these chunks are converted into embeddings and fed to the vector database, the semantic meaning is lost, and the retrieval process then fails to produce effective responses. To overcome this, a proper chunking strategy needs to be used. In some scenarios, instead of fixed-size chunking it is better to use context-aware chunking or semantic chunking so that the inner meaning of a large corpus of documents is preserved.
- Retrieval Issue: The performance of RAG models relies heavily on the quality of the contextual documents retrieved from the vector database. When the retriever fails to locate relevant, correct passages, it significantly limits the model's ability to generate precise, detailed responses. In some situations the retriever fetches a mix of relevant and irrelevant documents, and these mixed results make it difficult for the LLM to generate proper content because it fails to separate the irrelevant data from the relevant content. To overcome this issue, we generally employ customized solutions such as storing a summarized version of the chunk in the metadata alongside the embedding. Another popular approach is RA-FT (Retrieval Augmented Fine-Tuning), where the model is fine-tuned so that it can identify irrelevant content when it is mixed with relevant content.
- Lost in the middle problem: This issue happens when the LLM is presented with too much information in the input context and not all of it is relevant. Even premium LLMs such as “Claude 3” or “GPT-4”, which have huge context windows, struggle when they are overwhelmed with information and most of it is not relevant to the instruction in the prompt. The performance and quality of the output degrade when the relevant information is buried in the middle of the input context rather than placed near the beginning. This well-tested problem is considered one of the pain points of RAG, and it requires the engineering team to carefully construct the prompt and to re-rank the retrieved content so that the relevant content always stays at the beginning for the LLM to produce high-quality output.
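To illustrate the chunking issue in particular, here is a minimal sketch of a sentence-aware chunker that only breaks between sentences, so no sentence is cut in half. The regular expression is a simplifying assumption (it treats ".", "!" and "?" followed by whitespace as a sentence end) and is not the semantic chunking of any specific library.

```python
import re

def sentence_chunks(text, max_chars=120):
    """Group whole sentences into chunks of at most `max_chars` characters where possible."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = "AI helps doctors. It also helps farmers. Robots do dangerous work."
for chunk in sentence_chunks(text, max_chars=45):
    print(chunk)
```

A single sentence longer than `max_chars` is kept whole in its own chunk here; a real pipeline would need a fallback for that case.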
As you can see, though RAG is the most cost-effective and quickest framework for designing and building Gen-AI applications, it also faces several issues in producing high-quality responses. The quality of the LLM response can be greatly improved by re-ranking the results retrieved from the vector database, attaching summarized content or metadata to documents for better semantic search, and experimenting with different embedding models that have different dimensions. Combining these advanced techniques with hybrid approaches like RA-FT further enhances the performance of RAG.
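The re-ranking step mentioned above can be sketched as a small function that drops low-scoring chunks and puts the most relevant ones first. The relevance scores are assumed to come from some scoring step that is not shown here, and the threshold and sample values are hypothetical.

```python
def rerank(results, top_n=3, min_score=0.0):
    """Keep chunks scoring at least `min_score`, most relevant first, limited to `top_n`.

    `results` is a list of (chunk_text, relevance_score) pairs, higher score = more relevant.
    """
    kept = [pair for pair in results if pair[1] >= min_score]
    ordered = sorted(kept, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _ in ordered[:top_n]]

# Hypothetical retrieved chunks with relevance scores
retrieved = [
    ("Office opening hours", 0.12),
    ("Refund policy overview", 0.91),
    ("Refund processing time", 0.78),
    ("Holiday calendar", 0.05),
]

print(rerank(retrieved, top_n=2, min_score=0.5))
```

Placing the highest-scoring chunks first keeps the relevant content at the beginning of the input context, and the threshold removes irrelevant chunks before they reach the prompt.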
A sample RAG Implementation using LangChain
In this section we will dive into building a small RAG-based application using LangChain, the Chroma database, and OpenAI's API. We will use Chroma as our in-memory vector database; it is a lightweight database suited to building an MVP (Minimum Viable Product) or POC (Proof Of Concept) to try out the concept. ChromaDB is still not recommended for building production-grade apps.
I generally use Google Colab for running Python code quickly. Feel free to use the same, or try the following code in your favorite Python IDE.
Step 1: Install the python libraries / modules
```python
!pip install langchain
!pip install langchain-community langchain-core
!pip install -U langchain-openai
!pip install langchain-chroma
```
- The OpenAI API is a service that allows developers to access and use OpenAI’s large language models (LLMs) in their own applications.
- LangChain is an open-source framework that makes it easier for developers to build LLM applications.
- ChromaDB is an open-source vector database specifically designed to store and manage vector representations of text data.
- Remove the “!” from pip statements if you are directly running the code from your command prompt.
Step 2: Import the required objects
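# Import necessary modules for text processing, model interaction, and database management
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_chroma import Chroma
import chromadb
import pprint

# Description of module usage:
# RecursiveCharacterTextSplitter: splits the input text into chunks
# PromptTemplate: builds the prompt sent to the LLM
# RetrievalQA: chain that combines a retriever with question answering
# OpenAIEmbeddings, ChatOpenAI: OpenAI embedding and chat model wrappers
# Chroma: vector store for the embedded chunks
# pprint: pretty-prints the retrieved documents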
Step 3: Data Ingestion
input_texts = [
    "Artificial Intelligence (AI) is transforming industries around the world.",
    "AI enables machines to learn from experience and perform human-like tasks.",
    "In healthcare, AI algorithms can help diagnose diseases with high accuracy.",
    "Self-driving cars use AI to navigate streets and avoid obstacles.",
    "AI-powered chatbots provide customer support and enhance user experience.",
    "Predictive analytics driven by AI helps businesses forecast trends and make data-driven decisions.",
    "AI is also revolutionizing the field of finance through automated trading and fraud detection.",
    "Natural language processing (NLP) allows AI to understand and respond to human language.",
    "In manufacturing, AI systems improve efficiency and quality control.",
    "AI is used in agriculture to optimize crop yields and monitor soil health.",
    "Education is being enhanced by AI through personalized learning and intelligent tutoring systems.",
    "AI-driven robotics perform tasks that are dangerous or monotonous for humans.",
    "AI assists in climate modeling and environmental monitoring to combat climate change.",
    "Entertainment industries use AI for content creation and recommendation systems.",
    "AI technologies are fundamental to the development of smart cities.",
    "The integration of AI in supply chain management enhances logistics and inventory control.",
    "AI research continues to push boundaries in machine learning and deep learning.",
    "Ethical considerations are crucial in AI development to ensure fairness and transparency.",
    "AI in cybersecurity helps detect and respond to threats in real-time.",
    "The future of AI holds potential for even greater advancements and applications across various fields.",
]

# Combine all elements in the list into a single string with newline as the separator
combined_text = "\n".join(input_texts)

# Perform "RecursiveCharacterTextSplitter" so that the data can have an object "page_content"
# Assumed splitter settings (not shown in the original): split on newlines so that
# each sentence becomes its own chunk
text_splitter = RecursiveCharacterTextSplitter(separators=["\n"], chunk_size=1, chunk_overlap=0)
chunk_texts = text_splitter.create_documents([combined_text])
Step 4: Generate Embedding and store in the Chroma Database
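import os

# Initialize the embeddings API with the OpenAI API key
# Read the key from an environment variable; never hard-code a real key in source code
openai_api_key = os.environ["OPENAI_API_KEY"]
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)

# Directory to persist the Chroma database (assumed name)
persist_directory = "chroma_db"

# Save the documents and embeddings to the local Chroma database
db = Chroma.from_documents(documents=chunk_texts, embedding=embeddings, persist_directory=persist_directory)

# Load the Chroma database from the local directory
db = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

# Testing the setup with a sample query
query = "How does AI transform the industry?"
docs = db.similarity_search(query)

# Print the retrieved documents
pprint.pprint(docs)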
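To make the retrieval step less of a black box, here is a minimal sketch of what a similarity search over stored embeddings does conceptually. It uses hand-written three-dimensional vectors in place of real embeddings and cosine similarity as the metric; both are assumptions for illustration, and the metric a vector database actually applies depends on its configuration. The names `toy_store` and `top_k` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored texts whose vectors are most similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional "embeddings"; real models produce far larger vectors.
toy_store = {
    "AI in healthcare": [0.9, 0.1, 0.0],
    "AI in finance": [0.1, 0.9, 0.0],
    "AI in agriculture": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stands in for the embedded question

results = top_k(query_vec, toy_store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

The query vector points mostly along the first dimension, so the healthcare entry ranks first and the finance entry second. A vector database performs the same kind of ranking, but over many high-dimensional vectors and with indexing to keep the search fast.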
Step 5: Now we will do the prompt engineering to instruct the LLM what to generate based on the context we supply.
# Define the template for the prompt
template = """
Role: You are a Scientist.
Input: Use the following context to answer the question.
Context: {context}
Question: {question}
Steps: Answer politely and say, "I hope you are well," then focus on answering the question.
Expectation: Provide accurate and relevant answers based on the context provided.
Narrowing:
1. Limit your responses to the context given. Focus only on questions about AI.
2. If you don't know the answer, just say, "I am sorry…I don't know."
3. If there are words or questions outside the context of AI, just say, "Let's talk about AI."
Answer:
"""

# {context} is data derived from the database vectors that have similarities with the question
# Create the prompt template
prompt = PromptTemplate(template=template, input_variables=["context", "question"])
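The following sketch shows, in plain Python, how the retrieved chunks and the user question end up inside such a template before the text is sent to the LLM. The shortened template and the sample chunks are illustrative assumptions; by default LangChain’s PromptTemplate performs a comparable placeholder substitution.

```python
# Shortened, illustrative template with the same two placeholders.
short_template = (
    "Use the following context to answer the question.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer:"
)

# Hypothetical chunks, standing in for what the vector database returns.
retrieved_chunks = [
    "Artificial Intelligence (AI) is transforming industries around the world.",
    "In manufacturing, AI systems improve efficiency and quality control.",
]

# Join the chunks into one context string and fill both placeholders.
filled_prompt = short_template.format(
    context="\n".join(retrieved_chunks),
    question="How does AI transform the industry?",
)
print(filled_prompt)
```

The LLM only ever sees the filled-in text, which is why the wording of the template and the quality of the retrieved chunks both directly affect the answer.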
Step 6: Configure the LLM inference and do the retrieval
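# Define the parameter values
temperature = 0.2
param = {
    "top_p": 0.4,
    "frequency_penalty": 0.1,
    "presence_penalty": 0.7,
}

# Create an LLM object with the specified parameters
llm = ChatOpenAI(temperature=temperature, openai_api_key=openai_api_key, **param)

# Create a RetrievalQA object with the specified parameters and prompt template
# (db is the Chroma store from Step 4 and prompt is the PromptTemplate from Step 5)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(),
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
)

# Test the setup with a sample query
query = "How does AI transform the industry?"
response = qa.invoke({"query": query})

# Print the retrieved documents and the response
pprint.pprint(response["source_documents"])
print(response["result"])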
Final Output –
[Document(page_content='Artificial Intelligence (AI) is transforming industries around the world.'),
 Document(page_content='\nThe future of AI holds potential for even greater advancements and applications across various fields.'),
 Document(page_content='\nIn manufacturing, AI systems improve efficiency and quality control.'),
 Document(page_content='\nAI is also revolutionizing the field of finance through automated trading and fraud detection.')]
RetrievalQA is a question-answering chain that uses an index to retrieve relevant documents or text snippets, which makes it suitable for simple question-answering applications. It combines a retriever with a QA chain: it first fetches documents from the retriever and then uses the QA chain to answer the question based on the retrieved documents.
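The retrieve-then-answer flow can be sketched without any external service. In the example below, a keyword-overlap retriever and a stub function stand in for the vector store and the LLM. Every name here (`keyword_retriever`, `stub_llm`, `retrieval_qa`) is hypothetical and only mirrors the shape of the chain; it is not LangChain’s implementation.

```python
from typing import Callable, Dict, List


def keyword_retriever(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call: echoes the first line of the context."""
    context = prompt.split("Context:\n", 1)[1].split("\nQuestion:", 1)[0]
    return context.splitlines()[0]


def retrieval_qa(query: str, corpus: List[str], llm: Callable[[str], str]) -> Dict[str, object]:
    """Step 1: retrieve documents. Step 2: build the prompt. Step 3: ask the LLM."""
    source_documents = keyword_retriever(query, corpus)
    prompt = "Context:\n" + "\n".join(source_documents) + f"\nQuestion: {query}\nAnswer:"
    return {"query": query, "result": llm(prompt), "source_documents": source_documents}


corpus = [
    "AI is transforming industries around the world.",
    "Self-driving cars use AI to navigate streets.",
    "AI systems improve efficiency in manufacturing.",
]
response = retrieval_qa("how is AI transforming industries", corpus, stub_llm)
print(response["result"])
print(response["source_documents"])
```

Swapping `keyword_retriever` for a vector-store retriever and `stub_llm` for a real model call gives the same overall structure as the chain used in Step 6.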
In conclusion, a robust reference architecture is essential for organizations that are either already building Gen-AI solutions or considering their first step. It helps them build secure and compliant Generative AI solutions. A well-architected reference architecture helps engineering teams navigate the complexities of Generative AI development by following standardized terms, best practices, and IT architectural approaches. It speeds up technology deployments, improves interoperability, and provides a solid foundation for enforcing governance and decision-making processes. As the demand for Generative AI continues to increase, enterprises that invest in developing and adhering to a comprehensive reference architecture will be better positioned to meet regulatory requirements, build customer trust, mitigate risks, and drive innovation at the forefront of their respective industries.