Chat with your large data sets using Azure Cognitive Search, Azure Open AI, and ChatGPT
In today's blog, we'll cover how you can leverage Azure Cognitive Search, Azure Open AI, and ChatGPT to filter large amounts of structured and unstructured data.
Introduction
ChatGPT, OpenAI's advanced language model, is an incredibly powerful tool when it comes to answering queries or generating human-like text. But what if you want to interact with a lot of your own data, perhaps data specific to your organization and not necessarily available on the Internet? This is where Azure Cognitive Search comes in, which can help you index and search your data using advanced AI models.
In this guide, we'll explain how you can use these powerful tools to create a web app that lets you chat with your own data.
How does it work
Here is an overview of the system architecture we are going to implement:
- It begins by asking a question through a web application interface.
- The question is then forwarded to ChatGPT, which generates a query.
- This query is sent to Azure Cognitive Search, which fetches relevant data from your data sets.
- This data is returned to the ChatGPT model, which then generates an answer to the initial question based on this relevant data.
- You then receive an answer to your question from the ChatGPT model.
Setting up your environment
To get started, you must have Azure OpenAI and Azure Cognitive Search enabled in your Azure subscription. Once these services are ready, you can take advantage of this demo repositoryAzure Search OpenAI Demo, which will establish the necessary infrastructure for you.
The repository displays various resources including
- Azure OpenAI Service
- Azure Cognitive Search
- Azure Application Service Plan
- Azure Application Service
- Azure Forms Recognizer
- Azure Blob Storage, with some PDFs as sample data for the demo.
You can deploy the repository via GitHub Codespaces or VS Code Remote Containers. For the purposes of this guide, we will be using GitHub codespaces.
Personalization of your data
Before deploying the repository, you may want to replace the sample data with your own. To do this:
- Ve a la
data
folder in the repository. - Delete existing PDF files.
Upload your own PDF files by right-clicking on thedata
folder and selectingMove up
.
Remember that the data that you upload here are the ones that you can consult through the web application.
Customizing your ChatGPT ad
In itlaptop
folder, there is a notebook calledChat-Retrieve-Refine.ipynb
. This notebook contains the prompt that you will use for your ChatGPT model. By default, the prompt is set to ask questions about a health care plan, which is related to the sample data provided. However, you can customize the notice to suit your needs.
Deployment of the repository
Once you've set up your data and notice, it's time to deploy the repository.
- Open GitHub Codespaces and start a new environment using the
azd init azure-search-openai-demo
domain.
- Define an environment name, select your subscription, and specify the location.
- Run
up to azd
to deploy the project.
The deployment creates a new resource group and deploys all the services, including the ChatGPT model. This process can take several minutes.
Let's validate the ChatGPT Module used in Azure Open AI to process the Data of our company, to do so we have to jump to theAzure AI Studioportal (https://oai.azure.com/portal) and look in the modules section.
As shown in the screenshot above, we have ChatGPT 3.5 Turbo integrated into our environment.
Congratulations! You now have a fully functional web app where you can chat with your data using Azure Cognitive Search and ChatGPT. The app gets relevant information from your data set and generates an appropriate response using ChatGPT. This can be a powerful tool for organizations dealing with large and complex data sets.
The benefits of integrating ChatGPT and Azure Cognitive Search
The integration of ChatGPT and Azure Cognitive Search offers a number of benefits for businesses, including:
- Improved data analysis:ChatGPT can be used to analyze data in a more sophisticated way than traditional methods. For example, it can be used to identify patterns and trends in data and generate insights that would be difficult or impossible to find with other methods.
- Increased efficiency:ChatGPT can automate many of the tasks involved in data analysis, such as data cleaning and preparation. This can free up employees to focus on more strategic and value-added activities.
- Reduced costs:ChatGPT can help companies reduce the costs associated with data analysis. For example, it can be used to replace expensive data analysts and consultants.
- Improved decision making:ChatGPT can help companies make better decisions by giving them access to information that would otherwise not be available. This can lead to greater efficiency, profitability, and customer satisfaction.
Use cases for ChatGPT and Azure Cognitive Search
ChatGPT and Azure Cognitive Search can be used in a variety of use cases, including:
- Customer service:ChatGPT can be used to provide customer service by answering questions, resolving issues, and providing support.
- Sales and Marketing:ChatGPT can be used to generate leads, qualify prospects, and close deals.
- Product development:ChatGPT can be used to collect customer feedback, identify new product opportunities, and improve existing products.
- Risk management:ChatGPT can be used to identify and mitigate risks and to ensure compliance with regulations.
- Fraud detection:ChatGPT can be used to detect fraud and other malicious activities.
- Compliance:ChatGPT can be used to ensure compliance with regulations, such as those governing privacy and data protection.
Microsoft Azure OpenAI service and data privacy
Here are some common questions and answers related to data, privacy, and security for Azure OpenAI Service (ChatGPT):
- Q: What type of data does the Azure OpenAI Service process?
A: Azure OpenAI processes user-submitted requests, service-generated completions, user-provided training and validation data, and training process results data. - Q: How does Azure OpenAI Service use and store this data?
A: The service uses this data to deliver its services, monitor misuse, and maintain the quality and security of its services. It does not store hints or completions in the model during operations. - Q: Are there any data privacy issues with using Azure Open AI (ChatGPT)?
A: Azure OpenAI does not use customer data to retrain models, customer-provided training data is only used to tune the customer's model, and is not used by Microsoft to train or improve any Microsoft models. Also, you can use the encryption keys to protect your data. CMK encrypts all customer data stored at rest in the Azure OpenAI service (such as data uploaded for tuning)Microsoft quote:Indications and endings. Azure OpenAI Service can temporarily store request and completion data in the same region as the resource for up to 30 days. This data is encrypted and can only be accessed by authorized Microsoft employees for (1) debugging purposes in case of failure and (2) investigation of patterns of abuse and misuse to determine if the service is being used in a way that violates applicable regulations. product terms. Note: When a customer is approved for modified abuse monitoring, notice and termination data is not stored and therefore Microsoft employees do not have access to the data.
- Q: What mechanisms does Azure OpenAI have for data privacy and security?
A: Azure OpenAI uses encryption, content filtering, and temporary storage (up to 30 days) of notices and terminations to monitor for misuse. It also uses customer-managed keys for data encryption. - Q: Can customers opt out of the registration and human review process?
A: Yes, Microsoft allows customers who meet additional limited access eligibility criteria to request to modify Azure OpenAI content management features. - Q: Does Azure OpenAI use customer data to train its models?
A: No, Microsoft does not use customer data to train, retrain, or improve models in Azure OpenAI Service. - Q: Is customer data processed by Azure OpenAI sent to OpenAI?
A: No, all customer data sent to Azure OpenAI remains within the Azure OpenAI service. - Q: What if Microsoft needs to access customer data?
A: In rare cases where Microsoft personnel need to access customer data, the Customer Lockbox feature provides an interface for customers to review and approve or deny these requests. - Q: Is customer data logged with content filtering?
A: No, content filtering works differently than abuse monitoring, and you don't need to record or store any data.
These answers provide a brief overview, but you can find more detailed information atMicrosoft security, privacy and data processing documents for Azure OpenAI.
Azure OpenAI security layers: overview
Conclusion
The integration of ChatGPT and Azure Cognitive Search offers a powerful and versatile solution for companies looking to improve their data analysis capabilities. By combining the strengths of these two technologies, companies can gain a competitive advantage by making better decisions, improving customer service, and driving innovation.