QnA Generation Limitations
Knowledge artifacts created by importing a source document or Web URL go through the Semantic Analysis process, which generates QnA pairs and identifies metadata. The QnA pairs are created as child artifacts and are linked to the parent Knowledge artifact. QnA pair generation depends heavily on how the QnA generator component in Luma Knowledge processes the source document or URL. If QnA generation fails, the Knowledge artifact is not created in the Knowledge Base.
There are a few important considerations for successful QnA generation:
Importing a Document
A document uploaded to Luma Knowledge should follow basic formatting conventions so that the QnA generator can process it and generate QnA pairs. The QnA generator identifies sections, subsections, and relationships in the file based on visual cues such as font size, font style, and numbering.
Luma Knowledge supports Markdown, so you can use Markdown formatting to create rich text when adding or editing your content.
QnA extraction works best on documents that have a table of contents or an index page and a clear structure with hierarchical headings. When Luma Knowledge processes the file, it extracts the headings and subheadings as questions and the content that follows them as answers.
Documents without an index or table of contents can also be used to create artifacts, provided they have a clear structure and layout.
An FAQ document should consist of alternating questions and answers, with each question on its own line and its answer on the following line.
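The alternating layout can be pictured with a small Python sketch. This is only an illustration of the expected line order, not the QnA generator's actual logic; the function name `pair_faq_lines` and the sample text are hypothetical, and skipping blank lines is an assumption.

```python
def pair_faq_lines(text: str) -> list[tuple[str, str]]:
    """Pair alternating question/answer lines from a plain-text FAQ."""
    # Assumption: blank lines are ignored so spacing does not shift the alternation.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("FAQ text must contain an even number of non-empty lines")
    # Lines at even positions are questions; the line after each one is its answer.
    return list(zip(lines[0::2], lines[1::2]))


# Hypothetical FAQ content, one question per line followed by its answer.
faq_text = """How do I reset my password?
Open Settings and choose Reset Password.
Where can I find my payslip?
Payslips are listed in the HR portal.
"""

pairs = pair_faq_lines(faq_text)
for question, answer in pairs:
    print(f"Q: {question}\nA: {answer}")
```

A document with an odd number of non-empty lines raises an error in this sketch, which is a quick way to spot a question that is missing its answer before upload.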
Luma Knowledge also supports structured .txt and .xls files for building the Knowledge Base. The content can be plain text, rich text, or HTML. Each question-answer pair should be in a single row, under columns named 'Question' and 'Answer'. Any additional columns are ignored. For example:
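As a rough illustration of this layout, the Python sketch below reads a small table that has 'Question' and 'Answer' columns plus one extra column, and keeps only the two named columns. The tab delimiter, the `Owner` column, the sample rows, and the function name `read_qna_rows` are assumptions made for the example; they are not taken from Luma Knowledge.

```python
import csv
import io

# Hypothetical file content. The tab delimiter is an assumption for this sketch.
sample = (
    "Question\tAnswer\tOwner\n"
    "How do I reset my password?\tOpen Settings and choose Reset Password.\tIT\n"
    "Where can I find my payslip?\tPayslips are listed in the HR portal.\tHR\n"
)


def read_qna_rows(text: str) -> list[dict[str, str]]:
    """Return one dict per row, keeping only the Question and Answer columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = {"Question", "Answer"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    # Extra columns (here, Owner) are dropped, mirroring "additional columns are ignored".
    return [{"Question": row["Question"], "Answer": row["Answer"]} for row in reader]


rows = read_qna_rows(sample)
for row in rows:
    print(row["Question"], "->", row["Answer"])
```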
Follow these formatting tips when creating a document for upload:
Use headings and sub-headings to denote hierarchy. For example, you can use h1 to denote the parent QnA and h2 to denote the QnA that should be taken as a prompt.
Use a smaller heading size to denote subsequent hierarchy.
Do not use style, color, or any other mechanism to imply structure in your document. Luma Knowledge does not extract multi-turn prompts.
The first character of the heading must be capitalized.
Do not end a heading with a question mark (?).
Luma Knowledge does not support images in QnA pairs.
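Two of the heading tips above (capitalized first character, no trailing question mark) are easy to check before upload. The following Python sketch shows one way to do that; the function name `check_heading` and the sample headings are hypothetical, and treating headings that start with a digit as acceptable is an assumption, since numbering is listed earlier as a structural cue.

```python
def check_heading(heading: str) -> list[str]:
    """Return a list of formatting problems found in a single heading."""
    text = heading.strip()
    if not text:
        return ["heading is empty"]
    problems = []
    # Assumption: only lowercase letters are flagged; leading digits are allowed.
    if text[0].islower():
        problems.append("first character must be capitalized")
    if text.endswith("?"):
        problems.append("heading must not end with a question mark")
    return problems


# Hypothetical headings taken from a draft document.
headings = ["Reset your password", "how do I request access?", "2. Approvals"]
for heading in headings:
    issues = check_heading(heading)
    status = "ok" if not issues else "; ".join(issues)
    print(f"{heading!r}: {status}")
```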
Importing a Web URL
You can use Web URLs in Luma Knowledge to create Knowledge artifacts. The QnA generator can process FAQ URLs to generate QnA pairs. The following recommendations help the QnA generator process a web page and generate QnA pairs:
Use a simple FAQ page where the answers immediately follow the questions on the same page. For example, a page that follows a similar format:
Luma Knowledge also supports FAQ pages with links, where questions are aggregated together and are linked to answers that are either in different sections of the same page or on different pages.
You can also use a Topic page to generate QnA pairs. These are pages where each topic is linked to a corresponding set of questions and answers on a different page. The QnA generator crawls through all the linked pages to extract the corresponding questions and answers.
You can also upload semi-structured support web pages to create artifacts, such as web articles that describe how to perform a given task or how to diagnose and resolve a given problem. Extraction works best for simple pages that are well structured and do not contain complex headers or footers.
Other Limitations
File names of uploaded documents must not include single or double quotes.
Ensure that your document does not exceed the maximum file size configured for your tenant. Refer to Tenant Configurations for more information.
When you use a Topic page to create artifacts and generate QnA pairs, the QnA generator crawls a maximum of 20 deep links per URL page to extract QnAs.
The following size limits apply:
Maximum character count for question text: 1,000 characters
Maximum character count for answer text: 25,000 characters
Maximum character count for file name: 200 characters
Supported file formats: ".pdf", ".txt", ".docx", ".xlsx".
Maximum number of alternate questions: 300
Maximum character count for URL/HTML page: 1 million characters
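The limits listed above can be checked locally before an upload. The Python sketch below is a minimal example of such a check and is not part of Luma Knowledge; the function names `check_file_name` and `check_qna` are hypothetical. It assumes that the 200-character limit applies to the full file name including its extension and that extensions are matched case-insensitively.

```python
from pathlib import PurePath

# Limits copied from the list above.
MAX_QUESTION_CHARS = 1_000
MAX_ANSWER_CHARS = 25_000
MAX_FILE_NAME_CHARS = 200
MAX_ALTERNATE_QUESTIONS = 300
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".xlsx"}


def check_file_name(file_name: str) -> list[str]:
    """Return the problems found in a file name, or an empty list."""
    problems = []
    # Assumption: the limit counts the whole name, extension included.
    if len(file_name) > MAX_FILE_NAME_CHARS:
        problems.append("file name exceeds 200 characters")
    if "'" in file_name or '"' in file_name:
        problems.append("file name contains a single or double quote")
    # Assumption: extensions are compared case-insensitively.
    if PurePath(file_name).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported file format")
    return problems


def check_qna(question: str, answer: str, alternates: tuple[str, ...] = ()) -> list[str]:
    """Return the problems found in one QnA pair, or an empty list."""
    problems = []
    if len(question) > MAX_QUESTION_CHARS:
        problems.append("question exceeds 1,000 characters")
    if len(answer) > MAX_ANSWER_CHARS:
        problems.append("answer exceeds 25,000 characters")
    if len(alternates) > MAX_ALTERNATE_QUESTIONS:
        problems.append("more than 300 alternate questions")
    return problems


print(check_file_name("user's guide.doc"))
print(check_qna("How do I reset my password?", "Open Settings."))
```

The tenant-level maximum file size is not checked here because it is configured per tenant.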