Documentation

Core Technology for Parsing SQL Across Multiple Vendors: QSI (Query Structural Interface)

evan

November 22, 2024

Core Technology for Parsing SQL Across Multiple Vendors: QSI (Query Structural Interface)

Introduction

The number of businesses worldwide required to comply with global regulations such as CCPA, GDPR, and ISO 27001 is rapidly growing. This shift in the business environment has elevated consistent data management and security to essential elements of corporate operations. In practice, database systems commonly involve access through diverse platforms and query languages. However, differences in syntax and behavior across databases make consistent access control and monitoring a challenging task. Today, we will introduce a solution that consolidates queries from multiple databases into a unified abstract syntax structure and provides a solid foundation for database query security through real-time analysis.

Step.1 Identifying the Problem: Differences in Database Syntax Create Management Complexity!

Large enterprises and data-driven organizations often use a combination of traditional SQL databases such as MySQL, SQL Server, PostgreSQL, and Oracle, alongside NoSQL databases like MongoDB. The variation in query syntax and behavior across these databases makes it difficult to consistently enforce the same security policies. As databases scale and grow more complex, the difficulty in controlling queries and managing security increases, eventually leading to potential security threats. This complexity also acts as a barrier to meeting the requirements of data protection regulations such as GDPR and CCPA.

Here are examples of queries that limit the number of rows across different databases:

MySQL

SELECT * FROM {table} LIMIT 10

SQL Server

SELECT TOP 10 * FROM {table}

Oracle

SELECT * FROM {table} FETCH FIRST 10 ROWS ONLY

MongoDB

db.getCollection('{table}').limit(10);

That is Why Query Control Matters.

More than half of today’s data breaches result from insider threats or insufficient access control. For example, when unauthorized users access highly privileged data, traditional access control systems often fail to block such attempts effectively. Without proper query management, data leaks or malicious misuse become far more likely. Even more concerning, employees or analysts executing large-scale queries that include sensitive data can lead to unpredictable results, leaving critical company assets vulnerable to significant risks.

Step 2. Setting the Goal: Find a Solution to Meet Global Compliance Requirements!

To address these challenges, technology that enables consistent security policies across diverse databases is essential. This forms the foundation for implementing information security management systems as outlined by ISO 27001 and meeting the data protection and privacy requirements of regulations like GDPR and CCPA. QueryPie has focused on developing core technologies to meet these global compliance demands effectively.

3 Key Approaches of QueryPie for Unified Security Policies

To implement consistent security policies across diverse databases, QueryPie proposes three key approaches:

Integration of Database Queries into a Unified AST: QueryPie consolidates query syntax from various databases such as MySQL, SQL Server, Oracle, and MongoDB into a single Abstract Syntax Tree (AST). This allows for consistent processing and analysis of queries regardless of the database platform.

Real-Time Query Analysis and Execution Prediction: QueryPie analyzes the AST in real-time while referencing database schema information to predict query execution outcomes. This enables the proactive identification of the impact of queries executed by users or applications.

Detailed Analysis of Relationships Between Objects: QueryPie conducts an in-depth analysis of the relationships between database objects such as tables and columns included in queries. This helps trace data flow, detect unauthorized access to sensitive data, and mitigate insider threats effectively.

Technical Explanation

Execution Phases

The execution process is divided into four stages: INPUT, RAW AST, QSI AST, Analysis Result

First, in the INPUT phase, the query is received and parsed into a RAW AST, identifying its syntax structure.

Next, in the QSI AST phase, the syntax tree is transformed into a semantic tree, defining the logical relationships between each query element.

Finally, in the Analysis Result phase, the analyzed information is used to generate the final output, providing a model of how the query results will be structured.

INPUT

The query is received as raw input from the database.

SELECT * FROM sakila.actor

RAW AST

The RAW AST represents the output from parsing the input query using a raw parser. The raw parser is implemented using well-known parser generators such as. ANTLR, YACC, or JavaCC. Since each database has unique syntax, the resulting AST structure varies across databases. For example, even with the same parser generator, SELECT 1 in MySQL and SELECT 1 FROM DUAL in Oracle produce semantically different results.

Here are examples of SELECT * FROM actor queries across different databases:

MySQL (ANTLR4)

Oracle (ANTLR4)

PostgreSQL (YACC)

Building a Transaction-Free, Flawless CDC (Change Data Capture) System

Shell Native Command Control through SSH Proxy Architecture