Case Study

Astoria Metadata Management


Current production systems lack the tools to manage work products and data assets in an ever-changing data environment. The accuracy, completeness, reliability, relevance, and timeliness of data are critical to achieving the level of data quality that Artificial Intelligence (AI) applications require and to empowering decision makers.

 

Data producers rely on a series of disconnected legacy systems combined with collections of commercial off-the-shelf (COTS) products, which together fall short of the intuitive, efficient tooling that mission effectiveness requires in the modern era. Analysts struggle to discover and reuse assets and are hampered by the lack of data-centricity and of the authoritative data standards needed to create new products. Strategic decision makers lack a 360-degree view of data across the enterprise to support rapid, informed decisions.

What we delivered

Using state-of-the-art technologies, the Astoria system provides the capabilities and infrastructure to manage (create, tag, and track) data from legacy systems as well as new assets and products, enabling discovery and reuse across the enterprise. Astoria’s infrastructure facilitates cataloging and reconciling data from any input source while providing an automated citation generation capability. The data tracking and discovery component supports systematic decision making about data management strategies.

The Astoria system also tracks the provenance of data, ensuring that the trustworthiness of source systems can be factored into decision-making processes. Using a generic data schema, Astoria enables data model automation and gives analysts the ability to catalog data from any data source. The manual upload functionality lets users store unstructured data in the system, making it discoverable across the enterprise.

 

The automated citation tool ensures traceability, and metadata tagging ensures that the data is discoverable. Astoria provides an extendable data management capability that harvests disparate data from across the enterprise, supports interactive and model-driven enrichment, and exposes data through a variety of means and services for on-demand consumption. Astoria enables metadata enrichment via modifications, additions, and pulls from external sources, while allowing analysis of the lineage and provenance of products. Products not currently in any repository can be uploaded manually, and citations are created for them automatically.
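To make the idea of a generic schema with automated citations and lineage more concrete, here is a minimal Python sketch. It does not reflect Astoria's actual data model or API: the CatalogEntry record, its field names, the citation format, and the generate_citation and lineage helpers are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    """Hypothetical generic metadata record; all field names are illustrative."""
    title: str
    source_system: str
    author: str
    created: date
    tags: set[str] = field(default_factory=set)
    # Upstream entries this product was derived from (its provenance).
    derived_from: list["CatalogEntry"] = field(default_factory=list)


def generate_citation(entry: CatalogEntry) -> str:
    """Build a citation string from the entry's own metadata (assumed format)."""
    return (
        f"{entry.author} ({entry.created.year}). "
        f"{entry.title}. Source: {entry.source_system}."
    )


def lineage(entry: CatalogEntry) -> list[str]:
    """Return the titles of all upstream entries, walking depth-first."""
    titles: list[str] = []
    for parent in entry.derived_from:
        titles.append(parent.title)
        titles.extend(lineage(parent))
    return titles
```

In a sketch like this, a manually uploaded product would simply be another CatalogEntry whose source_system records the upload, so the same citation and lineage functions apply to it without special handling.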

Impact

Unlike most centralized data platforms that seek to unify multiple sources of data and establish a shared methodology, Astoria lets the data be enriched so that it delivers more value to analysts. Via the Astoria UI, an analyst or decision maker can review and edit the metadata for products and add comments and ratings to those products to facilitate collaboration, discovery, and reuse. This empowers users to efficiently provide and gather insights from data on an enterprise-wide basis. This approach embraces analyst collaboration: commenting, reviewing, sharing, and approval capabilities make the data more useful and enable enhanced search, which improves analysts’ data management and delivers faster data insights to the warfighter.

 

Trust is key when a decision maker accesses data to make timely and accurate decisions. Astoria provides the infrastructure to create ratings and confidence levels during data ingest and analysis. The current system design can accommodate user-defined automatic rules for identifying data. It can also show users how much they can trust the data by indicating what changes have been made to it over time.

The bright future

The Astoria system will intelligently add contextual metadata tags and keyword identifiers to data entities tracked in the system metadata catalogue. Within the future business rules module, Omni will provide the following capabilities: automatic tagging of data assets; bulk tagging, commenting, and rating; and the ability to view the history of applied rules. Omni will work continuously with relevant users and stakeholders to identify where AI/ML algorithms can recommend business rules for improved classification and search. Omni will also use data analytics to show power users the areas where implementing business rules can lead to efficiency gains.
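As a rough illustration of what a business rules module of this kind could look like, the Python sketch below applies user-defined tagging rules in bulk and records which rules fired on each asset. The module described above is planned work, so everything here (the Asset and TagRule types, the apply_rules function, and the rule format) is hypothetical and not taken from the Astoria design.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Asset:
    """Hypothetical catalogued data asset."""
    name: str
    description: str
    tags: set[str] = field(default_factory=set)
    # Names of the rules that have added a tag to this asset, in order.
    rule_history: list[str] = field(default_factory=list)


@dataclass
class TagRule:
    """Hypothetical business rule: add `tag` when `matches(asset)` is true."""
    name: str
    matches: Callable[[Asset], bool]
    tag: str


def apply_rules(assets: list[Asset], rules: list[TagRule]) -> None:
    """Bulk-apply every rule to every asset, logging each rule that adds a tag."""
    for asset in assets:
        for rule in rules:
            if rule.matches(asset) and rule.tag not in asset.tags:
                asset.tags.add(rule.tag)
                asset.rule_history.append(rule.name)
```

Keeping the history on the asset itself is one simple way to let a user see which rules have been applied; a real system might instead store it in a separate audit log.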