lakeFS

lakefs.io

Summary

lakeFS provides Git-like version control for data lakes, allowing teams to manage, experiment with, and ensure the quality of large datasets. It offers features such as data branching, atomic data promotion, and integration with various data engineering tools. The platform aims to improve data quality, accelerate development, and enhance collaboration for data-intensive workflows.

Features
0/15

No common features found

Pricing
Freemium

Open Source

$0.00 one time
Popular
  • Format-Agnostic Data Version Control
  • Cloud-Agnostic
  • Zero-copy clones for isolated environments (via branches)
  • Atomic Data Promotion (via merges)
  • Data Stays in One Place
  • Configurable Garbage Collection
  • Data CI/CD Using lakeFS Hooks
  • Integrates with Your Data Stack
  • Role-Based Access Control (RBAC)
  • Run locally

Enterprise

Custom
  • Format-Agnostic Data Version Control
  • Cloud-Agnostic
  • Zero-copy clones for isolated environments (via branches)
  • Atomic Data Promotion (via merges)
  • Data Stays in One Place
  • Configurable Garbage Collection
  • Data CI/CD Using lakeFS Hooks
  • Integrates with Your Data Stack
  • Role-Based Access Control (RBAC)
  • Single Sign On (SSO)
  • SCIM Support
  • IAM Roles
  • Mount Capability
  • Audit Logs
  • Transactional Mirroring
  • Simplified Garbage Collection (Managed or Standalone)
  • SOC2
  • Support SLA

Rationale

lakeFS is a data version control system, akin to Git for data lakes. It focuses on managing large datasets, enabling branching, merging, and rollback capabilities for data engineers and ML teams. It does not offer AI-powered file organization, chat with files, semantic search for general files, or automated sorting rules for typical file management. Its features are geared towards data integrity and collaboration in data pipelines, not general file organization for individuals or small teams.
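To make the branching, merging, and rollback ideas concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the lakeFS API: the names ToyRepo, Commit, create_branch, merge, and rollback are hypothetical, and the merge shown ignores deletions and conflicts. It only illustrates why a branch can be created without copying data (a branch is a pointer to a snapshot) and why promotion can happen in a single step (the destination pointer is swapped once).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot: maps object paths to content addresses (hypothetical model)."""
    objects: dict
    parent: Optional["Commit"] = None


class ToyRepo:
    """Toy stand-in for a versioned data repository; not the lakeFS API."""

    def __init__(self):
        self.branches = {"main": Commit(objects={})}

    def create_branch(self, name, source):
        # Zero-copy: the new branch points at the same commit; no data is duplicated.
        self.branches[name] = self.branches[source]

    def commit(self, branch, changes):
        # A commit records new path -> address mappings on top of the current head.
        head = self.branches[branch]
        self.branches[branch] = Commit({**head.objects, **changes}, parent=head)

    def merge(self, source, dest):
        # Promotion in one step: build the merged snapshot, then swap the pointer once.
        # Simplification: source entries win; deletions and conflicts are ignored.
        src, dst = self.branches[source], self.branches[dest]
        self.branches[dest] = Commit({**dst.objects, **src.objects}, parent=dst)

    def rollback(self, branch):
        # Rollback moves the branch pointer back to the previous snapshot.
        head = self.branches[branch]
        if head.parent is not None:
            self.branches[branch] = head.parent


repo = ToyRepo()
repo.commit("main", {"data/a.parquet": "hash-a1"})
repo.create_branch("experiment", "main")
repo.commit("experiment", {"data/a.parquet": "hash-a2"})
repo.merge("experiment", "main")
```

In this sketch, readers of main see the old object until the merge and the new one right after it, and calling repo.rollback("main") restores the previous snapshot. A real system also has to handle conflicts, deletions, and garbage collection of unreferenced objects.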