lakeFS

lakefs.io
Summary

lakeFS is a data version control system that applies Git-like operations (branching, committing, merging, reverting) to data lakes. It aims to improve efficiency, collaboration, and reproducibility for machine learning and data engineering projects by providing isolated development environments, data lineage, and the ability to roll back data changes.
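To make the Git-like operations concrete, here is a minimal in-memory sketch of branching, committing, merging, and reverting over a set of object paths. It is a toy model, not the lakeFS API: the `ToyRepo` class, its method names, and the object keys are all hypothetical, merges are fast-forward only, and revert simply moves the branch pointer back. The point it illustrates is that a branch is a pointer to a snapshot, so creating one copies no data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical in-memory stand-in for a versioned data lake (not the lakeFS API)."""
    # commit id -> {path: object key}; each commit is an immutable snapshot
    commits: dict = field(default_factory=lambda: {0: {}})
    # branch name -> id of the commit the branch points to
    branches: dict = field(default_factory=lambda: {"main": 0})
    # branch name -> uncommitted {path: object key} changes
    staged: dict = field(default_factory=dict)

    def branch(self, name, source="main"):
        # Branching copies a pointer, not the underlying data.
        self.branches[name] = self.branches[source]

    def write(self, branch, path, key):
        # Stage a change on one branch; other branches do not see it.
        self.staged.setdefault(branch, {})[path] = key

    def commit(self, branch):
        # New snapshot = parent snapshot overlaid with the staged changes.
        parent = self.commits[self.branches[branch]]
        snapshot = {**parent, **self.staged.pop(branch, {})}
        new_id = len(self.commits)
        self.commits[new_id] = snapshot
        self.branches[branch] = new_id
        return new_id

    def merge(self, source, dest):
        # Simplification: fast-forward only, with no conflict handling.
        self.branches[dest] = self.branches[source]

    def revert(self, branch, commit_id):
        # Simplification: roll back by repointing the branch.
        self.branches[branch] = commit_id

    def read(self, branch):
        return self.commits[self.branches[branch]]


repo = ToyRepo()
repo.write("main", "events/day1.parquet", "obj-a")
base = repo.commit("main")

repo.branch("experiment")                     # isolated environment
repo.write("experiment", "events/day1.parquet", "obj-b")
repo.commit("experiment")
before_merge = repo.read("main")              # main is unaffected so far

repo.merge("experiment", "main")              # promote the change in one step
after_merge = repo.read("main")

repo.revert("main", base)                     # roll the change back
after_revert = repo.read("main")
```

In this sketch, `main` keeps serving `obj-a` until the merge, switches to `obj-b` in a single pointer update, and returns to `obj-a` after the revert.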

Features
8/15

Must Have

2 of 5

Cloud Storage Integration

Privacy Controls

AI File Chat

Semantic Search

Automated Sorting Rules

Other

6 of 10

Manual Approval Workflow

Local File Access

Multi-User Collaboration

Enterprise SSO & Compliance

Centralized Team Billing

Data Encryption & Security

Feedback-Driven Refinement

Demo Mode

Usage Credits & Quotas

Advanced AI Model

Pricing
Tiered

Open Source

Custom
Popular
  • Format-Agnostic Data Version Control
  • Cloud-Agnostic
  • Zero-copy clones for isolated environments (via branches)
  • Atomic Data Promotion (via merges)
  • Data Stays in One Place
  • Configurable Garbage Collection
  • Data CI/CD Using lakeFS Hooks
  • Integrates with Your Data Stack
  • Role-Based Access Control (RBAC)
  • Run locally

Enterprise

Custom
Popular
  • Format-Agnostic Data Version Control
  • Cloud-Agnostic
  • Zero-copy clones for isolated environments (via branches)
  • Atomic Data Promotion (via merges)
  • Data Stays in One Place
  • Configurable Garbage Collection
  • Data CI/CD Using lakeFS Hooks
  • Integrates with Your Data Stack
  • Role-Based Access Control (RBAC)
  • Single Sign On (SSO)
  • SCIM Support
  • IAM Roles
  • Mount Capability
  • Audit Logs
  • Transactional Mirroring
  • Simplified Garbage Collection (Managed or Standalone)
  • SOC2
  • Support SLA
Rationale

lakeFS is a data version control system for data lakes that offers Git-like operations on data. Its versioning, branching, and merging can help with reproducibility and collaboration in ML/AI workflows, but it does not offer AI-powered file organization features such as AI file chat, semantic search, or automated sorting rules for general file management. Its focus is versioning large datasets for ML and data engineering, not intelligent file organization for individuals or small teams.