
🎯 Binary Classification

Evolve ML pipelines that maximize AUC on binary classification tasks.


Prerequisites

  1. VEOX Server: Start the local VEOX server (requires Docker):

    docker run -d \
      --name veox-enclave-server \
      -p 8090:8090 \
      714044927654.dkr.ecr.us-east-2.amazonaws.com/doug/single_enclave/veox-enclave-server:latest
    
    See the Quick Start for detailed server setup, health checks, and Docker Compose instructions.

  2. Python SDK: Install the veox package via PyPI:

    pip install veox
    


What It Does

The binary family genetically evolves multi-stage ML pipelines:

Scaler → Expander₁ → Expander₂ → Reducer → Model₁ → Model₂ → Fusion

Each stage is drawn from a library of interchangeable components. The engine evaluates candidates using K-fold cross-validation and scores them on AUC (Area Under ROC Curve).
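AUC has a useful probabilistic reading: it equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. A minimal pure-Python sketch of the metric, for intuition only (the engine uses its own implementation):

```python
def auc(y_true, y_score):
    """AUC via the Mann-Whitney pairwise comparison: the fraction of
    (positive, negative) pairs where the positive example is scored
    higher, counting ties as half a win."""
    pos = [s for label, s in zip(y_true, y_score) if label == 1]
    neg = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are ranked correctly
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```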

Quick Start

from sklearn.datasets import make_classification
import pandas as pd
from veox import VeoxEvolver

# Generate a challenging binary dataset
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                            n_redundant=4, n_clusters_per_class=2, random_state=42)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(20)])
df["target"] = y

# Evolve!
evolver = VeoxEvolver("binary")
evolver.fit(data=df, target_column="target", max_generations=5)
print(f"Best AUC: {evolver.best_fitness_:.4f}")

Full Example

from sklearn.datasets import make_classification
import pandas as pd
from veox import VeoxEvolver

# 1. Generate challenging dataset
X, y = make_classification(
    n_samples=1000,       # 1000 samples
    n_features=20,        # 20 features
    n_informative=12,     # 12 informative
    n_redundant=4,        # 4 redundant
    n_clusters_per_class=3,
    flip_y=0.05,          # 5% label noise
    random_state=42,
)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(20)])
df["target"] = y

print(f"Dataset: {df.shape[0]} rows × {df.shape[1]} cols")
print(f"Class balance: {y.mean():.1%} positive")

# 2. Connect & verify
evolver = VeoxEvolver("binary", api_url="http://127.0.0.1:8090")
evolver.health_check()  # ✅ Server is healthy

# 3. Evolve
evolver.fit(
    data=df,                    # Your DataFrame
    target_column="target",     # Target column name
    max_generations=10,         # 10 generations
    population_size=50,         # 50 candidates per island
    timeout_per_eval=30,        # 30s per candidate
    max_poll_time=600,          # 10 min wall-clock limit
)

# 4. Inspect
print(f"Best AUC:    {evolver.best_fitness_:.4f}")
print(f"Pipeline:    {evolver.best_pipeline_}")
print(f"Evaluations: {evolver.result_.total_evals}")
print(f"Throughput:  {evolver.result_.evals_per_minute:.1f} evals/min")

# 5. Save
evolver.save("binary_results.json")

💎 Pro tier: with a VEOX Pro license (see VEOX Pro Activation below), the same fit() call can spread evolution across parallel islands by adding one argument:

evolver.fit(
    data=df,
    target_column="target",
    max_generations=10,
    population_size=50,
    num_islands=4,              # 💎 PRO FEATURE: 4 parallel islands
    timeout_per_eval=30,
    max_poll_time=600,
)

Binary Classification Dashboard
Live dashboard: AUC fitness scatter, champion trend, and pipeline leaderboard.

Pipeline Stages

Slot          Description                Examples
scaler        Feature normalization      StandardScaler, MinMaxScaler, RobustScaler
expander_1    Feature engineering        PolynomialExpander, GPFeatureExpander
expander_2    Additional features        SpectralExpander, WaveletExpander
reducer       Dimensionality reduction   PCA, SelectKBest, UMAP
model_1       Primary classifier         XGBoost, CatBoost, LightGBM, TabPFN
model_2       Secondary classifier       Any from the model_1 pool
model_fusion  Ensemble combiner          StackingFusion, VotingFusion
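Conceptually, an evolved candidate resembles a scikit-learn Pipeline with one component chosen per slot. A rough hand-built analogue (the actual VEOX components are internal; the sklearn stand-ins here are illustrative, and the expander_2 and model_2 slots are omitted for brevity):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# One sklearn stand-in per pipeline slot
candidate = Pipeline([
    ("scaler", StandardScaler()),
    ("expander_1", PolynomialFeatures(degree=2, include_bias=False)),
    ("reducer", PCA(n_components=10)),
    ("model_1", LogisticRegression(max_iter=1000)),
])

X, y = make_classification(n_samples=300, n_features=8, random_state=42)
# Same evaluation scheme as the engine: 3-fold CV scored on AUC
scores = cross_val_score(candidate, X, y, cv=3, scoring="roc_auc")
print(f"Mean AUC: {scores.mean():.3f}")
```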

Fitness Configuration

Parameter          Value
Primary Metric     AUC (Area Under ROC Curve)
K-Fold Splits      3
Direction          Maximize
Exception Penalty  −2.0
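The exception penalty means a candidate whose evaluation raises (e.g. a numerically unstable transform) is not discarded silently; it simply receives a fitness of −2.0, well below any attainable AUC. A sketch of the idea, with hypothetical names:

```python
def scored_fitness(evaluate, penalty=-2.0):
    """Run an evaluation callable; failures earn the fixed penalty
    instead of crashing the evolutionary loop."""
    try:
        return evaluate()
    except Exception:
        return penalty

print(scored_fitness(lambda: 0.91))   # healthy candidate: its AUC, 0.91
print(scored_fitness(lambda: 1 / 0))  # crashing candidate: -2.0
```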

Open-Source Datasets to Try

# scikit-learn
from sklearn.datasets import make_classification, load_breast_cancer
import pandas as pd
from veox import VeoxEvolver

evolver = VeoxEvolver("binary")

# Breast cancer (real-world; the frame already includes a "target" column)
df = load_breast_cancer(as_frame=True).frame
evolver.fit(data=df, target_column="target", max_generations=5)

# Highly imbalanced (~5% positive class)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=42)
df_imb = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
df_imb["target"] = y
evolver.fit(data=df_imb, target_column="target", max_generations=5)

💎 VEOX Pro Activation

To unlock VIP Evaluators and Pro Algorithms (like PaperKit and Generative routines), you must authenticate your local node with a VEOX License Token.

from veox import VeoxEvolver

evolver = VeoxEvolver("binary", api_url="http://127.0.0.1:8090")

# 1. Fetch your unique Hardware Fingerprint
fingerprint = evolver.get_system_fingerprint()
print(f"My VEOX Node Fingerprint: {fingerprint}")
# Example Output: My VEOX Node Fingerprint: 476ad03474b31e3c84d07df9088d93f0

# 2. Provide this fingerprint to your VEOX Admin to receive a JWT Token
jwt_token = "eyJ0b2tlbiI6ICJVRExK...EXPIRES"  # Replace with your token

# 3. Activate the Enclave
if evolver.activate_license(jwt_token):
    print("VIP Features Unlocked!")
    # evolver.fit(...) will now utilize full Pro capabilities
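If your token follows the standard JWT layout (three base64url segments: header, payload, signature), you can inspect its claims locally with the standard library before activating. Claim names such as exp are JWT conventions, not VEOX-specific guarantees:

```python
import base64
import json

def jwt_claims(token):
    """Decode the (unverified) payload segment of a JWT."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# Demo with a locally built token (signature faked; never do this for real auth)
def segment(obj):
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).decode().rstrip("=")

demo = f"{segment({'alg': 'HS256'})}.{segment({'exp': 1893456000})}.sig"
print(jwt_claims(demo))  # {'exp': 1893456000}
```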

Multiple Datasets

Train on several datasets simultaneously to evolve more robust, generalizable pipelines. Fitness is averaged across all datasets, and high-variance candidates are penalized automatically.

from sklearn.datasets import make_classification
import pandas as pd
from veox import VeoxEvolver

# Generate two independent classification datasets
def make_df(seed, noise=0.05):
    X, y = make_classification(
        n_samples=800, n_features=20, n_informative=12,
        flip_y=noise, random_state=seed,
    )
    df = pd.DataFrame(X, columns=[f"f{i}" for i in range(20)])
    df["target"] = y
    return df

df1 = make_df(seed=42, noise=0.03)  # Low noise
df2 = make_df(seed=99, noise=0.10)  # High noise

evolver = VeoxEvolver("binary")

# Pass a list of DataFrames; the engine evaluates on ALL of them
evolver.fit(
    data=[df1, df2],              # ← list of DataFrames
    target_column="target",
    max_generations=5,
)
print(f"Best AUC (averaged over 2 datasets): {evolver.best_fitness_:.4f}")
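To see why averaging can punish inconsistency, consider subtracting a multiple of the spread from the mean: a steady candidate then outranks an erratic one with the same average. This aggregation is purely illustrative; the engine's exact penalty formula is not documented here:

```python
import statistics

def aggregate_fitness(per_dataset_aucs, variance_penalty=0.5):
    """Mean AUC across datasets, discounted by their spread (illustrative)."""
    mean = statistics.mean(per_dataset_aucs)
    spread = statistics.pstdev(per_dataset_aucs)
    return mean - variance_penalty * spread

print(aggregate_fitness([0.90, 0.90]))  # steady candidate: 0.9
print(aggregate_fitness([0.99, 0.81]))  # erratic: same mean, lower fitness
```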

File paths & pathlib.Path

You can also pass CSV file paths (or pathlib.Path objects) instead of DataFrames, which avoids loading everything into memory:

from pathlib import Path

evolver.fit(
    data=[Path("train_a.csv"), Path("train_b.csv")],
    target_column="target",
    max_generations=5,
)
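Any CSV with a header row that includes the target column works. A stdlib-only sketch that writes a tiny file in the expected shape (the column names here are arbitrary):

```python
import csv

header = ["f0", "f1", "target"]
rows = [[0.1, 1.2, 0], [0.5, 0.3, 1], [0.9, 0.7, 1]]

with open("train_a.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(header)  # header row: feature columns + target
    writer.writerows(rows)
```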

Tips

  • Start small: Use population_size=20, max_generations=3 for quick tests.
  • GPU models: TabPFN and FMLTabPFN use GPU when available.
  • Custom fitness: Inject your own metric with custom_fitness_code=.
  • Callbacks: Use on_champion= to log new discoveries to MLflow, W&B, etc.
  • Multiple datasets: Pass a list: data=[df1, df2, df3]; see the Multiple Datasets section above.
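A callback sketch for the on_champion hook. The payload shape below (a dict with "fitness" and "pipeline" keys) is an assumption for illustration; adapt it to whatever object your SDK version actually passes:

```python
def champion_logger(sink):
    """Build an on_champion callback that appends each new best candidate
    to `sink` (swap the append for an MLflow/W&B logging call as needed)."""
    def on_champion(champion):
        sink.append({
            "fitness": champion["fitness"],    # assumed key
            "pipeline": champion["pipeline"],  # assumed key
        })
    return on_champion

history = []
callback = champion_logger(history)
callback({"fitness": 0.91, "pipeline": "Scaler|PCA|XGBoost"})
print(history[0]["fitness"])  # 0.91
```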