🌌 Clustering¶
📖 In the Clustering step, NANR organizes the large set of geometries collected during seam sampling into groups (clusters) that represent distinct regions of the intersection seam. Each cluster's centroid becomes a starting point for a MECI optimization.
Concept Overview¶
- Each geometry is aligned and compared using rotational and atomic label permutations to account for molecular symmetry.
- The algorithm applies a clustering method (typically k-means, though DBSCAN and others are supported) based on Euclidean distance between geometries.
- The elbow method determines the optimal number of clusters.
- The centroid of each cluster serves as a starting geometry for a MECI optimization.
- This step reduces redundancy and captures distinct structural regions along the sampled seam.
Step-By-Step Clustering¶
-
Create a clustering directory
mkdir <molecule>/seam_sampling/clustering -
Copy your seam trajectory
cp ../full_geom_seam.xyz . -
Create
clustering.yamlfilenames: xyz_file: full_geom_seam.xyz # starting xyz file directory_name: /<molecule>/seam_sampling/clustering/ num_atoms: 16 # total number of atoms cluster_settings: cluster_method: kmeans # can be: kmeans, dbscan, meanshift, affinity_propagation k: None # None = use elbow method ms_bandwidth: None # bandwidth parameter for Mean Shift ap_damping: 0.5 # damping for Affinity Propagation ap_pref_multipler: 1.0 # exemplar multiplier for Affinity Propagation dbscan_min: 5 # minimum samples for DBSCAN dbscan_eps: 0.5 # distance threshold for DBSCAN mass_weighted: False ignore_H: True # ignore hydrogens to speed up clustering diverse_sampling: False b: 1 # number of diverse pairs (kmeans/meanshift only) outfile: clustered.xyz # output file (optional) plot_settings: type: PCA # default PCA projection for visualization components: 2 -
Create
submit.sh#!/bin/bash #SBATCH -p l40-gpu #SBATCH -N 1 #SBATCH -n 4 #SBATCH -J cluster #SBATCH --mem=50G #SBATCH -t 10:00:00 #SBATCH --qos gpu_access #SBATCH --gres=gpu:1 module load tc/25.03 python3 /<path>/NonadiabaticNanoreactor/clustering/clusterer.py clustering.yaml -
Submit your clustering job
cd <molecule>/seam_sampling/clustering sbatch submit.sh
Outputs¶
clustered.xyz→ optional file containing combined cluster outputscentroid#.xyz→ representative centroid structureslog/→ job output logs and runtime information
Continue to ⏳ MECI Optimization →