New Version is Out!

Release Notes for Melissa v2.0.0

Code Quality & Refactoring

  • Added docstrings to all major files in melissa/.
  • Added a .nix/ folder containing files to spawn a nix devshell for quick Melissa testing.
  • Refactored multiple components across api/, melissa/, and examples/:
    • Hid implementation details in user-inherited server classes for better usability.
    • Renamed variables for clarity and consistency.
    • Privatized internal attributes.
    • Added and improved type hinting across the codebase.
    • Improved code quality by addressing pylint warnings.
    • Rewrote configuration files to use clearer and more consistent keys.

Configuration & Setup

  • Removed hard-coded port numbers for client-server TCP connections:
    • Allows running multiple servers on the same node without port conflicts.
    • Introduced a retry strategy: if binding a port fails, the server attempts the next one, up to a defined number of retries (see the sketch after this list).
  • Refactored the ZeroMQ CMake-based installation for cleaner integration.
  • Dependencies can now be installed directly via pyproject.toml using pip:

      pip install .           # base dependencies
      pip install .[dev]      # development dependencies
      pip install .[dl]       # deep learning dependencies
      pip install .[all]      # all of the above
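
A minimal sketch of the port-retry idea, assuming pyzmq; the socket type, base port, and retry limit are illustrative, not Melissa's actual values:

    import zmq

    MAX_RETRIES = 100  # assumed retry budget; configurable in practice
    BASE_PORT = 5555   # hypothetical starting port

    context = zmq.Context()
    socket = context.socket(zmq.REP)
    for attempt in range(MAX_RETRIES):
        port = BASE_PORT + attempt
        try:
            socket.bind(f"tcp://*:{port}")
            break                  # bound successfully
        except zmq.ZMQError:
            continue               # port already taken, try the next one
    else:
        raise RuntimeError("no free port found within the retry budget")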

Note: Manual installation via the requirements_*.txt files is still supported.

Performance & Usability Enhancements

  • Added runtime feedback for users:
    • Logs average simulation duration every 10,000 samples.
    • Reports current memory usage.
    • Displays buffer memory usage (computed as sample_size_in_bytes * buffer_size and reported in GB).
  • Client scripts are now generated on the fly at job submission time:
    • The server submits job_limit jobs and keeps submitting new ones as others complete (see the sketch after this list).
    • Reduces unnecessary script creation and supports reactive resampling.
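
A minimal sketch of the submission loop; the helper names generate_client_script, submit_job, and poll_finished are hypothetical stand-ins for the real scheduler interface:

    def run_study(samples, job_limit):
        pending = list(samples)
        running = set()
        while pending or running:
            # top up the queue until job_limit jobs are in flight
            while pending and len(running) < job_limit:
                sample = pending.pop(0)
                script = generate_client_script(sample)  # written on the fly
                running.add(submit_job(script))          # hypothetical submit
            running -= poll_finished(running)            # drop completed jobs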

Error Handling

  • Created specific exception classes in melissa.server.exceptions for improved error reporting and traceability.
  • Improved handling of failures and unexpected conditions across modules.

New Features

  • New: melissa.server.OfflineServer
    • Enables dataset creation from configuration studies without active server involvement during simulation.
    • The offline server primarily samples parameters and submits client scripts.
    • Ideal for validation and offline training dataset generation.
    • Clients handle their own data saving; melissa_send() is not used.
  • Added melissa.utility.rank_helper:
    • Provides helper functions and decorators for MPI operations.
    • Introduces the ClusterEnvironment dataclass to retrieve and manage SLURM/OMPI environment variables (see the sketch after this list).
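
A minimal sketch of what such a dataclass can look like (not Melissa's actual implementation; only the standard SLURM and Open MPI environment variables are assumed):

    import os
    from dataclasses import dataclass

    @dataclass
    class ClusterEnvironment:
        rank: int
        world_size: int

        @classmethod
        def from_env(cls) -> "ClusterEnvironment":
            # SLURM exposes SLURM_PROCID/SLURM_NTASKS; Open MPI exposes
            # OMPI_COMM_WORLD_RANK/OMPI_COMM_WORLD_SIZE.
            rank = int(os.environ.get("SLURM_PROCID",
                       os.environ.get("OMPI_COMM_WORLD_RANK", "0")))
            size = int(os.environ.get("SLURM_NTASKS",
                       os.environ.get("OMPI_COMM_WORLD_SIZE", "1")))
            return cls(rank=rank, world_size=size)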

CI/CD

  • Migrated to a dedicated machine on ci.inria.fr for running Melissa CI pipelines.
  • Created a SLURM-compatible Docker cluster for local and CI testing.
  • Added a manual version-bumping stage to the CI pipeline.
  • Added consistency checks to compare results against the previous version using L2-norm metrics (see the sketch after this list).
  • Moved study run logic to tests/ci with modular bash scripts for maintainability.
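
The consistency check boils down to a relative L2-norm comparison; a minimal sketch of the idea (the tolerance value is an assumption):

    import numpy as np

    def results_consistent(new, ref, tol=1e-6):
        # relative L2 distance between new and reference result arrays
        return np.linalg.norm(new - ref) / np.linalg.norm(ref) < tol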

Parameter Sampling

  • Replaced Python’s random module with NumPy’s np.random for consistent random number generation.
  • Implemented static sampling using np.memmap for memory-efficient parameter access across MPI ranks (see the sketch after this list).
  • Introduced MixIn classes to enable reusable behaviors in custom parameter samplers.
  • Extended BaseExperiment to support parameter manipulations in breed-style samplers.
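
A minimal sketch of the static-sampling idea, assuming mpi4py and hypothetical study dimensions and file name; rank 0 draws the parameters once and every rank then maps the same file read-only:

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    n_samples, n_params = 100_000, 8          # assumed study dimensions
    path = "parameters.dat"                   # hypothetical file name

    if comm.Get_rank() == 0:
        rng = np.random.default_rng(seed=42)  # reproducible NumPy draws
        params = np.memmap(path, dtype=np.float64, mode="w+",
                           shape=(n_samples, n_params))
        params[:] = rng.uniform(0.0, 1.0, size=(n_samples, n_params))
        params.flush()                        # persist before others read
    comm.Barrier()

    # Pages are loaded lazily, so memory use stays low on every rank.
    params = np.memmap(path, dtype=np.float64, mode="r",
                       shape=(n_samples, n_params))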

Deep Learning

  • Refactored base classes by moving redundant logic to BaseServer.
  • Users no longer need to manage buffer or dataset creation (i.e., instantiate these objects manually); this is now handled automatically based on the configuration.
  • Improved exception handling throughout DL modules.
  • Separated dataset creation, dataloader, and TensorBoard logging into dedicated modules.
  • Improved imports to avoid loading unnecessary modules.
  • Introduced a generic dataloader for iterable datasets in non-Torch/TensorFlow workflows.
  • Added a framework-agnostic training loop with user-defined hook support, located in melissa.server.deep_learning.train_workflow (see the sketch after this list).
  • Introduced a new round-robin communication strategy:
    • Ensures all trajectory data from a single simulation stays on the same buffer (the placement rule is sketched after this list).
    • Especially useful for DL workflows that require temporal coherence in training samples.
  • Added support for parallel validation; users must take care of reducing validation statistics per server rank.
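
A minimal, framework-agnostic sketch of a hook-based loop; the hook names are hypothetical, not the actual train_workflow API:

    class TrainingWorkflow:
        def __init__(self, hooks):
            self.hooks = hooks  # user-supplied callables

        def run(self, batches):
            self.hooks["on_train_start"]()
            for step, batch in enumerate(batches):
                loss = self.hooks["training_step"](batch)  # user-defined
                self.hooks["on_batch_end"](step, loss)     # e.g., logging
            self.hooks["on_train_end"]()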
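
The round-robin placement rule itself is simple; a sketch of the idea:

    def target_buffer(simulation_id: int, num_buffers: int) -> int:
        # every message from one simulation maps to the same buffer,
        # so a trajectory is never split across buffers
        return simulation_id % num_buffers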

Sensitivity Analysis

  • Refactored sensitivity analysis base classes.
  • Fixed issues with Pearson arrays in IterativeSensitivityMartinez.
  • Corrected internal use of melissa_sobol.increment to resolve result inconsistencies.
  • Disabled group data aggregation on rank 0 by hardcoding sobol = 0 in the client API code: each client now sends data directly to the server (as in a non-Sobol study).
  • Introduced server-side Sobol caching:
    • Results are now cached under the key (client_rank, time_step, field).
    • Sobol statistics are computed only once all required data for a group has been received (see the sketch after this list).
    • This strategy minimizes client-side communication and leverages the typically higher memory availability on server processes, making caching more effective.
  • Fixed data gathering issues:
    • Replaced point-to-point (P2P) calls with MPI_Gatherv (see the sketch after this list).
    • Aggregates results field-by-field for improved accuracy and scalability.
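
A minimal sketch of the caching idea; the group size and the update routine are hypothetical:

    from collections import defaultdict

    GROUP_SIZE = 4                     # assumed clients per Sobol group
    cache = defaultdict(dict)          # (time_step, field) -> {rank: data}

    def on_message(client_rank, time_step, field, data):
        group = cache[(time_step, field)]
        group[client_rank] = data      # i.e., keyed by (client_rank, time_step, field)
        if len(group) == GROUP_SIZE:        # all required data received
            update_sobol_statistics(group)  # hypothetical update routine
            del cache[(time_step, field)]   # free memory once consumed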
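
A minimal mpi4py sketch of Gatherv-based gathering (the array contents are illustrative): rank 0 receives variable-length contributions from every rank in a single collective call.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    local = np.full(rank + 1, rank, dtype=np.float64)  # variable-length chunk
    counts = comm.gather(local.size, root=0)           # chunk sizes per rank

    recvbuf = (np.empty(sum(counts)), counts) if rank == 0 else None
    comm.Gatherv(sendbuf=local, recvbuf=recvbuf, root=0)  # one collective call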