The âMaintainers Minutesâ aims to give further insight into the workings of the nf-core maintainers team by providing brief summaries of the monthly team meetings.
Overview
In this double post, we discussed the following topics from the September and October maintainers meetings:
- Massive modules update
- Pipeline level test with nf-test
- [helpdesk is the new coffee-n-code[(helpdesk-is-the-new-coffee-n-code)
- New way of passing parameters to modules within nf-test scripts
- Stalled PRs due to waiting for review
- Replacing monolithic
conf/modules.conf
into per module-configs - Modification of modules specifications regarding channels
Massive modules update
The nf-core/modules update taskforce have been hard at work, and has đĽ surprise đĽ massively updated the modules:
The list is still pretty long, but we will get there.
Pipeline level test with nf-test
A new plugin for nf-test was created to facilitate pipeline level test, itâs nft-utils, and itâs already implemented in methylseq, rnaseq, sarek and some other pipelines. Maxime has made a bytesize about it.
helpdesk is the new coffee-n-code
Stay tuned, itâs coming pretty soon.
New way of passing parameters to modules within nf-test scripts
Our nf-core/modules have greatly benefited from the new nf-test implementation. However one pet peeve for many is the greatly increased number of config files that now comes with each module when installing on a pipeline (>100 file pipeline PRs anyone đą).
Fortunately
has made a proof of concept aimed at reducing the number of config files needed for testing different parameters. This approach consolidates configs that previously required separate files to test different parameters into a singlenextflow.config
, helping streamline testing processes.
An implementation can be seen in the bismark/align
module used in methylseq pipeline:
By moving parameters into when
blocks within the main nf-test script file, we can have a single nextflow.config
file with the ext.args
for the process set to the parameter defined in the when
block.
The proposal received unanimous approval. Mahesh will now begin to formalise this by updating the nf-core/modules specifications, and we can start to begin adding additional linting tests to help transfer to this system.
This system only applies to modules nf-test, and should not be used at pipeline level!
Stalled PRs Due to Pending Reviews
It has been noted that we occasionally receive almost-complete module PRs that stall because a reviewer leaves comments but does not return for a re-review. Even if the module author receives a second approving review, they may feel uncomfortable merging without the original contributorâs approval.
After a brief discussion, we agreed to introduce a new reviewing guideline: if all reasonable comments have been addressed and another reviewer has approved, the PR should be merged after 1 month if the original reviewer has not returned for a re-review.
If the original reviewer was temporarily unavailable during this period (which can happen!), the module can still be updated in a follow-up PR.
Replacing monolithic conf/modules.conf
with per-module configs
A contentious point was brought up by the developers of nf-core/rnaseq and nf-core/methylseq.
Both pipelines recently have experimented with modifying the ways in which modules and processes can get customised by config files.
Currently, the nf-core template has a single file - conf/modules.conf
- where all module-level configurations occur (specifying additional arguments, file publication specifications etc.).
The two pipelines, however, have opted to split this single monolithic file into multiple files, with one config file per module e.g. in this nf-core/methylseq PR.
The main goal of this approach is to make CI testing more efficient by allowing changes to a specific config to trigger only the related tests rather than the entire test suite (since all modules are currently dependent on a single config file). Another reason is to increase the modularity and portability of modules and their configurations.
This proposal initially brought up a lot of consternation with most of the other maintainers present. The primary issue was the potential impact on usability, specifically regarding the ease with which users can locate customized parameters within each module. Currently, we can direct users to a single file within a single directory, simplifying their search.
Splitting configurations into potentially hundreds of files could make it significantly harder for users to find what they need. Itâs also unclear if another file has an overriding setting for a module, for example thatâs defined in the subworkflow folder rather than the module folder. Additionally, this approach isnât currently supported by Nextflowâs native capabilities, which would allow modules to automatically detect and load nextflow.config files alongside main.nf. As a result, a substantial amount of manual work would be needed to manually include each moduleâs config.
A lengthy discussion ensued, weighing user readability against developer efficiency and which should take priority. For instance, efficiency improvements could be made at the level of nf-core/tools or GitHub CI configurations, where optimizations might achieve similar goals. Alternatively, waiting for native Nextflow support could eventually reduce the workload on developers.
The sarek developers proposed a middle ground that has worked well in sarek: triggering tests on smaller chunks while maintaining findability. Their approach involves naming config files after each process and storing all configs within a conf/modules/
folder. This setup makes configurations easy to locate and include while still achieving modularity. see sarekâs module configs organization here
A general agreement was that this new system is not yet palatable to enough for maintainers to propose pushing split configs to the template. However, nf-core/rnaseq
and nf-core/methylseq
can continue to refine this approach based on what nf-core/sarek
developers have proposed.
Developers agreed to revisit and review this approach again in a couple of months.
Modification of modules specifications regarding channels
Finally, we debated pros- and cons- of different ways of structuring input channels in modules.
In the initial design of DSL2 nf-core modules, it was decided to require one input channel per file type, and not support âmega-tuplesâ, due to readability and portability.
For example, this was found to be less readable:
Compared to:
Where ensuring each file was associated with itâs companion could be facilitated with (relatively) standardised multiMap()
calls.
multiMap
should only be used in this way when directly providing process inputs. Itâs not a general method of synchronising channel inputs.
By forcing a particular structure of a singular input tuple (former example), it restricted pipeline developers in how they can do their own file shuttling between processes, potentially requiring many custom map
functions to restructure channels.
On the flipside, there are some instances (think samtools
) where a particular file will always be accompanied by a secondary file such as index files.
By combining these into a single tuple would make it easier to chain such modules with extremely uniform input file requirements.
Recently the module specifications had been relaxed to allow such cases of the former, but also in particular that different file format variants serving the same function (.bai
vs .csi
files for example), could be represented in the same element of a âcollapsedâ input channel.
For example, an output channel could be like so:
Some maintainers expressed a preference for a stricter approach, advocating for each input file to be provided in its own input channel. Their concern was that shared channels could introduce risks, such as misconfiguration, where the wrong type of index file could inadvertently be supplied, causing the pipeline to fail if a downstream process receives an incompatible file type.
We discussed the potential benefits of âtypedâ inputsâpossibly a future feature in Nextflowâversus the importance of code clarity, which is challenged by the need to repeatedly use .join and .combine after each module call.
Differing philosophies emerged: some felt it essential to design pipelines that prevent all possible user errors, while others suggested pipelines should support only specific configurations and parameters, with any deviations (e.g., via ext.args) being at the userâs own risk.
This discussion remains unresolved, and will likely rear itâs head again in the future.
The end
As always, if you want to get involved and give your input, join the discussion on relevant PRs and slack threads!
- â¤ď¸ from your #maintainers team!