I am trying to run Dissolve on PSC’s Bridges-2 HPC resource, but I get a segmentation fault when running on the Bridges-2 compute nodes.
I have compiled both the serial and parallel versions of Dissolve following the instructions in the user guide, and the builds completed without errors. The test I am running is the Benzene case from the provided example data. When I run the program in an interactive session on Bridges-2’s regular memory-shared nodes, I get a segmentation fault on the 5th iteration, when the RDF module runs.
Edit: Comparing the test run’s output on Bridges-2 with the output from the Dissolve-GUI, I see that the segmentation fault happens between these two log lines:
Finished calculation of intramolecular partials (0.01 seconds elapsed, 0.00 seconds comms).
Finished summation and normalisation of partial g(r) data (0.00 seconds elapsed, 0.00 seconds comms).
If you’d like further information, please ask. Thank you.
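For anyone wanting to repeat the comparison described above, here is a minimal Python sketch. It assumes you have the truncated output of the crashed run and the complete output of a working run (for example from the GUI) as plain text. The function names are hypothetical and not part of Dissolve. Numbers are replaced by a placeholder before matching, because the timings in each line differ from run to run.

```python
import re

# Matches integers and decimals such as "0.01" so that timings can be masked.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def normalise(line):
    """Strip whitespace and replace every number with '#'."""
    return _NUMBER.sub("#", line.strip())


def next_expected_line(crashed_log, reference_log):
    """Return the reference line that follows the last line of the crashed log.

    Returns None if the crashed log is empty, if its last line is not found
    in the reference, or if it matches only the final reference line.
    """
    crashed = [l for l in crashed_log.splitlines() if l.strip()]
    reference = [l for l in reference_log.splitlines() if l.strip()]
    if not crashed:
        return None
    last = normalise(crashed[-1])
    # The first match is enough: repeated iterations print the same sequence.
    for i, line in enumerate(reference[:-1]):
        if normalise(line) == last:
            return reference[i + 1].strip()
    return None
```

Calling `next_expected_line(open("crashed.log").read(), open("reference.log").read())` (both file names are placeholders) returns the first message the crashed run never printed, which narrows down where the fault occurs.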
Hi - and sorry for the slow reply!
Can I first check which version you are compiling here - is it the release (0.7.5) or development (0.8.0) code? I have noticed a segfault in a very similar place in a feature branch we are currently working on, and we’ll be looking into this today.
I have found what I believe to be the cause of your crash - I assume that you are compiling the 0.8.0 development version, as the bug is not present in the 0.7.5 release. It was also fixed in the development branch seven days ago, so if you update to the latest source the problem should hopefully go away!
I should also say that the 0.8.0 development version works, but it is in the middle of significant ongoing changes - if you choose to carry on with it, your continued feedback would of course be appreciated!
Well, assumptions are dangerous things! Another user reported seeing the same behaviour on our local HPC machine with the 0.7.5 version. Whilst looking into the issue I recompiled it, and the problem magically “went away”. This isn’t particularly helpful information, I know, but cleaning the build directory and recompiling may be worth a shot. If that doesn’t fix it, we need to look at which compiler / MPI combination you have. For our HPC it is FOSS(gcc)/OpenMPI.
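If it does come down to comparing compiler / MPI combinations, a small Python sketch like the following can gather the versions visible in the current environment so they can be pasted into a reply. The command names (`gcc`, `g++`, `mpicxx`, `mpirun`, `cmake`) are assumptions about a typical GCC/OpenMPI setup and may differ on your system. Run it with the same modules loaded as when you compiled.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the first line of `<command> --version`, or None if unavailable."""
    path = shutil.which(command)
    if path is None:
        return None  # command is not on PATH
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some tools print their version to stderr rather than stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def toolchain_report(commands=("gcc", "g++", "mpicxx", "mpirun", "cmake")):
    """Map each command name to its version string (None if not found)."""
    return {name: tool_version(name) for name in commands}


if __name__ == "__main__":
    for name, version in toolchain_report().items():
        print(f"{name}: {version or 'not found on PATH'}")
```

Running it once on a machine where the build works and once on the machine where it crashes gives two short lists that are easy to compare side by side.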
Yes, we are using the 0.7.5 release from GitHub. I’ll clean the build directory, recompile, and let you know how it goes.
Correction! We were using 0.7.4 on Bridges-2. I am currently compiling 0.7.5 with a fresh install.
Awesome. Let me know how it goes.
Okay. So far so good! I suppose the issue may be in 0.7.4 then… But it seems to work in 0.7.5! I’ll add any updates if I run into similar problems.