Separate Cumulative and Running Average for RDF/F(Q) in 0.7

With the new averaging feature introduced in 0.7.0, simulation speed is dramatically reduced due to the increased number of partial g(r)’s written to and read from the restart file every 10 iterations, as determined by the number of iterations of G(r) and F(Q) one averages over.

To overcome this issue, would it be possible to separate the averaging into a cumulative average and a running average? For example, if I run 10000 steps with averaging set to 1000 in its current form, at the end I will only see the averaged F(Q)/RDF based upon steps 9001-10000, rendering the averaged quantities calculated during steps 1-9000 unnecessary. Since the rate-limiting step is reading/writing the restart file with the large number of partial g(r)’s saved, avoiding that cost during steps 1-9000 would lead to a large decrease in run time.

The idea of a cumulative average and a separate running average would allow, say, a 10-step running average of the RDF and F(Q) to be calculated between steps 1-9000 (fast read and write of the restart file) to check that the simulation is on the right track, followed by a large cumulative average over steps 9001-10000 (slow read and write of the restart file).
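To make the distinction concrete: a running average over the last N iterations has to keep N snapshots of every partial set (which is exactly what inflates the restart file), whereas a cumulative average only ever needs one summed set plus a counter. A minimal sketch of the two, assuming nothing about Dissolve’s actual classes (all names here are illustrative):

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Illustrative only - not Dissolve's real code. Shows why a running average
// over the last N iterations is expensive to persist (N snapshots of every
// partial), while a cumulative average needs one sum vector and a count.
class PartialAverage
{
    std::deque<std::vector<double>> window_; // running average: last N snapshots
    std::vector<double> cumulativeSum_;      // cumulative average: single sum vector
    std::size_t windowSize_;
    int nCumulative_ = 0;

public:
    explicit PartialAverage(std::size_t windowSize) : windowSize_(windowSize) {}

    void add(const std::vector<double> &gr)
    {
        // Running average: everything in window_ must be written to / read
        // from the restart file, so the I/O cost scales with windowSize_.
        window_.push_back(gr);
        if (window_.size() > windowSize_)
            window_.pop_front();

        // Cumulative average: restart cost is one partial set, regardless
        // of how many iterations have been accumulated.
        if (cumulativeSum_.empty())
            cumulativeSum_.assign(gr.size(), 0.0);
        for (std::size_t i = 0; i < gr.size(); ++i)
            cumulativeSum_[i] += gr[i];
        ++nCumulative_;
    }

    std::vector<double> cumulativeAverage() const
    {
        std::vector<double> avg(cumulativeSum_);
        for (auto &v : avg)
            v /= nCumulative_;
        return avg;
    }
};
```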

Based upon my own use, a large cumulative average allows a more accurate and consistent F(Q)/RDF to be produced by Dissolve, so implementing an idea like this would (hopefully) lead to a faster simulation.

Yes, having the option to accumulate a set of partials, whether they are g(r) or (unweighted / weighted) F(Q), would be sensible.

There are a couple of ways of implementing this which immediately spring to mind. Given that there are (presently) four modules that calculate partial sets (RDF, SQ, NeutronSQ, and XRaySQ), with the last two each generating two sets of partials (the weighted g(r) and S(Q)), we could:

1a) Add an Accumulate option for each possible set of partials in the relevant modules, and display the accumulated data on top of the original (averaged) data.
1b) Do as 1a) but display the accumulated data on separate graphs.
2) Add a new module that targets a specific set of partials, and calculates the accumulated average completely separately from the parent module.

1a) and 1b) are probably a little more user-friendly, but result in a lot of duplicated code. 2) is slightly less user-friendly, but the accumulation code and data display only need to be written once. The latter also makes more sense given the layer pipelines present in Dissolve - all the accumulation modules can live in a separate analysis layer, activated as necessary.
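For what it’s worth, a rough sketch of how 2) might look - a standalone module that names the parent module whose partials it should fold into a long-running cumulative average, so the accumulation logic lives in one place. Everything here (the names, the processing-data map) is an assumption for illustration, not Dissolve’s actual API:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative sketch only - names and structures are assumptions.
struct PartialSet
{
    std::vector<std::vector<double>> partials; // one histogram per atom-type pair
};

class AccumulateModule
{
    std::string targetModule_; // e.g. "NeutronSQ01" - whose partials to accumulate
    PartialSet accumulated_;
    int nAccumulated_ = 0;

public:
    explicit AccumulateModule(std::string target) : targetModule_(std::move(target)) {}

    // Called once per iteration while this module's layer is enabled.
    // The cumulative average stores a single set plus a counter, so the
    // restart-file cost does not grow with the number of iterations.
    void process(const std::map<std::string, PartialSet> &processingData)
    {
        const auto &source = processingData.at(targetModule_);
        if (nAccumulated_ == 0)
            accumulated_ = source;
        else
            for (std::size_t p = 0; p < source.partials.size(); ++p)
                for (std::size_t i = 0; i < source.partials[p].size(); ++i)
                    accumulated_.partials[p][i] =
                        (accumulated_.partials[p][i] * nAccumulated_ + source.partials[p][i])
                        / (nAccumulated_ + 1);
        ++nAccumulated_;
    }

    const PartialSet &accumulated() const { return accumulated_; }
};
```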

Thoughts?

I’m leaning towards the idea of having a separate module as you have suggested in 2). The idea being that you would turn on the Accumulate module to average over the relevant partials after you have refined the structure using the EPSR module. That way the restart file is kept small, and the NeutronSQ modules etc. aren’t reading/writing potentially thousands of partials when called, resulting in a faster simulation in general. A possible issue is that the total number of iterations increases, since the separate ‘Accumulate’ module has to be turned on and run after the simulation has already run for many iterations to refine the structure - less user-friendly, as you say.

1b) I feel separate graphs won’t be needed - the accumulated data would simply replace the averaged data.

1a) will be easier for users, as they can just specify the ‘Accumulate’ option, not worry about it, and come back with everything done. From what I’ve seen of the averaging feature in 0.7.0 so far, reading and writing the partials seems to be one of the slowest processes, and it can bump up the size of the restart file enormously - to the point where it can’t be opened by the GUI version to view the relevant graphs easily.

I feel the reading and writing of the partials needs to happen outside of the restart file, so perhaps writing them to an external file throughout the refinement process, and then using the ‘Accumulate’ module to go back and average the last X partials in that file, is a sensible approach? A possible issue could be the size this partials file grows to, but I’m not sure.
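A minimal sketch of that append-then-average idea, assuming a flat binary file of fixed-size records (one partial of nPoints doubles appended per iteration). The file layout and function names are hypothetical, and a real version would need some metadata, but it shows both the streaming write and the “average the last X records” step are cheap:

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical layout: consecutive fixed-size binary records, one partial
// (nPoints doubles) appended per iteration. Names are illustrative.

// During refinement: append the current partial to the external file.
void appendPartial(const std::string &path, const std::vector<double> &gr)
{
    std::ofstream out(path, std::ios::binary | std::ios::app);
    out.write(reinterpret_cast<const char *>(gr.data()),
              static_cast<std::streamsize>(gr.size() * sizeof(double)));
}

// Afterwards ('Accumulate' step): average the last X records in the file.
std::vector<double> averageLastX(const std::string &path, std::size_t nPoints,
                                 std::size_t lastX)
{
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    const std::size_t recordBytes = nPoints * sizeof(double);
    const std::size_t nRecords = static_cast<std::size_t>(in.tellg()) / recordBytes;

    std::vector<double> sum(nPoints, 0.0);
    if (nRecords == 0)
        return sum; // nothing written yet

    const std::size_t first = nRecords > lastX ? nRecords - lastX : 0;
    std::vector<double> buffer(nPoints);
    in.seekg(static_cast<std::streamoff>(first * recordBytes));
    for (std::size_t r = first; r < nRecords; ++r)
    {
        in.read(reinterpret_cast<char *>(buffer.data()),
                static_cast<std::streamsize>(recordBytes));
        for (std::size_t i = 0; i < nPoints; ++i)
            sum[i] += buffer[i];
    }
    for (auto &v : sum)
        v /= static_cast<double>(nRecords - first);
    return sum;
}
```

As a rough size check: 1000 points × 8 bytes is about 8 kB per partial per iteration, so 10000 iterations of a single partial comes to only ~80 MB, though that multiplies with the number of pair partials being stored.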

So, in summary, I feel 2) is the better idea and, in my opinion, won’t make things that much less user-friendly.

Yes - I think option 2 seems the most sensible. It keeps the accumulated averages separate, and so could even be more user-friendly. If it’s easier code-wise, go with that.