Perfecting the Scheduler

Many thanks to Jasem and Eric! So far in my testing the scheduler is now running throughout the night without quitting. That's fantastic for someone like me who lives in the White Zone of a large city and has to collect as many photons as possible to cut through the light pollution and get half-way presentable images of distant galaxies.
I started the scheduler again last night and it ran through the night until it parked the mount at twilight. There were only three things I noticed that should be relatively easy to improve on (says someone who is not doing the actual coding...):

1) I had started the scheduler aiming for my target for last night, M63, and it went to sleep, properly waiting for M63 to get high enough in the sky as I had specified in the schedule. So far so good. So I thought I might use the time to get some more light on NGC2359 before it disappears for the summer and I manually selected that target, plate solved and positioned it and manually started autoguiding (so this was not preprogrammed in the scheduler). The camera was happily clicking away until it was interrupted by the scheduler waking up and slewing to M63. Also so far so good. However, at this point the scheduler, the focus module and the autoguide module got into a mudfight. As the focuser was trying to get the stars into focus, the capture module was still running and the guider module was still convinced that it was tracking NGC2359 and kept looking for its lost guide star. As a result, the mount began to drift uncontrollably. As I was still awake at that time, I stopped the madness manually, parked the mount, and restarted the schedule. All went fine from there on until...

2) ... gusty winds came up and started knocking my telescope around, impairing the guiding. I had set a guide deviation limit and that worked fine, so numerous frames were restarted when the deviation became too great. This works VERY WELL! However, at least once the wind gusts were so strong that the autoguider lost its guide star. As it was hunting for a new guide star, the mount drifted and M63 was no longer centered.

What I am wondering now is whether it is possible to make two small changes:

1) Instruct the scheduler to automatically stop image capturing and guiding, if this is going on at the time the scheduler starts.

2) Instruct the scheduler to automatically start a realignment procedure if the guide star has been lost in the autoguider module and then restart the autoguiding procedure (including recalibration) after realignment is complete. A complication I foresee is that if this realignment overlaps with a scheduled meridian flip, the latter may not be carried out, since that is initiated by the capture module, which will be paused at the time. If capture then restarts after the mount has passed the specified HA, it may not receive the flip instructions. Not sure if that is an issue, but please bear in mind.

And one third thing I noticed: As seeing conditions can change through the night, the HFR values might also increase. So, when I click the box in the capture module to refocus once the HFR value exceeds a certain point, set at the beginning of that imaging run, if seeing deteriorates or if the HFR value increases for any other reason, the focusing module will constantly be called up by the capture module and refocus. It will eventually arrive at a new minimum and reinitiate capture, but if the next value is then again higher than the originally set HFR, it will constantly refocus for the rest of the night. This can be remedied by refreshing the HFR limit in the capture module to reflect the value at which the focuser settled the last time it ran, instead of the initial value when the capture was started, increased by the set tolerance value.
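The refresh described above can be sketched in a few lines. This is only an illustration of the idea, not Ekos code: the class `HfrGuard` and its method names are hypothetical, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class HfrGuard:
    """Hypothetical helper, not part of Ekos."""
    reference_hfr: float   # HFR at which the focuser last settled
    tolerance_pct: float   # allowed increase over the reference, in percent

    def limit(self) -> float:
        # Limit = reference increased by the tolerance.
        return self.reference_hfr * (1.0 + self.tolerance_pct / 100.0)

    def needs_refocus(self, frame_hfr: float) -> bool:
        return frame_hfr > self.limit()

    def autofocus_completed(self, settled_hfr: float) -> None:
        # Refresh the reference instead of keeping the initial value.
        self.reference_hfr = settled_hfr


guard = HfrGuard(reference_hfr=2.0, tolerance_pct=15.0)
print(guard.needs_refocus(2.5))   # above the initial limit, so refocus
guard.autofocus_completed(2.4)    # seeing got worse; focuser settles higher
print(guard.needs_refocus(2.5))   # now within the refreshed limit
```

With a fixed reference of 2.0, every frame at 2.5 would trigger another focus run; with the refreshed reference, capture continues until the HFR rises again by more than the tolerance.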

I imagine none of these changes will require extensive coding, but they would go a long way towards making this excellent scheduler perfect.

Thanks again for all the work that goes into this and for sharing with all of us.

Jo
The following user(s) said Thank You: Jasem Mutlaq, Eric
5 years 11 months ago #25119

Replied by Eric on topic Re:Perfecting the Scheduler

First, a disclaimer: my contribution to KStars has been very, very (very) limited until now. I have 30+ commits waiting in line to be pushed, but I'm so slow at accepting my own changes...

1) I confirm the current state of the setup is not entirely under control when the scheduler starts. I have a few changes for this in the pipe, but to be perfectly stable this would require strict compliance with a state diagram. This is one of the reasons I joined the thread on "resuming a schedule" and decided to contribute. As an example, similar to yours, it happens that the scheduler will decide the mount is parked while it's actually not, and attempt to repeatedly unpark it without understanding it's already ok. This can be a very bad surprise in the morning. Working around this implies making sure (asserting in the code) that the set of properties of a scheduler job has a finite number of combinations.

2) I also observed that issue, and decided on a solution in which any transitory issue on a job would mark it aborted, and that as a consequence, aborted jobs would be automatically rescheduled at the next evaluation. The scheduler is not supposed to drop an activity if its calculations indicate it is achievable, even if that means repeatedly trying and failing. If there are passing clouds, or a tree (yes that's also transitory), or wind gusts, that doesn't make the work impossible. However it's not possible to just do that in the current state of the code because the evaluation is not stable yet (in the sense of always resulting in the same configuration for the same input vector). I'm working towards that.

3) Yes, the HFR limit being located in the sequence job is a bit of a problem. In a sense, it sets the level of quality you expect from the capture you are attempting. But because the value is located in the sequence job, you are under the impression that it's generic to a capture. Actually, that level only makes sense if the same star is always taken as input, because HFR changes when the star changes and seeing is imperfect. Therefore, provided the focus star selection algorithm is stable, the target field of view determines the HFR level to achieve, and the right location for the HFR level is the scheduler job. Obviously, this change has a few usability issues in the current interface. The other problem is that it requires a bit of experience to enter an HFR limit manually.
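The idea in point 2, that a transitory failure marks a job aborted and the next evaluation reschedules it, could look roughly like the sketch below. It is an assumption-laden illustration, not the KStars scheduler: `JobState`, `Job`, `evaluate` and the `is_achievable` callback are all hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum, auto


class JobState(Enum):
    IDLE = auto()
    SCHEDULED = auto()
    ABORTED = auto()
    COMPLETE = auto()


@dataclass
class Job:
    name: str
    captures_done: int
    captures_needed: int
    state: JobState = JobState.IDLE


def evaluate(jobs, is_achievable):
    """Re-plan all jobs; is_achievable(job) is a hypothetical callback
    standing in for altitude, twilight and similar constraints."""
    for job in jobs:
        if job.captures_done >= job.captures_needed:
            job.state = JobState.COMPLETE
        elif job.state in (JobState.IDLE, JobState.ABORTED) and is_achievable(job):
            # An aborted job is treated like a fresh one: try again.
            job.state = JobState.SCHEDULED
    return [job for job in jobs if job.state is JobState.SCHEDULED]


m63 = Job("M63", captures_done=12, captures_needed=40, state=JobState.ABORTED)
done = Job("M51", captures_done=10, captures_needed=10)
print([job.name for job in evaluate([m63, done], lambda job: True)])
```

Because `evaluate` depends only on the job list and the constraint callback, running it twice on the same input gives the same plan, which is the stability property mentioned in point 2.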

-Eric
The following user(s) said Thank You: Jasem Mutlaq, Wolfgang Reissenberger, Jose Corazon
5 years 11 months ago #25146

Thanks for the detailed report. I'm aware of the HFR issue. One way to solve this is to store filter-specific (if there is a filter) HFR values of the first successfully completed autofocus operation. In the sequence file, if you want to rely on the autofocus output, set the HFR value to zero. If you want it to always respect your own value, set that value in the sequence. The downside, as you eloquently described, is that autofocus would be called endlessly after each capture even when better focus is not currently achievable due to seeing or any other reason.

Another approach is to keep recording the HFR value reported after each successfully completed autofocus operation, per filter, and then perform some statistical analysis to pick the most reasonable value for the autofocus operation to compare against. The easiest (and probably most reliable) would be the median value, but better approaches are possible. This might be something I work on in 2.9.5.
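A minimal sketch of the per-filter median idea, assuming nothing about how KStars stores these values: the class `HfrHistory` and the filter names are hypothetical, and only Python's standard `statistics.median` is used.

```python
from collections import defaultdict
from statistics import median


class HfrHistory:
    """Hypothetical per-filter record of HFR values from successful autofocus runs."""

    def __init__(self):
        self._values = defaultdict(list)

    def record(self, filter_name, hfr):
        self._values[filter_name].append(hfr)

    def reference(self, filter_name):
        # Median of all recorded values, or None if nothing was recorded yet.
        values = self._values.get(filter_name)
        return median(values) if values else None


history = HfrHistory()
for hfr in (2.0, 2.2, 3.5):       # 3.5 is an outlier, e.g. a gust of wind
    history.record("Luminance", hfr)
print(history.reference("Luminance"))
print(history.reference("Red"))
```

The median is barely moved by the single bad run, whereas a mean would be pulled toward the outlier.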

For the other scheduler issues, I'm waiting for Eric to submit his changes.
The following user(s) said Thank You: Eric, Jose Corazon
5 years 11 months ago #25157

Thank you both for your replies and for explaining in such great detail. As I wrote, my understanding of the coding complexity of this impressive software suite is rudimentary at best. My comment that this might involve fairly simple changes was based on the assumption that it might only require the introduction of one line in the code like 'if guide star lost then abort guiding and go back to alignment step'. I humbly acknowledge my naivete here...:blush:

So just to make sure, my understanding is that it is not feasible without major software changes to dynamically update the HFR limit value shown in the capture module. From my observations it seems like the stars that were selected for focusing when using full field integration remain unchanged as the HFR values of subsequent image files are being calculated and compared to the reference frame. There is usually a high degree of consistency of HFR values when the capture module compares the integrated HFR values of the most recent image with the HFR value "on file". From what I have experienced, when I select a tolerance value of 15% I experience a refocusing event only every 20 frames or so, i.e. every 30-40 min. If it were possible to update the HFR value always with the most recently measured one and then allow for a tolerance of, say 10-20% in either direction, that would ensure that the focus module only springs into action when there is a fairly dramatic shift in seeing conditions. In addition, the timed refocusing will periodically ensure that the minimum is maintained. I don't think this will require complex statistics, those will be provided in real time by the repeated measurements and if the variance is getting too great, i.e. if there is an outlier that exceeds a predefined amount then refocusing takes place. As long as these parameters can be adjusted to personal preference or inactivated that would probably serve most users.

As for reprogramming the scheduler: By all means, Eric, take your time!!! It is much more important to have this well thought through than crashing the mount. I just thought I might bring this to your attention, in case you didn't already know about it. If it helps in any way, you can watch what is currently happening in real time here:

www.dropbox.com/s/176x9dzzb43oush/Indise...152018small.mp4?dl=0

I recreated the problem and recorded it on my screen as it was unfolding. In this case, the guide module recovered fairly quickly, but you can see how that may have taken quite a bit longer or have failed, as it did the night before. Nonetheless, guiding had serious problems (not swapping coordinates despite clearing calibration?).

I am quite happy to be your tester and submit my observations from time to time, as long as you think they are useful and don't regard them as mere griping.

I truly appreciate what you are doing here and would be happy to help in any way I can!

Best,

Jo

PS: By the way, if you look at the video at ~17 min in, you will see that the deviation from the minimum is measured incorrectly. The % difference value is calculated as [HFR(now)-HFR(min)]x100, not as %Diff=[HFR(now)/HFR(min)x100]-100.
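To make the difference between the two formulas in the PS concrete, here is a small worked example. The function names and sample values are mine, chosen only for illustration.

```python
def percent_diff(hfr_now, hfr_min):
    # %Diff = [HFR(now) / HFR(min) x 100] - 100
    return hfr_now / hfr_min * 100.0 - 100.0


def absolute_diff_times_100(hfr_now, hfr_min):
    # [HFR(now) - HFR(min)] x 100, the calculation reported in the PS
    return (hfr_now - hfr_min) * 100.0


hfr_min, hfr_now = 2.0, 2.5
print(percent_diff(hfr_now, hfr_min))             # relative change in percent
print(absolute_diff_times_100(hfr_now, hfr_min))  # twice as large here
```

The two expressions agree only when HFR(min) is exactly 1; for any larger minimum, the second one overstates the relative change by a factor of HFR(min).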
The following user(s) said Thank You: Eric
Last edit: 5 years 11 months ago by Jose Corazon. Reason: Corrected formula
5 years 11 months ago #25177

I implemented median-HFR in KStars Nightly, if you have access to that PPA, please give it a go.
The following user(s) said Thank You: Eric, Jose Corazon
5 years 11 months ago #25219

Cloning my system to a new SD card now so I can implement the nightly build as a fresh install. Not sure it will work out tonight, though, it is cloudy anyway.
5 years 11 months ago #25221

I installed the nightly build and all went fine, including focusing and guiding, until the scheduler began to capture. That resulted in repeated crashes of the camera.
Log attached.
A caveat is that I ran this on a Pi3, not my mini-PC, which has 4x the amount of RAM.
Thoughts?


Sorry I couldn't make this work. I'll give it another try, but may have to wait a while since I will be travelling on business. I am also going to install the system on an external USB stick so I can run it from my mini-PC, not the Pi3. It is possible that the Pi ran out of resources, but I doubt it, since it solved OK, so it could acquire images with my main imaging camera (albeit binned 2x2 for plate solving) and it did save fine previously. I also put it into limited resource mode. When it tries to save the first image in the sequence, it hangs and ultimately crashes.
Let me know if you think the problem is on my side or if the changes you made in that build could possibly cause this behavior.

Jo
Last edit: 5 years 11 months ago by Jose Corazon.
5 years 11 months ago #25248

Sorry from the log I don't see what the problem could be. However, because I've been hit by that issue yesterday, check that it's not the same CCD that is used for guiding and capturing.

Going through my change list, I'm about to push a commit fixing the issue with repeated job iterations when the scheduler remembers job progress. The test I ran last night involved a sequence on M51, 10x40", repeated 300 times. With the change, the scheduler properly planned for 18 hours of imaging (though estimation of job duration is still completely off), mostly properly counted captures and mostly recovered when aborted/reset or paused/restarted :)

I also added an interesting change, at least for me right now. Repeated jobs that are not requested to guide will realign the target each time a batch is complete. This allows unguided sessions to both keep the target centered and process a meridian flip when the batch finishes.

However I'm still checking possible regressions...

First, I had issues with my remote solver: it provided its result properly, but Ekos simply ignored it and kept waiting for something. Switching to the online solver did not trigger the symptom, so I used that last night. My INDI remote is still 1.6, though.

Second, as I mentioned earlier, I had the issue where the CCDs of the remote INDI server were enumerated in a different order than in a previous run. In my case, this defaulted all CCDs in Ekos to the MGen guider, which doesn't provide usable frames. For me this is a no-go for the focuser, so I could spot the issue, but I wonder what would happen if the CCD were providing a real frame.

Also, because my indiserver got disconnected 20min after starting, I registered a new change to my todo list to use a retry algorithm in that situation, now that in my branch, aborted jobs are verified and eventually rescheduled if not complete.
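One possible shape for such a retry is sketched below. The post does not say which algorithm is intended; exponential backoff is my assumption, and `connect_with_retry` and the `connect` callable are hypothetical stand-ins rather than INDI or Ekos functions.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure.

    connect is any callable that raises ConnectionError on failure
    (a hypothetical stand-in for reconnecting to the server).
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                            # give up after the last attempt
            sleep(base_delay * 2 ** attempt)     # 1 s, 2 s, 4 s, ...


# Usage with a fake connection that fails twice before succeeding.
failures = iter([True, True, False])

def flaky_connect():
    if next(failures):
        raise ConnectionError("server unreachable")
    return "connected"

print(connect_with_retry(flaky_connect, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; once the connection is back, the aborted job would be picked up again by the next scheduler evaluation.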

Sorry if that sounds confusing, I'm progressing very slowly, but very cautiously...

@knro now that kstars requires indi 1.7 to build, should there be a warning when connecting to a 1.6 server? Is that a hard incompatibility or a compile-time requirement?

-Eric
The following user(s) said Thank You: Jasem Mutlaq, Jose Corazon
5 years 11 months ago #25306

"Sorry from the log I don't see what the problem could be. However, because I've been hit by that issue yesterday, check that it's not the same CCD that is used for guiding and capturing."

That's not the problem. Also, that would have manifested itself earlier: with the guide cam guiding, the capture module would have reported that an exposure was already in progress. However, in my case it would count down the entire 120" of the exposure and then just hang trying to display the image in the FITS viewer, never progressing to the next exposure from there.

I still need to check whether the Pi3 simply got overloaded and ran out of memory. I did not see it utilize the Swap space, so there could have been a problem with that. Unfortunately, this will now have to wait for 3 weeks as I am going out of town today. But many thanks for the improvements you have made on the scheduler already. It now runs flawlessly for an entire night on multiple targets with tracking, focusing, solving, guiding and flipping.
5 years 11 months ago #25309

cgit.kde.org/kstars.git/commit/?id=256c0...6081d13406df42822f17

Sorry I'm working today on many issues so I'll rejoin this discussion later, but wanted to point out that the remote solver issue might be resolved in the commit above.
The following user(s) said Thank You: Eric
5 years 11 months ago #25313

Replied by Eric on topic Re:Perfecting the Scheduler

@knro fix verified, thanks a lot!

-Eric
5 years 11 months ago #25333

Replied by Eric on topic Re:Perfecting the Scheduler

OK, well, the very last fixes are giving me a hard time: special cases at every corner! Plus the Phabricator stuff is getting a bit in the way (thanks for your patience @knro).

I don't know when 2.9.5 is scheduled, but I should probably push the stable part of the last changes tonight. If only my build times weren't that long; I don't understand how cmake manages dependencies sometimes... What will remain next is the correctness and stability of resetting scheduler jobs: there are still discrepancies between aborted jobs, jobs reinitialized from the evaluate button, jobs just created from the UI and jobs loaded from a scheduler queue. But I'm getting there progressively.

If the current state seems stable enough, and it is indeed, as long as you load and start the scheduler and don't re-edit a job that already ran, I can postpone this activity.

-Eric
The following user(s) said Thank You: Wolfgang Reissenberger
5 years 11 months ago #25388
