×

INDI Library v2.0.7 is Released (01 Apr 2024)

Bi-monthly release with minor bug fixes and improvements

Ekos scheduler aborting job because start time was passed

  • Posts: 152
  • Thank you received: 20
This is from a fresh build from the git repository pulled on 11/2 running on Mint Xenial.

I had two jobs scheduled to run last night - one priority 5, one priority 10. Both jobs were scheduled to run ASAP. The first job ran fine. The second job was aborted automatically with this error:

2018-11-04T01:29:06 Job 'prj_Mel15a' startup time was 04/11/18 01:19, and is already passed by 00h 09m 46s, marking aborted.

A mid-job refocus or two can easily cause a job to finish later than expected. I would think the next job up shouldn't be aborted for that reason.

Anything that I can change to keep things running when one job gets delayed?
The following user(s) said Thank You: Jasem Mutlaq, Alfred
5 years 5 months ago #31227

Please Log in or Create an account to join the conversation.

  • Posts: 1029
  • Thank you received: 301
Thanks for reporting this. Do you have a log of your session?
Normally an aborted job should be rescheduled later on. But there is probably a blatant bug, or a regression, when the aborted job is the last one.
I assume you planned your two jobs as "asap" jobs, correct?

-Eric
5 years 5 months ago #31234

Please Log in or Create an account to join the conversation.

  • Posts: 152
  • Thank you received: 20
Yes, both jobs were planned ASAP - basically the defaults all around. The only thing I adjusted was the priority to force one to happen before the other.

Unfortunately I did not have full logging enabled as I had just reset everything to defaults (deleted all the config files/directories) to solve a freeze up problem. It didn't occur to me to re-enable logging after doing that. It's on now.

This is what I got in the Ekos window.

2018-11-04T01:29:09 Shutdown complete.
2018-11-04T01:29:09 Ekos stopped.
2018-11-04T01:29:08 INDI devices disconnected.
2018-11-04T01:29:06 No jobs left in the scheduler queue.
2018-11-04T01:29:06 Job 'prj_Mel15a' startup time was 04/11/18 01:19, and is already passed by 00h 09m 46s, marking aborted.
2018-11-04T01:29:06 Astronomical twilight: dusk at 18:54:00, dawn at 05:15:00, and current time is 01:28:56
2018-11-04T01:29:05 Job 'prj_Mel15a' is complete.
5 years 5 months ago #31235

Please Log in or Create an account to join the conversation.

  • Posts: 1029
  • Thank you received: 301
Thanks. Last job aborted leads to this bug apparently, as "no jobs left in the scheduler queue" indicates.
This happens because the duration of preliminary steps are currently hard-coded, and probably too short for some setups.
One workaround would be to add a third job that is planned with a fixed time very late in your night. Ugly workaround indeed, but that would cause the aborted job to be rescheduled.
I'm currently rewriting the planning algorithm, I'll make sure this situation is avoided.

-Eric
5 years 5 months ago #31242

Please Log in or Create an account to join the conversation.

  • Posts: 152
  • Thank you received: 20
Got it. I'll give that a shot.
5 years 5 months ago #31263

Please Log in or Create an account to join the conversation.

  • Posts: 152
  • Thank you received: 20
I added a third job to capture a single sub of M42. I forced it to be the last job by setting the priority. After a few mishaps, I started the job and let it go. I came back to find it had started the first sub, then went to bed. One of the mishaps was odd - after guiding started, the capture module hung up on "changing filter" for about 15 minutes. I confirmed the filter wheel was operating and was on the correct filter. All seemed in order there but Ekos just stopped for no reason I could see.

I woke this morning, powered things down from my phone (different interface for the power controls) and went about getting ready for work. A checked the folder where the subs were to go and found only 1 sub, not the 6 1h subs I expected. What really got me is that that sub had a timestamp equal to the time when I powered everything down. Also, I have an external (to Ekos) automation that does last-resort parking of the mount. It runs at sunrise. Ekos did not notice the mount had parked.

Baffled, I logged in and looked at Ekos which was still running. I see the jobs aborted, etc in the status windows. I checked the log file and can see that from about half way through the first sub to about the time the first sub wrapped Ekos stopped logging, uh, *everything*. It then has a small handful of log entries at around the time the sub completed, and then a gap of almost 8 hours where it woke up again. The end of the gap coincides with when I powered things down.

Not really sure what might cause this.

Log file attached. Ekos goes silent around 00:04 or so.

UPDATE - On launching Ekos and starting the indiserver both Ekos and Kstars hang completely. This is identical to the behavior I saw and documented in this thread:

indilib.org/forum/ekos/3885-kstars-ekos-...t-variant.html#31258

Really not sure what to do about this other than wipe all of the configurations out and start over again.
The following user(s) said Thank You: Jasem Mutlaq
Last edit: 5 years 5 months ago by Greg.
5 years 5 months ago #31316
Attachments:

Please Log in or Create an account to join the conversation.

Is there anyway to reproduce this (or the hanging) using simulators? What happens before it hangs? is it repeatable?

This is a show-stopper and must be resolved before 3.0.0 is out.
5 years 5 months ago #31320

Please Log in or Create an account to join the conversation.

  • Posts: 1029
  • Thank you received: 301
From the log, this is the recurring issue we have with captures that do not finish, for whatever reason. The capture started at 23:36 never completed. But alignment succeeded before that...
I can address this with a timeout in the Scheduler, but it doesn't solve the root cause. If the capture fails to finish, the communication channel with the camera is damaged.
I could consider it is a loss of connection with indiserver, but it's not clear whether that would resolve the problem once it happens.

-Eric
5 years 5 months ago #31321

Please Log in or Create an account to join the conversation.

So we can add capture timeout like other modules to somewhat mitigate this issue.

EDIT: Added to GIT.
The following user(s) said Thank You: Eric
Last edit: 5 years 5 months ago by Jasem Mutlaq.
5 years 5 months ago #31323

Please Log in or Create an account to join the conversation.

  • Posts: 1029
  • Thank you received: 301
Obviously the change should be made in the capture module first. Do we agree?

-Eric
5 years 5 months ago #31324

Please Log in or Create an account to join the conversation.

  • Posts: 152
  • Thank you received: 20
I was able to reproduce the hang with the sims previously. I found that deleting ~/.config/kstarsrc seemed to get things going again. Thinking ahead, I simply renamed it last time so I still have a copy. I'll give it a test and see how reproducible it is with the sims. It was 100% reproducible with the actual hardware.
5 years 5 months ago #31341

Please Log in or Create an account to join the conversation.

  • Posts: 152
  • Thank you received: 20
I reproduced the kstars/ekos freeze with the simulators yesterday. To do this I copied the "bad" kstarsrc file back into place, started kstars, and launced Ekos w/ the default simulators. After using Ekos for a few minutes the entire application would freeze up similarly to how it does with the normal set of equipment running. It doesn't seem to happen as quickly though, not sure why.

Reverting back to a "new" kstarsrc I testing the addition of a "throw away" job last night, I ran into a similar situation that was at the start of this thread. I managed to get the window logs but not the debug logs as the log setting must be stored in kstarsrc and logging was off. The first job completed successfully, the second job started and slewed, focused, aligned, and appears to have started guiding when it seems like the software went to sleep or something. This is the log entry for the scheduler:
2018-11-08T04:48:06 Job 'prj_Mel15a' is aborted.
2018-11-08T04:48:06 Job 'prj_Mel15a' is now approaching astronomical twilight rise limit at Thu Nov 8 04:48:00 2018 (30 minutes safety margin), marking aborted.
2018-11-08T00:51:53 Job 'prj_Mel15a' capture is in progress...
2018-11-08T00:51:53 Job 'prj_Mel15a' guiding is in progress.

This is what I saw in the imaging module. Seems to have hung waiting for the filter to move. Checking the wheel, it was on SII:
2018-11-08T00:51:53 Changing filter to SII...
2018-11-08T00:51:53 Job requires 3,600.000-second SII images, has 0/3 frames captured and will be processed.
2018-11-08T00:51:53 Ekos will refocus in 4304 seconds, last procedure was 196 seconds ago.
2018-11-08T00:51:53 CCD capture aborted
2018-11-08T00:51:53 Autoguiding stopped. Aborting...
2018-11-08T00:51:51 Changing filter to SII...
2018-11-08T00:51:51 Job requires 3,600.000-second SII images, has 0/3 frames captured and will be processed.
2018-11-08T00:51:51 Ekos will refocus in 4306 seconds, last procedure was 194 seconds ago.
2018-11-08T00:48:37 Ekos will refocus in 4500 seconds.

I've attached the full Ekos window log and kstarsrc that freezes.
Last edit: 5 years 5 months ago by Greg.
5 years 5 months ago #31384
Attachments:

Please Log in or Create an account to join the conversation.

Time to create page: 0.272 seconds