I am trying to set up different configurations with the hardware I have available (Bluepill / Arduino-Mega / Fysetc S6 / Max PCB2).
Not that I believe the errors come from the different platforms; it is just that this is what I have for setting up different scenarios.
(With / Without Focuser, Rotator, Equatorial vs Altaz ...)
I could not test them all since I still have no soldering iron, but I was able to set up an Arduino Mega 2560 with flying wires.
Just to check that my understanding is correct:
1) If I connect to the Serial port dedicated to WiFi (via a USB-serial dongle), I should be able to connect with INDI or a terminal and send commands / receive responses, correct?
2) If I use an Arduino Mega 2560, I connect my ESP8266 Rx/Tx to Serial 1, correct?
If I try (1), I still sometimes get errors that I don't get via standard USB.
If I try (2), I get some other errors (with KStars or with a Python script):
a) I cannot connect at all
b) I can connect but after a while all is blocked
c) I have the message "Serial Interface to OnStep is Down!" in the Web browser
So, after looking at this a lot more and thinking about some things: the problem is mainly the timeout, but even setting it longer will still crash things sometimes, for a few reasons. (Yay wireless, also *#$%* ISPs.)
After a timeout happens, the connection often does still get the response, just late. Unfortunately, tcflush doesn't work for that, nor do a few other things. So what was happening was this: say I sent :A?# (which returns ABC, where A = max stars, B = current star, C = number of stars in the alignment, or B and C are flipped), followed by :GU# (which returns the nPada/250-looking string that is the status info), and then :GR#. You'd get a timeout on :A?#, so :GU# would get back 600#, :GR# would get nPada/250, and so on down the line. (Occasionally something would get it back in sync.) I didn't have the input verification needed for every function, so when it hit those it could crash. When I did add the input verification, I still had a problem: everything was shifted. So while it wouldn't crash, you'd get endless error messages.
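To make the shift concrete, here is a toy Python model of it. This is only a sketch, not the driver code (which is C++): the function name is made up, the 600# and nPada/250 replies are the ones from the description above, and the :GR# reply is a placeholder.

```python
from collections import deque

def run_without_flush(commands, replies, late):
    """Return what each command reads back (None means it timed out).

    `late` holds the commands whose reply arrives only after the timeout.
    """
    buffer = deque()  # replies that have arrived but have not been read yet
    results = {}
    for cmd in commands:
        if cmd in late:
            # Nothing new arrives in time: read a stale reply if one is
            # waiting, otherwise time out.
            results[cmd] = buffer.popleft() if buffer else None
            buffer.append(replies[cmd])  # the late reply lands afterwards
        else:
            buffer.append(replies[cmd])      # reply arrives in time...
            results[cmd] = buffer.popleft()  # ...but the oldest one is read
    return results

replies = {
    ":A?#": "600#",
    ":GU#": "nPada/250#",
    ":GR#": "HH:MM:SS#",  # placeholder, not a real value
}
shifted = run_without_flush([":A?#", ":GU#", ":GR#"], replies, late={":A?#"})
print(shifted)
# {':A?#': None, ':GU#': '600#', ':GR#': 'nPada/250#'}
```

One late reply is enough: every command after it reads the answer meant for the previous one, and nothing ever brings it back in sync unless the stale data is thrown away.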
After playing with tcflush and other approaches that didn't work, I eventually decided to just read with a very short timeout (1 ms; maybe lower it, since the data should already be in the buffer?) until the buffer is cleared. That seems kind of inelegant, but it works. I also introduced getCommandDoubleResponse (no corresponding function in lx200driver), which has the same calling convention as the SingleCharErrorLongResponse, with a double added. This has the clear-out portion, and after testing for a while (with quite large logfiles!) I can't reproduce the continuous shift or the crashes. (In the past I could: after some period of time, be it 5 minutes or 2 hours, it would eventually hit the issue.) It doesn't clean up the code much at all compared with doing it in the LongResponse, but it does set things up more nicely for checksummed commands/resend. (I think.)
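For anyone testing from a Python script, the same "read with a very short timeout until nothing is left" idea can be sketched roughly like this. It is an illustration over a plain TCP socket, not the code on the branch; drain and send_command are hypothetical names, and the 1 ms and 1 s values are just the numbers discussed here.

```python
import socket

def drain(sock, quiet_s=0.001, max_bytes=4096):
    """Read and discard stale bytes until nothing arrives for quiet_s seconds."""
    discarded = b""
    previous_timeout = sock.gettimeout()
    sock.settimeout(quiet_s)
    try:
        while True:
            try:
                chunk = sock.recv(max_bytes)
            except socket.timeout:
                break  # nothing left in the buffer
            if not chunk:
                break  # peer closed the connection
            discarded += chunk
    finally:
        sock.settimeout(previous_timeout)
    return discarded

def send_command(sock, cmd, timeout_s=1.0):
    """Clear leftovers, send cmd, then read up to the terminating '#'."""
    drain(sock)
    sock.sendall(cmd)
    sock.settimeout(timeout_s)
    reply = b""
    while not reply.endswith(b"#"):
        chunk = sock.recv(1)  # raises socket.timeout if the reply is late
        if not chunk:
            break
        reply += chunk
    return reply
```

Draining right before each send means a reply that arrived late for the previous command is thrown away instead of being handed to the next one. It does not help if the late reply arrives after the drain but before the real answer, so it reduces the problem rather than proving it away.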
I will add the buffer clearing to the other functions, and I do wonder about adding it to the lx200driver calls, because they also operate over the network. I don't have another lx200 to test with, but I suspect that if someone hooked up an esp8266 in its original 'dumb' serial port <-> wifi role, the same problem might be seen.
An occasional value is incorrect (because not all functions flush the buffer yet. This should mostly affect strings, like :GU# or similar. I will be adding flushes to reduce/eliminate this.)
INFO/ERROR messages appear only when something makes it past the buffer flush, so there should be only 1 for any given message.
Lots of error messages on disconnection (I need to work on this.)
This is on the network-timeouts branch so far. Please test and let me know if there are any crashes or problems on wifi (or serial).
I usually believe I am the reason when something goes wrong, but with our problem there is definitely something strange.
I am also testing the communication intensively and found at least that something is wrong on the OnStep side.
There is definitely a difference in behavior between USB and WiFi, and it is not due to the network but to the handling of the bridge between serial (OnStep side) and network.
Sending the same sequences over USB and then over WiFi, I get two different results:
- USB does not crash, time out, or hang
- WiFi hangs for all Rotator commands
I still do not understand why, but in my opinion there should be no difference.
I connected directly to Serial in place of the Wemos (bypassing the Wemos) and do not observe errors.
So the errors must be due to the way the Wemos handles Serial to WiFi. (I also observed hangs when stressing the Web interface while communicating over port 9998 in parallel.)
There are also hangs over port 9999 for the Android app.
So I am trying to see what is going on on the Wemos side. But first I must understand how it is handled ;-(
Thanks for digging into this. Yes, I suspected that delayed results going to the subsequent commands would cause crashes (e.g. expecting a number on Cmd1, which times out, then Cmd3 gets it, and things go haywire).
So yes, buffer clearing is a good idea, but how long should one wait? SWS has a default timeout of 200 ms, plus the time a command takes on OnStep, which varies. Maybe a 250 ms minimum timeout is safe? I don't know.
Yes, if you bypass the Wemos you will get different results. The thing is, the Wemos runs new software (SWS) which has some internal processing, and is a bit different from the addon. So what you are seeing is probably buried in SWS somewhere.
Regarding the rotator, are you testing with the conditional GX98 code? If not, then yes, the :r commands will take a long time and cause problems.
I don't have much to add here, so will wait until the experts finish their analysis.
In the meantime, should Alain push the small GX98 patch to Jasem, to prevent having others report the same issue?
No, I don't want to try the GX98 command, knowing it works.
What I am saying is that since the ":rx#" commands work without any trouble over USB, they should work in the same manner over WiFi.
There is no reason it should be different.
If it is, something is also wrong on the WiFi side.
If I understand correctly, people running over ASCOM have similar issues.
So maybe we have to look not only at what we do wrong on the INDI side but also at what is going on in the SWS.
I think there are two separate things here:
1. Investigate what is different in SWS vs. USB (and even the older WiFi addon)
2. Proper device detection (rotator, focuser, and in the future thermometers, and other features), and not polling devices that are not there.
Task 1 is a technical development task to get to the bottom of why SWS delays responses. Howard gave some clues in the thread I linked to earlier.
Task 2 needs to eventually go into Jasem's repo, because it is the efficient way of doing things. Only 10% or 20% of OnStep users have a rotator, so why make INDI "too chatty" for something that is not there? The focusers are used more, but most users who have them have only one. So why poll the second one? And so on.
I am okay with delaying task 2 until you get to the underlying issue for task 1. But task 2 must go in for efficiency's sake.
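As a rough illustration of what task 2 means (a Python sketch only, not the INDI driver code): detect the optional devices once at connect time, then build the polling list from what is actually there. All names below are hypothetical labels, not real OnStep commands, and probe stands in for whatever detection query ends up being used.

```python
from typing import Callable, Dict, List

# Hypothetical poll tasks per optional device (labels, not real commands).
OPTIONAL_POLLS: Dict[str, List[str]] = {
    "rotator": ["rotator_angle"],
    "focuser1": ["focuser1_position"],
    "focuser2": ["focuser2_position"],
}
# Things that are polled on every mount regardless of add-ons.
ALWAYS_POLLED: List[str] = ["mount_status", "mount_position"]

def detect_devices(probe: Callable[[str], bool]) -> List[str]:
    """Ask once, at connect time, which optional devices are present."""
    return [name for name in OPTIONAL_POLLS if probe(name)]

def build_poll_list(present: List[str]) -> List[str]:
    """Poll the mount always, and only the optional devices that exist."""
    polls = list(ALWAYS_POLLED)
    for name in present:
        polls.extend(OPTIONAL_POLLS[name])
    return polls

# Example: a setup with a single focuser and no rotator.
present = detect_devices(lambda name: name == "focuser1")
poll_list = build_poll_list(present)
```

With one focuser and no rotator, the rotator and second focuser never appear in the poll list, so nothing is sent for them on each update cycle.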
At least now, after flashing the Wemos again but with board library version 2.4.2, I can say it does not work any better than before with version 3.02 (for the :r commands), but I cannot reproduce the crashes so far.
I will leave it running tonight and see.
But I suppose I should keep this 2.4.2 version since it is what is advised by Howard.