
SDK "streamon" format

Neoflash (Member, joined Dec 28, 2018)

Quote:
I would try to use media streams. They deal with the other part of the problem, which is getting the frames displayed as quickly as possible; the browser's media player isn't designed for that. Unless you are really able to decode to a canvas. Even then, make sure the decoder supports delivering each frame immediately or skipping frames.

Yeah, the best option would be MediaStream, but unfortunately I haven't found a way in Node.js to create one from a custom source. WebRTC is really meant as a peer-to-peer (browser-to-browser) protocol, and using it on the server is kind of a bastardization. At the moment, I haven't found a library that implements creating MediaStreams from custom sources on the server, only DataChannels. At least not in Node.js, and I'm pretty set on using Node.js for this project, even if it's not the ideal tech. It is one of the constraints I have decided to put on myself for this project and I'm sticking with it.
 

Krag (Well-known member, joined Mar 22, 2018)

I never actually tried WebRTC. I got stuck on the need for a server to set up the initial connection. And, like you, I had trouble finding a way to convert the video data to WebRTC streams.

At the end of maybe a month of trying to do what you are doing, I came to these conclusions:
- It was too slow to process the video data in the browser. Maybe I could use MSE (Media Source Extensions) if I knew the codec and transport. But then there is still the problem of the browser's playback delays.
- WebRTC might work if I could make a stream server-side. But then I still need yet another server for the initial connection.
- I need a server to just connect the Tello to the browser.
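
For the MSE route, the browser has to be told the codec up front as an RFC 6381 string of the form `avc1.PPCCLL`: the profile_idc, constraint-flags and level_idc bytes from the stream's SPS, written in hex. A minimal Python sketch that builds that string; the example assumes Main profile at level 4.0 (what ffprobe reports for the Tello further down this thread), and the constraint byte `0x40` is a placeholder assumption, not a value read from a Tello. The resulting MIME type is what you would pass to `MediaSource.isTypeSupported()` in the browser. MSE also expects fragmented MP4, so the raw H.264 still has to be muxed first (which is what jmuxer does).

```python
def avc_codec_string(profile_idc: int, constraint_flags: int, level_idc: int) -> str:
    """Build an RFC 6381 "avc1.PPCCLL" codec string from three SPS bytes."""
    for name, value in (("profile_idc", profile_idc),
                        ("constraint_flags", constraint_flags),
                        ("level_idc", level_idc)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte, got {value}")
    # Each byte becomes two uppercase hex digits.
    return f"avc1.{profile_idc:02X}{constraint_flags:02X}{level_idc:02X}"


# Main profile (77) at level 4.0 (level_idc 40). The 0x40 constraint byte is an
# ASSUMPTION; read the real value from the Tello's SPS before relying on it.
mime = f'video/mp4; codecs="{avc_codec_string(77, 0x40, 40)}"'
print(mime)
```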

So, any way I looked at it, I need a server to transcode the data into a usable format. So why make the browser do any of the work? Not for simplicity. In the end, I think the best result was using ffmpeg to transcode to a media server. After a lot of tuning I got the latency down to about 3 seconds, but it would drift over time.
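
A rough sketch of the kind of low-latency ffmpeg invocation this tuning usually revolves around, built as an argument list in Python. The flags are real ffmpeg options, but this particular combination, the UDP input on port 11111 and the `udp://127.0.0.1:1234` MPEG-TS output are assumptions for illustration, not Krag's actual command. If the media server accepts H.264 as-is, `-c:v copy` would avoid re-encoding altogether.

```python
import shlex


def build_ffmpeg_command(src: str = "udp://0.0.0.0:11111",
                         dst: str = "udp://127.0.0.1:1234") -> list:
    """Return an ffmpeg argument list for a low-latency H.264 -> MPEG-TS relay."""
    return [
        "ffmpeg",
        # Input options go BEFORE -i, otherwise they apply to the output.
        "-fflags", "nobuffer",    # reduce buffering during input analysis
        "-flags", "low_delay",    # ask the decoder to favour low delay
        "-i", src,
        # Re-encode with x264's fastest preset, tuned to avoid lookahead/B-frames.
        "-c:v", "libx264",
        "-preset", "ultrafast",
        "-tune", "zerolatency",
        "-an",                    # no audio track to carry
        "-f", "mpegts", dst,
    ]


cmd = build_ffmpeg_command()
print(shlex.join(cmd))
# To run it for real: subprocess.run(cmd, check=True)
```

Getting the option order wrong (putting `-fflags nobuffer` after `-i`) is an easy way to lose the input-side latency savings without any error message.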
 

Neoflash

Krag said:
I never actually tried WebRTC. I got stuck on the need for a server to set up the initial connection. And, like you, I had trouble finding a way to convert the video data to WebRTC streams.

Yeah, I'm going to have a fun time dealing with all of that. Some of the things you got hung up on aren't that big of a problem, though. Making the connection between the browser and the server, and between the server and the Tello, is actually the easy part of this project.

Krag said:
- It was too slow to process the video data in the browser. Maybe I could use MSE if I knew the codec and transport. But then there is still the problem of the browser's playback delays.

I think there are some relatively simple solutions for dealing with playback delays and drift. One that may be a little rough around the edges is to always play the "freshest" frame and just drop the others if, for some reason, the player has started drifting a bit. Another, a bit more complicated but not by much, is to detect when drift is happening and speed up playback to catch up to the latest frame. Another thing you could do (and remember we are talking about implementing this in <canvas>, which means you would normally be in charge of setting the frame rate) is to simply display frames as they are decoded. I'm pretty sure this solution would give some weird results, but it would be the ultimate in terms of showing the user the latest image coming in from the Tello. I just might have a go at implementing and testing all of these.
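
The first two strategies can be sketched as follows (Python for illustration; a browser version would follow the same logic). `LatestFrameSlot` is a hypothetical single-slot buffer that the decoder overwrites and the renderer drains, so stale frames are dropped instead of queued. `catch_up_rate` is a hypothetical proportional rule for the speed-up approach; the 0.2 s target and 1.5x cap are arbitrary assumptions. With a `<video>` element, the rate would be applied through its `playbackRate` property.

```python
import threading
from typing import Optional


class LatestFrameSlot:
    """Single-slot buffer: the decoder overwrites, the renderer takes the newest.

    Any frame replaced before the renderer took it is dropped, which is the
    "always play the freshest frame" policy.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[bytes] = None
        self.dropped = 0  # frames overwritten without ever being displayed

    def put(self, frame: bytes) -> None:
        with self._lock:
            if self._frame is not None:
                self.dropped += 1
            self._frame = frame

    def take(self) -> Optional[bytes]:
        with self._lock:
            frame, self._frame = self._frame, None
            return frame


def catch_up_rate(latency_s: float, target_s: float = 0.2, max_rate: float = 1.5) -> float:
    """Playback-rate multiplier: 1.0 when on target, faster the further behind."""
    excess = latency_s - target_s
    if excess <= 0:
        return 1.0
    # Speed up in proportion to how far behind we are, capped at max_rate.
    return min(max_rate, 1.0 + excess)
```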

Krag said:
So, any way I looked at it, I need a server to transcode the data into a usable format. So why make the browser do any of the work? Not for simplicity. In the end, I think the best result was using ffmpeg to transcode to a media server. After a lot of tuning I got the latency down to about 3 seconds, but it would drift over time.

I have everything happening in one single server. But the problem with decoding on the server instead of on the client is probably going to be the size of the decoded frames that have to be sent over the wire to the client browser. I might give it a go anyway, just to see, but I'm pretty sure I'm going to have to find a solution that does the decoding on the client. Or (I just had a crazy idea) maybe I could have some sort of hybrid where some frames are decoded on the client and others on the server. I don't know, I'll have to work all of this out.
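
The size concern is easy to put numbers on. Assuming the 960x720, 25 fps stream that ffprobe reports further down this thread, this sketch computes the raw bandwidth for yuv420p (12 bits per pixel, a typical decoder output) and for RGBA (4 bytes per pixel, the layout of a canvas `ImageData`). `raw_video_rate` is a hypothetical helper name.

```python
def raw_video_rate(width: int, height: int, fps: float, bytes_per_pixel: float) -> dict:
    """Bandwidth needed to send uncompressed frames, before any transport overhead."""
    frame_bytes = width * height * bytes_per_pixel
    bytes_per_s = frame_bytes * fps
    return {
        "frame_bytes": frame_bytes,
        "bytes_per_s": bytes_per_s,
        "mbit_per_s": bytes_per_s * 8 / 1_000_000,
    }


# 960x720 at 25 fps, per the ffprobe output for the Tello stream.
yuv420p = raw_video_rate(960, 720, 25, 1.5)   # 12 bits per pixel
rgba = raw_video_rate(960, 720, 25, 4)        # canvas ImageData layout

for name, rate in (("yuv420p", yuv420p), ("RGBA", rgba)):
    print(f"{name}: {rate['frame_bytes']:,.0f} B/frame, {rate['mbit_per_s']:.1f} Mbit/s")
```

That works out to about 1 MB per frame and roughly 207 Mbit/s for yuv420p, and about 553 Mbit/s for RGBA, which backs up the worry about sending decoded frames over the wire.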
 

chmiki (Member, joined Jan 5, 2019)

Hi!

I played a bit with the video streaming: I tested an ffmpeg approach with broadway.js and canvas, a jmuxer approach with a video tag, and an mplayer approach. mplayer seems to provide the best results in terms of stream delay, but at the cost of having the video in a separate window. jmuxer seems to have the best results for in-app rendering, although the delay drifts over time; the only fix I have found is to stop and then restart the video stream to get the delay back to 'normal' values.
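
The stop/start workaround can be automated. A hedged sketch: `DriftWatchdog` (a hypothetical helper) records how long each frame waited between arriving and being displayed, and asks for a restart only when a whole window of frames has been over a threshold, so a single slow frame doesn't trigger it. The 1 s threshold and 25-frame window are assumptions, and this only measures in-app buffering, not network or encoder delay.

```python
from collections import deque


class DriftWatchdog:
    """Hypothetical helper: decide when accumulated display latency warrants a restart."""

    def __init__(self, threshold_s: float = 1.0, window: int = 25) -> None:
        self.threshold_s = threshold_s
        self.samples = deque(maxlen=window)  # most recent per-frame latencies

    def record(self, received_at: float, displayed_at: float) -> bool:
        """Log one frame's arrival and display time; True means 'restart now'."""
        self.samples.append(displayed_at - received_at)
        window_full = len(self.samples) == self.samples.maxlen
        # Only trigger when EVERY frame in the window was late, so a single
        # slow frame does not cause a restart.
        if window_full and min(self.samples) > self.threshold_s:
            self.samples.clear()  # start measuring afresh after the restart
            return True
        return False


# Usage: call record() for each displayed frame with time.monotonic() stamps;
# on True, stop and restart the stream, e.g. send "streamoff" then "streamon".
```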

I found that any server/ws component in the stack slows things down, so I implemented an Electron app, as I can combine UDP and browser things in the same process.

I've released the code as tellometrik/tellometrik, so you can check it out for the jmuxer/ffmpeg/mplayer video implementation details.

Any contributions are more than welcome!

Cheers!
 

Neoflash

chmiki said:
I found that any server/ws component in the stack slows things down, so I implemented an Electron app, as I can combine UDP and browser things in the same process.

I've released the code as tellometrik/tellometrik, so you can check it out for the jmuxer/ffmpeg/mplayer video implementation details.

Thanks a lot, @chmiki, I'll be sure to check it out right away. WebSockets would definitely slow things down. Despite what people think, they are not faster than HTTP: they are built on top of the same transport protocol (TCP). For any real-time live video streaming application today where you want a second or less of latency, you pretty much have to go with UDP. That is why I'm using WebRTC DataChannels. They are technically SCTP (over DTLS) running over UDP, but you can configure them to be unordered with no retransmissions, which removes most of the reliability machinery and leaves you with something close to plain UDP.
 

Neoflash

@chmiki @Krag @biometrics @hellowill89 Out of frustration, I've decided to dig deeper into the h.264 format in order to really understand what the heck it is we are actually getting in those UDP packets from the Tello. Have any of you been able to figure it out? So far, I've understood that h.264 splits the video into 3 different types of frames. "I" frames are real full frames that contain all the information about the image; I think they are also called key frames. Then there are "P" frames that basically only contain the delta between the last image and the current image. And then there are "B" frames that are bidirectional, meaning that they can be predicted from both a previous image and a following one. A sequence of "I", "P" and "B" frames is sometimes called a GOP (group of pictures).

The complicated thing (well, one of them) is that not all h.264 videos use the same kind of sequence. There are different profiles: "baseline", "main", "high" and others. For instance, the "baseline" profile does not contain any "B" frames at all. And even within the same profile, videos can vary in how many frames of each type they have in each GOP. And all of this is just a primer. I haven't yet delved into exactly how data is stored in each type of frame, and I'm going to bet there are going to be a lot of variations there too. h.264 seems to be a very flexible codec, which is a good thing, kind of, but it makes it a challenge to understand.
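
In H.264 the I/P/B distinction is actually carried per slice: the `slice_type` field sits near the start of each slice NAL unit, right after `first_mb_in_slice`, and both are unsigned Exp-Golomb (`ue(v)`) codes. A minimal Python sketch that reads it, assuming the input is a single slice NAL unit (types 1 or 5) with its start code already removed. It skips removing emulation-prevention bytes, which is usually harmless this close to the start of the header but is a simplification.

```python
class BitReader:
    """Reads a byte string MSB-first, one bit at a time."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # position in bits

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_ue(self) -> int:
        """Unsigned Exp-Golomb: N leading zeros, a 1, then N info bits."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        value = 0
        for _ in range(leading_zeros):
            value = (value << 1) | self.read_bit()
        return (1 << leading_zeros) - 1 + value


SLICE_TYPES = {0: "P", 1: "B", 2: "I", 3: "SP", 4: "SI"}


def slice_type(nal: bytes) -> str:
    """Return "I", "P", "B", ... for a slice NAL unit (header byte included)."""
    nal_type = nal[0] & 0x1F
    if nal_type not in (1, 5):
        raise ValueError(f"NAL type {nal_type} is not a coded slice")
    reader = BitReader(nal[1:])
    reader.read_ue()                          # first_mb_in_slice (ignored)
    return SLICE_TYPES[reader.read_ue() % 5]  # values 5..9 repeat 0..4
```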

So one of the first things I'm trying to determine is exactly what profile was used to encode the video sent from the Tello. I'm using JavaScript on the server and the client, and it looks like all the available h.264 decoder JavaScript libraries only work with the "baseline" profile. Logically, that profile should be used for something like a drone that wants to keep latency as low as possible, but I'm not sure that's the case. Do any of you know how to analyze the UDP packets and make that determination?
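
The profile can be read straight from the stream without decoding anything: it is the first byte after the header of the SPS NAL unit (type 7), followed by the constraint-flags byte and `level_idc`. A sketch assuming you have already concatenated some received UDP payloads into an Annex B byte string; `find_sps` and `sps_profile` are hypothetical names, and the table covers only the common `profile_idc` values.

```python
PROFILES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High"}


def find_sps(stream: bytes) -> bytes:
    """Return the bytes from the first SPS NAL header (type 7) onward."""
    pos = stream.find(b"\x00\x00\x01")  # also matches the tail of 00 00 00 01
    while pos != -1:
        start = pos + 3
        if start < len(stream) and stream[start] & 0x1F == 7:
            return stream[start:]
        pos = stream.find(b"\x00\x00\x01", start)
    raise ValueError("no SPS NAL unit found")


def sps_profile(stream: bytes):
    """Return (profile name, constraint-flag byte, level_idc) from the first SPS."""
    sps = find_sps(stream)
    if len(sps) < 4:
        raise ValueError("SPS truncated")
    profile_idc, constraints, level_idc = sps[1], sps[2], sps[3]
    return PROFILES.get(profile_idc, f"profile_idc={profile_idc}"), constraints, level_idc
```

A `level_idc` of 40 means level 4.0, the same value ffprobe prints as `level=40`.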

I've used Wireshark to determine that the video is streamed in sequences of 12 UDP packets. The first one in the sequence always seems to start with 00 00 00 01 (I could be wrong, I haven't looked at a large sample), and the first 11 packets always contain 1460 bytes. The twelfth is always smaller, with a variable size. Where do I go from here to get more information about the content of the packets? One thing I'm wondering is whether each group of 12 UDP packets represents one single frame, of "I", "P" or "B" type, or a GOP (a set of frames).
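
A sketch of receiving and regrouping those packets in Python. It assumes the usual Tello SDK setup (commands sent to 192.168.10.1:8889, video arriving on UDP port 11111 after `streamon`) and relies on the observation above that every datagram is 1460 bytes except the last one of a group. `FrameAssembler` and `receive_groups` are hypothetical names. Whether each group is exactly one frame is not established here; the sketch just hands each group on as one unit, and the start codes inside it tell you what it actually contains.

```python
import socket

TELLO_CMD_ADDR = ("192.168.10.1", 8889)  # Tello SDK command address
VIDEO_PORT = 11111                        # Tello sends video here after "streamon"
FULL_DATAGRAM = 1460                      # size of every datagram but the last in a group


class FrameAssembler:
    """Concatenate datagrams until a short one arrives, then emit the group."""

    def __init__(self, full_size: int = FULL_DATAGRAM) -> None:
        self.full_size = full_size
        self._parts = []

    def feed(self, datagram: bytes):
        """Add one datagram; return the completed group as bytes, else None."""
        self._parts.append(datagram)
        if len(datagram) < self.full_size:
            group = b"".join(self._parts)
            self._parts = []
            return group
        return None


def receive_groups():
    """Yield reassembled groups from a live Tello (requires its Wi-Fi network)."""
    cmd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cmd.settimeout(5.0)
    video = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    video.bind(("0.0.0.0", VIDEO_PORT))
    for word in (b"command", b"streamon"):
        cmd.sendto(word, TELLO_CMD_ADDR)
        cmd.recvfrom(1024)  # wait for the drone's reply (normally "ok")
    assembler = FrameAssembler()
    while True:
        datagram, _ = video.recvfrom(2048)
        group = assembler.feed(datagram)
        if group is not None:
            yield group
```

One caveat of the size heuristic: a group whose last datagram happens to be exactly 1460 bytes gets merged with the next group, so splitting on start codes afterwards is the more robust step.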
 

hellowill89 (Active member, joined Dec 30, 2018)

In my case this doesn't matter, but when I was looking at packet headers, I did see a pattern in them. Like you said, some start with something like 00 00 00 015, and others with 00 00 00 013. That's what I recall.
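
The byte right after the `00 00 00 01` start code is the NAL unit header, and its bits are where patterns like these come from: 1 forbidden bit, 2 bits of `nal_ref_idc`, and 5 bits of `nal_unit_type`. A small Python decoder; `describe_nal_header` is a hypothetical name, and the four sample bytes are typical values, not ones captured from a Tello.

```python
NAL_TYPES = {
    1: "non-IDR slice",
    5: "IDR slice (keyframe)",
    6: "SEI",
    7: "SPS (sequence parameter set)",
    8: "PPS (picture parameter set)",
    9: "access unit delimiter",
}


def describe_nal_header(header: int) -> dict:
    """Split the one-byte NAL unit header into its three fields."""
    nal_type = header & 0x1F                  # low 5 bits
    return {
        "forbidden_zero_bit": header >> 7,    # must be 0 in a valid stream
        "nal_ref_idc": (header >> 5) & 0x3,   # 0 = not used as a reference
        "nal_unit_type": nal_type,
        "name": NAL_TYPES.get(nal_type, "other"),
    }


for byte in (0x67, 0x68, 0x65, 0x41):
    info = describe_nal_header(byte)
    print(f"00 00 00 01 {byte:02X}: {info['name']}, nal_ref_idc={info['nal_ref_idc']}")
```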
 

Krag

TelloLib can save out just the raw .h264 stream with all the NAL stuff removed. Running one of those files through ffprobe says that it is encoded with the Main Profile.
Code:
Input #0, h264, from '2018-13-5--19-21-54.h264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (Main), yuv420p(progressive), 960x720, 25 fps, 25 tbr, 1200k tbn, 50 tbc
[STREAM]
index=0
codec_name=h264
codec_long_name=H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
profile=Main
codec_type=video
codec_time_base=1/50
codec_tag_string=[0][0][0][0]
codec_tag=0x0000
width=960
height=720
coded_width=960
coded_height=720
has_b_frames=0
sample_aspect_ratio=0:1
display_aspect_ratio=0:1
pix_fmt=yuv420p
level=40
color_range=N/A
color_space=unknown
color_transfer=unknown
color_primaries=unknown
chroma_location=left
field_order=progressive
timecode=N/A
refs=1
is_avc=false
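
To check fields like `profile`, `level` and `has_b_frames` from a script rather than by eye, the `key=value` blocks that `ffprobe -show_streams` prints can be parsed directly. A Python sketch; `parse_show_streams` and `probe` are hypothetical helper names, and `probe` assumes ffprobe is on the PATH. The parser also tolerates a block without the closing `[/STREAM]` line, as in the excerpt above.

```python
import subprocess


def parse_show_streams(text: str) -> list:
    """Turn `ffprobe -show_streams` output into one dict per [STREAM] block."""
    streams, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[STREAM]":
            current = {}
        elif line == "[/STREAM]":
            if current is not None:
                streams.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")  # split on the first "=" only
            current[key] = value
    if current is not None:  # tolerate a block cut off before [/STREAM]
        streams.append(current)
    return streams


def probe(path: str) -> list:
    """Run ffprobe (assumed to be on PATH) on a file and parse its streams."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return parse_show_streams(result.stdout)
```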
 

xristos (New member, joined Oct 5, 2018)

Hi @hellowill89,
First, thanks for your really useful information about how to receive the Tello video.
I have built an app (Java) in Android Studio to control the Tello, and now I want to display the video.
You mentioned that we must 1. receive bytes/frames, 2. decode each frame, 3. convert it to a Bitmap and display it.
Is it possible to share a section of your code here?

Is there another, easier and faster way to display (and, if possible, save) the video?
Something like `VideoView` and `Uri uri = Uri.parse("udp://0.0.0.0:11111");`
or a ready-made API that is easy to use for live streaming?
 

Neoflash

Krag said:
TelloLib can save out just the raw .h264 stream with all the NAL stuff removed. Running one of those files through ffprobe says that it is encoded with the Main Profile.
Thanks for that information. Could you explain what the "NAL stuff" is?
 

Krag

NAL = Network Abstraction Layer. The 00 00 00 01 headers are the start codes that separate the NAL units.
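
Splitting an Annex B stream on those start codes gives you the NAL units themselves. A minimal Python sketch that handles both the 3-byte `00 00 01` and 4-byte `00 00 00 01` forms; `split_annexb` is a hypothetical name, and trimming trailing zero bytes from each unit is a simplification that assumes no unit legitimately ends in `0x00`.

```python
def split_annexb(stream: bytes) -> list:
    """Split an Annex B H.264 byte stream into NAL units, start codes removed."""
    units = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        begin = pos + 3
        nxt = stream.find(b"\x00\x00\x01", begin)
        end = nxt if nxt != -1 else len(stream)
        # The extra leading zero of a 4-byte start code lands at the end of
        # the previous unit, so strip trailing zeros.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            units.append(nal)
        pos = nxt
    return units
```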
 

hellowill89

@xristos If you only want to preview the video, you don't need to convert the image to a Bitmap. You can use a `SurfaceView` and pass its surface to the `MediaCodec` directly so that it decodes into the `SurfaceView`. That's the best way to preview video. In my case I was doing video analysis, not just previewing it.
 
