Transcoding and file size in svn-1463

This topic has 3 replies, 2 voices, and was last updated 17 years, 6 months ago by rpedde.

Viewing 4 posts - 1 through 4 (of 4 total)

Author

Posts
02/01/2007 at 11:52 AM #947
xaviour
Participant
I noticed that Firefly sends the file size of the source file instead of the size of the resulting wav file.

For example, for the following track:
```
ogginfo 01 - Meant to Be.ogg
Processing file "01 - Meant to Be.ogg"...

New logical stream (#1, serial: 12a67e78): type vorbis
Vorbis headers parsed for stream 1, information follows...
Version: 0
Vendor: Xiph.Org libVorbis I 20040629 (1.1.0 rc1)
Channels: 2
Rate: 44100

Nominal bitrate: 224.000000 kb/s
Upper bitrate not set
Lower bitrate not set
User comments section follows...
ALBUM=Woods of Chaos
ARTIST=Rob Costlow
COMMENT=http://www.jamendo.com/ : Free music
DATE=2005
DESCRIPTION=http://www.jamendo.com/ : Free music
GENRE=Jazz
LICENSE=2005 Rob Costlow - Solo Piano. Licensed to the public under http://creativecommons.org/licenses/by-nc-nd/2.5/ verify at http://www.jamendo.com/?&888
ORGANIZATION=http://www.jamendo.com/ : Free music
TITLE=Meant to Be
TRACKNUMBER=1
WWW=http://www.jamendo.com/?&888
Vorbis stream 1:
Total data length: 6811355 bytes
Playback length: 4m:28.160s
Average bitrate: 203.202715 kb/s
Logical stream 1 ended
```
Firefly reports to my Soundbridge that the file is 6.5MB (ogg total data length / (1024*1024)) when it should be 45.2MB (Playback length * Sample Rate * Channels * Sample Size + size of wav header).
This real file size could be used for the calculation of “real_len” in function “pi_stream” in case of a server side conversion.
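
For reference, the estimate can be worked through in a few lines. This is only a sketch in Python, not Firefly code: the function name is made up, and it assumes 16-bit samples and a plain 44-byte PCM wav header, neither of which is stated by ogginfo.

```python
WAV_HEADER_BYTES = 44  # assumption: canonical RIFF/WAVE header for plain PCM


def estimated_wav_size(duration_s, sample_rate, channels, bits_per_sample):
    """Estimate the size in bytes of a PCM wav file, header included."""
    bytes_per_frame = channels * bits_per_sample // 8
    total_frames = round(duration_s * sample_rate)
    return WAV_HEADER_BYTES + total_frames * bytes_per_frame


# Duration, rate and channel count come from the ogginfo output above;
# the 16-bit sample size is an assumption.
duration_s = 4 * 60 + 28.160
size = estimated_wav_size(duration_s, 44100, 2, 16)
print(f"{size} bytes, about {size / (1024 * 1024):.1f} MiB")
```

With these inputs the result is on the order of 45 MiB, against roughly 6.5 MiB for the ogg file.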
03/01/2007 at 4:45 AM #8172

rpedde
Participant

@xaviour wrote:

Firefly reports to my Soundbridge that the file is 6.5MB (ogg total data length / (1024*1024)) when it should be 45.2MB (Playback length * Sample Rate * Channels * Sample Size + size of wav header).
This real file size could be used for the calculation of “real_len” in function “pi_stream” in case of a server side conversion.

Except I don’t do that much introspection into the file at scan time, since I don’t know how the transcoder will convert it. Might convert it to 44100/16/2, or maybe leave it at 48k. Dunno, it’s up to the transcoder to write the right wav header.

So I don’t really know the size of the file when I send it.

I do send wav headers that are pretty close (as close as I can estimate based on song duration, since I don’t know sample count either) when I stream the wav, and the http headers don’t include a response length.
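
As a rough illustration of a wav header whose length fields are estimated from the song duration, here is a Python sketch. It is not the server's actual code: the function name and the default format (44100 Hz, stereo, 16 bits) are assumptions made for the example.

```python
import struct


def estimated_wav_header(duration_s, sample_rate=44100, channels=2,
                         bits_per_sample=16):
    """Build a 44-byte PCM wav header with a data length guessed from duration."""
    block_align = channels * bits_per_sample // 8   # bytes per frame
    byte_rate = sample_rate * block_align           # bytes per second
    data_len = round(duration_s * sample_rate) * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",            # RIFF chunk
        b"fmt ", 16, 1, channels, sample_rate,      # fmt chunk, 1 = PCM
        byte_rate, block_align, bits_per_sample,
        b"data", data_len,                          # data chunk header
    )


header = estimated_wav_header(4 * 60 + 28.160)
print(len(header), struct.unpack("<I", header[40:44])[0])
```

Since the real sample count is unknown, the length written here can differ slightly from what the transcoder actually produces, which is why the estimate is only "pretty close".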

That seems to work okay for all the players I’ve seen, although it does have the issue of showing wrong metadata.

I’ve got fields in the database right now for sample count and stuff, and I hope to get all that in the metadata scanners, so eventually that might be more correct, but it’s pretty much a long-term issue right now, as it seems to mostly work. :-/

But yeah, I’d like that to work as well. Getting there. Flac has all that data from the metadata scanner, fwiw.

— Ron

03/01/2007 at 12:56 PM #8173

xaviour
Participant

Ron,

I have been doing some digging in the svn repository and on the Internet and I have 2 remarks.

First, I think you are making your life much too complicated regarding the transcoder. An uncompressed stream (PCM) can be fully characterized by 4 parameters: the sample size, the sample rate, the number of samples (with song duration = number of samples / sample rate) and the number of channels. Let’s ignore the pedantic case of surround sound, for which it makes theoretical sense to have a different sample rate for the different channels (bass).

The number of channels is 1 or 2. There is no point in transforming a mono signal into a stereo signal (you cannot create information). Transforming a stereo signal into a mono signal is possible but masochistic. You wrote somewhere that you want to keep Firefly simple so I guess you do not want to support this case.

The number of samples is not a problem for transcoding. The transcoding process will not change the song duration.

The sample rate can take a large number of values. In practice, it is CD quality, i.e. 44100 Hz. There is no point in increasing the sample rate during the transcoding (you cannot create information). Reducing the sample rate is also masochistic and consequently not a feature either.

The sample size is the most problematic parameter from a transcoding point of view. There are 2 reasons for that. With lossy codecs, it is usually not possible to know what the original sample size was. Moreover, you do not know the quality of the DAC on your rendering device (20 bits on an M1000). However, the loss due to the compression is such that it does not make sense to transcode to 24 bits. Also, transcoding to 8-bit audio is masochistic. To conclude: when transcoding lossy codecs, the resulting sample size should be 16 bits; when transcoding lossless codecs, the resulting sample size should be either whatever is indicated in the file or the maximum sample size in PCM format supported by the renderer (16 bits for an M1000).

So quite a big rant to say that the number of types of wav files created by transcoding is quite limited.
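
The rules above can be sketched as a small selection function. This is a Python sketch for illustration only: the names `PcmFormat` and `target_format` are made up, and taking the smaller of the file's sample size and the renderer's maximum for lossless sources is one reading of the rule, not something Firefly does today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcmFormat:
    sample_rate: int
    channels: int
    bits_per_sample: int


def target_format(source_rate, source_channels, lossy,
                  source_bits=None, renderer_max_bits=16):
    """Pick the PCM format to transcode to, following the rules above."""
    if lossy or source_bits is None:
        # Lossy codecs: the original sample size is unknown, so use 16 bits.
        bits = 16
    else:
        # Lossless codecs: keep the file's sample size, capped by the renderer.
        bits = min(source_bits, renderer_max_bits)
    # Sample rate and channel count are passed through unchanged.
    return PcmFormat(source_rate, source_channels, bits)


print(target_format(44100, 2, lossy=True))
print(target_format(48000, 2, lossy=False, source_bits=24))
```

Under these rules the output format is known before the transcoder runs, which is what would make an exact size calculation possible.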

The second remark concerns the “scan-*.c” files: wow! I must say that I am impressed by the amount of work you did. Getting it right and maintaining it must be a major pain. Would you consider rewriting or having someone rewrite most of the “scan-*.c” to use taglib? After all, the library is already used for Musepack. I understand this is not a priority for you so I can offer my help if you wish.

As far as I can see, it should be possible to use taglib for scan-aac, scan-flac, scan-mp3, scan-mp4, scan-mpc and scan-ogg. Moreover, if the developer of taglib imports some code from Amarok, it could also be possible to use it for scan-aif and scan-wav.

The net result would be a single hard dependency on taglib instead of id3tag (hard), vorbis (soft), taglib (soft) and flac (soft).

Let me know what you think about it.

04/01/2007 at 4:13 AM #8174

rpedde
Participant

@xaviour wrote:

So quite a big rant to say that the number of types of wav files created by transcoding is quite limited.

Should be, but doesn’t seem to be in practice. The windows transcoder (using the wmp sdk) seems to transcode on a “whatever format I feel like” basis. I think it’s possible to specify what you want to transcode to, but I’m not completely sure.

Likewise, I don’t really know what comes out of ffmpeg — I just set the wav headers based on what it says it is. In any event, I’m not sure you can say with confidence what comes out of the transcoding process until you apply a transcoder to it.

Maybe I’m wrong, but that’s how it seems.
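
One way to picture "apply the transcoder, then see what came out" is to read the format fields back from the wav header the transcoder emits. The Python sketch below uses the standard `wave` module on an in-memory wav; the function name is made up, and it assumes the transcoder output starts with an ordinary RIFF/WAVE header, which is not confirmed here for either ffmpeg or the WMP SDK.

```python
import io
import wave


def pcm_format_of(wav_bytes):
    """Return (sample_rate, channels, bits_per_sample) read from a wav header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth() * 8


# Stand-in for transcoder output: 480 frames of 48 kHz stereo 16-bit silence.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(48000)
    w.writeframes(b"\x00" * 4 * 480)

print(pcm_format_of(buf.getvalue()))
```

Only the format fields are read here; the length fields of a streamed header may be estimates or placeholders.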

The second remark concerns the “scan-*.c” files: wow! I must say that I am impressed by the amount of work you did. Getting it right and maintaining it must be a major pain. Would you consider rewriting or having someone rewrite most of the “scan-*.c” to use taglib? After all, the library is already used for Musepack. I understand this is not a priority for you so I can offer my help if you wish.

Yes, in principle I would be. It’s something I’ve already given some thought to, actually. My only real problem with taglib is its c++ nature. Lots of embedded machines (mipsel using the broadcom 3.0 toolchain, probably others) don’t have a g++, which would make it an impossible target on those platforms.

The C api is a wrapper around a c++ core, right?

On the other hand, it would simplify a *lot*, and eventually allow for tag write-back. Which would be nice. I’d *love* to be able to offload that code to someone else’s library.

As far as I can see, it should be possible to use taglib for scan-aac, scan-flac, scan-mp3, scan-mp4, scan-mpc and scan-ogg. Moreover, if the developer of taglib imports some code from Amarok, it could also be possible to use it for scan-aif and scan-wav.

Aif and wav are trivial and insubstantial anyway. The metaparsers should be split from the file metadata anyway, and I don’t think that taglib does file metadata (sample size, etc), or does it?

In any event, I’d be quite willing to entertain the idea. I guess the biggest drawback is the c++ nature. Is it in fact c++? And then, I wonder how big a problem that is, or if it’s a problem I’m making up?

The forum ‘Nightlies Feedback’ is closed to new topics and replies.