Foreign characters and XML entries
- This topic has 10 replies, 5 voices, and was last updated 18 years, 4 months ago by rpedde.
17/05/2006 at 4:03 PM #274 zemanel (Guest)
hi,
I’ve just installed the latest build, and the problems with mt-daapd crashing after every rescan seem to be gone (although I’ve only been using it for a few hours).
Another good thing: it can find the artist info, album, etc. for almost all WAV files whose info is contained in the XML file. So the ability to process XML files is back.
One problem remains, though: “special” international characters.
Here’s one example of a fairly popular artist: “Björk”. The XML entry for her song “One Day”, as produced by iTunes running on Win XP, is:
Track ID 2932
Name One Day
Artist Björk
Composer Björk Gudmundsdottir
Album Debut
Genre Alternative & Punk
Kind WAV audio file
Size 56800844
Total Time 322000
Disc Number 1
Disc Count 1
Track Number 7
Track Count 12
Year 1993
Date Modified 2006-04-30T19:19:01Z
Date Added 2006-04-30T19:18:57Z
Bit Rate 1411
Sample Rate 44100
Rating 60
Persistent ID 171A57BBFDACC25F
Track Type File
Location file://localhost/F:/Music/Bj%C3%B6rk/Debut/07%20One%20Day.wav
File Folder Count 4
Library Folder Count 1
Three fields seem to be the culprits: Artist, Composer and especially Location, which I suspect is the origin of the problem. Notice the way it writes the path:
…/Bj%C3%B6rk/Debut/07%20One%20Day.wav
when it’s actually …/Björk/Debut/07 One Day.wav
btw, mt-daapd does find the file “07 One Day.wav”, but it cannot associate any XML data with it.
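As a quick check of what the Location field actually contains, here is a minimal Python sketch (not part of mt-daapd) that undoes the percent-escaping, assuming the escaped bytes are UTF-8, which is what %C3%B6 for ö suggests. The variable names are only illustrative.

```python
from urllib.parse import unquote, urlsplit

# Location value copied from the iTunes XML entry above.
location = "file://localhost/F:/Music/Bj%C3%B6rk/Debut/07%20One%20Day.wav"

# Keep only the path part of the file:// URL (drops "file://localhost").
escaped_path = urlsplit(location).path

# Each %XX stands for one raw byte; interpret the byte sequence as UTF-8.
# errors="strict" makes the call fail loudly if the bytes are not valid UTF-8.
decoded_path = unquote(escaped_path, encoding="utf-8", errors="strict")

print(decoded_path)  # /F:/Music/Björk/Debut/07 One Day.wav
```

So the path in the XML is the real path, just written as URL-escaped UTF-8 bytes; %20 is a plain space and %C3%B6 is the two-byte UTF-8 form of ö.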
Now a different example: “Clã”, a Portuguese band with the special character “ã”. Their song “A Grande Pirâmide” appears in the XML file as:
Track ID 240
Name A Grande Pirâmide
Artist Clã
Album Kazoo
Genre Latin
Kind WAV audio file
Size 37836668
Total Time 214493
Disc Number 1
Disc Count 1
Track Number 1
Track Count 13
Year 1997
Date Modified 2006-05-03T01:55:52Z
Date Added 2006-04-30T00:35:26Z
Bit Rate 1411
Sample Rate 44100
Rating 60
Persistent ID 7AA7DF1713F05A40
Track Type File
Location file://localhost/F:/Music/Cl%C3%A3/Kazoo/01%20A%20Grande%20Pir%C3%A2mide.wav
File Folder Count 4
Library Folder Count 1
The actual location of the file is …/Clã/Kazoo/01 A Grande Pirâmide.wav
and mt-daapd shows it as “01 A Grande Pir?mide.wav”.
So, what should I do? Should I go to the XML file and edit it manually, replacing things like “Cl%C3%A3” with “Clã”? Or is there some option to process such data in the latest build?
The first option of doing it manually doesn’t bother me, since I can script it. All I need to know is how to present the info in a way that mt-daapd can read.
cheers (and kudos for addressing the scan crashing and XML processing issues so promptly)
18/05/2006 at 12:07 AM #4435 rpedde (Participant)
@zemanel wrote:
hi,
So, what should I do. Should I go to the XML file and edit it manually by replacing things like “Cl%C3%A3” with “Clã”? Or is there some option to process such data in the latest build?
We talked about this before, right? This was an NTFS drive that got moved to a Unix box and is now accessed via Samba? Is that right?
Until everyone stores everything in utf-8 straight-through, this kind of thing is just plain going to be a nightmare.
The files are stored as utf-16 on disk, but iTunes is obviously storing the file names as utf-8. So up-promote utf-8 to utf-16? This might take some work.
I think I have some bjork around, I’ll see if I can’t replicate it. I might not be able to — at least not on windows. I’m pretty sure I was playing with some Bjork music when I was working through playlist stuff, and it worked okay locally, so it’s probably a conversion issue on samba.
19/05/2006 at 12:26 AM #4436 zemanel (Guest)
@rpedde wrote:
We talked about this before right? This was a ntfs drive that got moved to a unix box and is now accessed via samba? Is that right?
Yes, we discussed this in another thread, which disappeared after the recent change to the forum structure/pages. It’s an NTFS drive that got moved to an NSLU2, unslung with the latest 6.8 and with support for Western Europe/Latin 1 (850).
@rpedde wrote:
The files are stored as utf-16 on disk, but iTunes is obviously storing the file names as utf-8. So up-promote utf-8 to utf-16? This might take some work.
Well, I tried changing one entry in the XML file where iTunes had Bj%C3%B6rk (C3 B6 is the UTF-8 encoding of the letter ö). I opened the file with an XML editor and changed %C3%B6 to 00F6, the code for the same character in UTF-16, but mt-daapd still couldn’t find it.
Also, when I telnet into the slug I can access the folder named Björk by just typing cd Björk. So the slug certainly understands ö.
@rpedde wrote:
I think I have some bjork around, I’ll see if I can’t replicate it. I might not be able to — at least not on windows. I’m pretty sure I was playing with some Bjork music when I was working through playlist stuff, and it worked okay locally, so it’s probably a conversion issue on samba.
If you can tell me how to edit the xml file so that mt-daapd can process the path info for something like …/Björk/…, which iTunes writes as ../Bj%C3%B6rk/…, I can do it for the other cases myself, even if that means having to change the xml entries manually.
Cheers
19/05/2006 at 3:10 AM #4437 rpedde (Participant)
@zemanel wrote:
If you can tell me how to edit the xml file so that mt-daapd can process the path info for something like …/Björk/…, which iTunes writes as ../Bj%C3%B6rk/…, I can do it for the other cases myself, even if that means having to change the xml entries manually.
… and if I knew that, I’d do it in code. 🙂
But maybe it’s a codepage issue. In CP850, an o with a diaeresis looks to be 0x94. Try that.
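For reference, a small Python sketch of how the same character comes out in the encodings mentioned in this thread. It uses Python's built-in codecs only as a lookup table and says nothing about what mt-daapd or Samba do internally.

```python
# The character under discussion: o with diaeresis (U+00F6).
ch = "ö"

# Byte sequences for the same character in each encoding named in the thread.
encodings = {
    "utf-8": ch.encode("utf-8"),          # what iTunes escapes in the XML
    "utf-16-be": ch.encode("utf-16-be"),  # code unit 00F6
    "cp850": ch.encode("cp850"),          # DOS Western European codepage
    "iso-8859-1": ch.encode("iso-8859-1"),
}

for name, data in encodings.items():
    # Show each byte as a %XX escape, the way it would appear in a URL.
    print(name, "".join(f"%{b:02X}" for b in data))
```

This prints %C3%B6 for UTF-8, %00%F6 for UTF-16, %94 for CP850 and %F6 for ISO-8859-1. It also hints at why typing the literal text 00F6 into the XML is unlikely to help: that inserts four ordinary characters rather than an escaped byte.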
19/05/2006 at 7:33 AM #4438 fizze (Participant)
Wow, iTunes is really weird there.
I too run a codepage 850 NTFS drive on my slug, but I use lots of apps for playlists and the like.
In fact, I haven’t enabled the process_m3u option because I want to wait until this is more stable.
Go zemanel, and spot all those bugs 🙂
19/05/2006 at 7:35 AM #4439 schiers (Participant)
Hi,
we’re lucky that Björk Guðmundsdóttir uses only her first name, aren’t we? 8)
BR,
Carsten.
19/05/2006 at 11:25 AM #4440 zemanel (Guest)
@schiers wrote:
Hi,
we can be lucky that Björk Guðmundsdóttir uses only her first name, don’t we? 8)
BR,
Carsten.
Good point mate 😀
But if you think the set of characters in her last name is bad, check out her early jazz album “Gling-Gló” and take a look at the name of the tracks… 😯
19/05/2006 at 12:38 PM #4441 fizze (Participant)
hehe, funky 😀
The stuff with the weirdest names I’ve got is probably from either Bugge Wesseltoft or the Esbjörn Svensson Trio… those crazy Nordics 😉
19/05/2006 at 4:11 PM #4442 zemanel (Guest)
Ok, as Ron suggested, I tried replacing Bj%C3%B6rk with Bj%94rk in the file names, and this time it worked.
Since I couldn’t find a conversion table online (even though there must be one somewhere), I went to Mathematica and wrote a small conversion tool from UTF-8 to code page CP850 for Western European languages, which has 128 additional “special” characters. Here is the result for the ones that are likely to appear in a file name.
The table format is as follows: A %BB%CC %DD
“A” is the special character
“%BB%CC” is the way iTunes writes it in the XML file
“%DD” is what it should be in the XML file instead of “%BB%CC”
Char UTF-8 CP850
Ç %C3%87 %80
ü %C3%BC %81
é %C3%A9 %82
â %C3%A2 %83
ä %C3%A4 %84
à %C3%A0 %85
å %C3%A5 %86
ç %C3%A7 %87
ê %C3%AA %88
ë %C3%AB %89
è %C3%A8 %8A
ï %C3%AF %8B
î %C3%AE %8C
ì %C3%AC %8D
Ä %C3%84 %8E
Å %C3%85 %8F
É %C3%89 %90
æ %C3%A6 %91
Æ %C3%86 %92
ô %C3%B4 %93
ö %C3%B6 %94
ò %C3%B2 %95
û %C3%BB %96
ù %C3%B9 %97
ÿ %C3%BF %98
Ö %C3%96 %99
Ü %C3%9C %9A
ø %C3%B8 %9B
£ %C2%A3 %9C
Ø %C3%98 %9D
× %C3%97 %9E
ƒ %C6%92 %9F
á %C3%A1 %A0
í %C3%AD %A1
ó %C3%B3 %A2
ú %C3%BA %A3
ñ %C3%B1 %A4
Ñ %C3%91 %A5
ª %C2%AA %A6
º %C2%BA %A7
¿ %C2%BF %A8
® %C2%AE %A9
¬ %C2%AC %AA
½ %C2%BD %AB
¼ %C2%BC %AC
¡ %C2%A1 %AD
« %C2%AB %AE
» %C2%BB %AF
Á %C3%81 %B5
 %C3%82 %B6
À %C3%80 %B7
© %C2%A9 %B8
¢ %C2%A2 %BD
¥ %C2%A5 %BE
ã %C3%A3 %C6
à %C3%83 %C7
¤ %C2%A4 %CF
ð %C3%B0 %D0
Ð %C3%90 %D1
Ê %C3%8A %D2
Ë %C3%8B %D3
È %C3%88 %D4
Í %C3%8D %D6
Î %C3%8E %D7
Ï %C3%8F %D8
¦ %C2%A6 %DD
Ì %C3%8C %DE
Ó %C3%93 %E0
ß %C3%9F %E1
Ô %C3%94 %E2
Ò %C3%92 %E3
õ %C3%B5 %E4
Õ %C3%95 %E5
µ %C2%B5 %E6
þ %C3%BE %E7
Þ %C3%9E %E8
Ú %C3%9A %E9
Û %C3%9B %EA
Ù %C3%99 %EB
ý %C3%BD %EC
Ý %C3%9D %ED
¯ %C2%AF %EE
´ %C2%B4 %EF
(soft hyphen) %C2%AD %F0
± %C2%B1 %F1
¾ %C2%BE %F3
¶ %C2%B6 %F4
§ %C2%A7 %F5
÷ %C3%B7 %F6
¸ %C2%B8 %F7
° %C2%B0 %F8
¨ %C2%A8 %F9
· %C2%B7 %FA
¹ %C2%B9 %FB
³ %C2%B3 %FC
² %C2%B2 %FD
Ron, if you want, I can send you the table as a txt file, formatted so that it is easier for you to implement the transformation rules in mt-daapd.
Perhaps there could be an option in the mt-daapd.conf file that people with NTFS drives attached to a slug could set to 1, thereby telling mt-daapd how to translate the UTF-8 paths in the XML correctly?
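For anyone who wants to script the workaround instead of looking up each row of the table by hand, here is a minimal Python sketch. It assumes the escapes in the XML are UTF-8 and that the target names are CP850, as in my setup; the function name utf8_escapes_to_cp850 is made up for this example, and it is not how mt-daapd itself handles paths.

```python
import re
from urllib.parse import unquote_to_bytes

# Runs of %XX escapes whose byte value is 0x80 or higher, i.e. the escaped
# non-ASCII bytes. ASCII escapes such as %20 are deliberately left alone.
HIGH_ESCAPES = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def _reencode(match):
    # Turn the escapes back into raw bytes and read them as UTF-8 text.
    text = unquote_to_bytes(match.group(0)).decode("utf-8")
    # Re-encode as CP850 and escape each byte again. Raises
    # UnicodeEncodeError for characters that CP850 does not contain.
    return "".join(f"%{b:02X}" for b in text.encode("cp850"))


def utf8_escapes_to_cp850(location):
    """Rewrite UTF-8 percent-escapes in an iTunes Location as CP850 escapes."""
    return HIGH_ESCAPES.sub(_reencode, location)


bjork = utf8_escapes_to_cp850(
    "file://localhost/F:/Music/Bj%C3%B6rk/Debut/07%20One%20Day.wav"
)
cla = utf8_escapes_to_cp850(
    "file://localhost/F:/Music/Cl%C3%A3/Kazoo/01%20A%20Grande%20Pir%C3%A2mide.wav"
)
print(bjork)  # file://localhost/F:/Music/Bj%94rk/Debut/07%20One%20Day.wav
print(cla)    # file://localhost/F:/Music/Cl%C6/Kazoo/01%20A%20Grande%20Pir%83mide.wav
```

The results agree with the rows for ö, ã and â in the table. Working on runs of high-byte escapes, rather than on a fixed lookup table, means multi-byte sequences are handled without listing every character, at the cost of failing on names that contain characters outside CP850.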
19/05/2006 at 6:23 PM #4443 Anonymous (Inactive)
I’ll try to add some to the confusion 🙂
Aren’t charset conversions handled by iconv in GNU C? Is there a platform-agnostic command for charset conversion?
On my Samba I have specified dos charset CP850 and unix charset iso-8859-1. So DOS clients (I don’t know if that goes for just DOS or everything from Windows) write the filename as CP850, and Samba converts and stores it as iso-8859-1 on the server.
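To illustrate the kind of conversion described here, a short Python sketch of a CP850 to ISO-8859-1 round trip through Unicode. It only mirrors conceptually what an iconv-style conversion does; it is not Samba's code, and convert_charset is a hypothetical helper name.

```python
def convert_charset(data, src, dst):
    """Decode bytes from one charset and re-encode them in another."""
    # Going through Unicode text is the same two-step idea iconv uses.
    return data.decode(src).encode(dst)


# "Björk" as a client using CP850 would send it: ö is the single byte 0x94.
dos_name = b"Bj\x94rk"

# The same name as it would be stored with an ISO-8859-1 unix charset.
unix_name = convert_charset(dos_name, "cp850", "iso-8859-1")

print(unix_name)  # b'Bj\xf6rk'
```

On the command line, iconv -f CP850 -t ISO-8859-1 does the same job for a stream of bytes. Note that a few CP850 characters, such as ƒ and the box-drawing symbols, have no ISO-8859-1 equivalent, so the encode step would raise an error for them.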