
Re: [TV] Source code, GPL and licenses...



OK. so I pulled and scanned the code... (tv-v2.00-20040126.tar.gz)

What I'm not seeing is the grabber source.... any suggestions?
Looking at dgc2xml.pl, I'm guessing that you have URLs for Digiguide data for some sites.

I've also gone back through the archives and seen the 'mirroring' discussion. I'd add some thoughts in this area (bear in mind I'm thinking primarily PVR here - although all the Peers could act as bleb www and xml mirrors too)

Listing data changes after publication and before it's 'live'.
I think there are 3 reasonable stages:
14 days away : Fuzzy data, quite a few unknown slots. Useful advance notice of some things.
7 days away : Pretty solid. Ready to be paper published.
1 day away : Last minute schedule changes.
I call this the +1/+7/+14 approach.
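
To make the three stages concrete, here is a minimal sketch that maps "days until broadcast" onto a stage. The function name and the exact boundaries between stages are my assumptions; the stages above only name the three points, not the ranges between them.

```python
def listing_stage(days_ahead: int) -> str:
    """Return the +1/+7/+14 stage for a listing `days_ahead` days away.

    Assumed boundaries (not stated in the post): 0-1 days -> "+1",
    2-7 days -> "+7", anything further out -> "+14".
    """
    if days_ahead < 0:
        raise ValueError("days_ahead must not be negative")
    if days_ahead <= 1:
        return "+1"   # last minute schedule changes
    if days_ahead <= 7:
        return "+7"   # pretty solid, ready to be paper published
    return "+14"      # fuzzy data, quite a few unknown slots
```

A Scheduled Grab could use the stage to decide how often a given day's data is worth refreshing.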

Terminology
========
Source : master site for Digiguide/XML/HTML listings data.
Network : An ordered list of Peers offering the bleb.org service
Master : www.bleb.org
Peer : www.another.site in the Network
Grab : A grab from the Source site
Scheduled Grab : Each channel has a schedule for obtaining data from the Source site.
Client : application (not browser) using bleb data (eg: xmltv's tv_grab_uk_bleb)
Use : a Client grab or use of the service.

Objective
======
Service is available if any Network member fails.
Reasonable Client load balancing
Minimal load on Sources
Occasional

High level Peer behaviour
=================
A Peer regularly [on the hour + 2mins * position in the Network list] connects to the top entry in the Network list to synchronise listings data and code [rsync]. If the connection fails, the Peer immediately tries the next entry in the Network list, and so on until the next entry is the Peer itself; in that case it checks whether a Scheduled Grab is due and, if so, tries to Grab from the Source. The Master, being the top entry, immediately checks whether a Scheduled Grab is due and tries to Grab from the Source. Retrieved data is stamped so that later checks find that Scheduled Grab is no longer due.
If any Grab fails then notification is sent.
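
As a rough sketch of the behaviour described above (not tested against real servers): walk the Network list from the top, sync with the first member that answers, and fall back to a Grab only when we reach our own entry. The function names and the try_sync/grab_due/grab callables are placeholders of mine, and I am assuming positions are counted from 0 with the Master at the top.

```python
from typing import Callable, List


def sync_offset_minutes(position: int) -> int:
    """Minutes past the hour at which the member at `position` syncs.

    Assumes zero-based positions, so the Master (position 0) runs on the hour.
    """
    return 2 * position


def sync_or_grab(
    network: List[str],
    me: str,
    try_sync: Callable[[str], bool],   # placeholder: rsync from a member, True on success
    grab_due: Callable[[], bool],      # placeholder: is a Scheduled Grab due?
    grab: Callable[[], None],          # placeholder: Grab from the Source
) -> str:
    """Run one sync cycle and report what happened."""
    for member in network:
        if member == me:
            # Nobody above us answered (or we are the Master): consider a Grab.
            if grab_due():
                grab()
                return "grabbed"
            return "idle"
        if try_sync(member):
            return "synced:" + member
    raise ValueError("this host is not in the Network list")
```

The Master needs no special case here: its own entry is first in the list, so it goes straight to the Scheduled Grab check.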


Discussion
This is a fairly simple distribution approach - less to go wrong.
Clock sync between the Network servers isn't critical (but ntp is always good).
Peers will never attempt a Grab unless the Master and all intervening Peers are unreachable.
When any Network member returns it should not respond to requests until it has synchronised with a superior (or the Sources in the case of the Master).
There's one obvious timing issue: we need to understand how long a full set of Grabs will take, allowing for parallelism and maybe the +1/+7/+14 approach. Worst case is the Master going down just before a Grab set is due to complete and the next Peer starting the Grab very late, causing eager Clients to essentially miss a day's update. Clients could notice this and compensate the next day.
Note that Grabs are retried after an hour. Potential issue if a Grab takes over an hour! That would require some locking. If the grabbing server failed whilst the lock was held then the Peer taking over should ignore the lock.
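
One possible reading of the locking idea, as a sketch only: the lock records which member started the Grab, a member honours the lock only when it is its own, and a lock left behind by another (presumably failed) server is ignored. The GrabLock record and may_start_grab name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GrabLock:
    holder: str  # hostname of the member that started the Grab


def may_start_grab(lock: Optional[GrabLock], me: str) -> bool:
    """Decide whether `me` may start (or retry) a Grab.

    No lock: go ahead. Our own lock: an earlier Grab of ours is still
    running, so skip the hourly retry. Someone else's lock: we only get
    this far when that server is unreachable, so ignore it.
    """
    if lock is None:
        return True
    return lock.holder != me
```

This does not cover a member whose own Grab process died while holding the lock; that would need a timeout or similar on top.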


High level Client behaviour
==================
Initially Clients pull the Network list from the Master; thereafter they select a machine from the Network at random and use that. If a connection times out then they try a different Peer. The Network list is updated each time the Client uses the bleb service.
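
A minimal sketch of that Client logic, assuming a fetch callable (a placeholder of mine) that returns the data or None when the connection times out. Shuffling the list gives the random choice and the "try a different Peer" fallback in one go.

```python
import random
from typing import Callable, List, Optional, Tuple


def use_service(
    network: List[str],
    fetch: Callable[[str], Optional[bytes]],  # placeholder: None means timed out
    rng: Optional[random.Random] = None,
) -> Tuple[str, bytes]:
    """Pick Network members in random order until one answers."""
    rng = rng or random.Random()
    candidates = list(network)   # copy so the caller's list keeps its order
    rng.shuffle(candidates)
    for host in candidates:
        data = fetch(host)
        if data is not None:
            return host, data
    raise ConnectionError("no Network member reachable")
```

After a successful Use the Client would also refresh its copy of the Network list from the member it reached; that step is left out here.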

Discussion
Load balancing is done simply by the client.
Temporary Peer failure is hardly noticed.
Out-of-service Peers can be removed from the Network list and will cause only minor inconvenience until the Use following their removal (nothing stops the Client from blacklisting a Peer).
In the event of the Master going down for good (for whatever reason), the next entry in the Network can issue a new Network list.
Service Uses should probably be scheduled to allow time for Sources to be consulted - maybe 3am onwards?
[ I wrote this and it's true so I'll leave it in but a review tells me it's overkill: ] From a 'security' point of view - any Peer can permanently hijack Clients using it by providing a bad Network file. If we care then we could digitally sign the file and arrange for trusted admins to have access to the signing key. The Clients would then refuse to honour a bad Network file.

This is just a start - I've no doubt made mistakes :)

David