[Oscar-devel] Current OSCAR status.
LAHAYE Olivier
2014-03-07 09:43:26 UTC
Dear all,

OSCAR is progressing well, but a few things are still needed.

The current situation is that we can reach step 8 with no errors on RHEL6 and Fedora 17 at least.
On RHEL6, the modules that were used are:
- apitest
- base
- blcr
- c3
- ganglia
- jobmonarch
- maui
- mtaconfig
- munge
- netbootmgr
- nfs
- ntpconfig
- oda
- openmpi
- sc3
- torque
- yume

The remaining important items to fix are:
- Finish test scripts for mpi* components (lamp, openmpi, ...)
- Finish test scripts for torque, maui
Make sure that the above tests are relevant for problem diagnostics. For example, the current test passes while something is still broken on my test config (a job hangs in circumstances that are not yet understood). Once this is understood, I'll add a test to diagnose this specific problem (unless it's a torque bug, of course).
- Create some test scripts for ntpconfig, mtaconfig, ...
- Add more diagnostic tests for base
- Create opkg for slurm (scripts for queue creation and configuration and tests).
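
As an illustration of the "job hung" case above, here is a minimal sketch of a test helper that tells a hung command apart from a failing one. It is not the existing OSCAR test code; the name run_with_timeout and the three result strings are hypothetical, and the command passed in stands for whatever job submission the real test would perform.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and classify the outcome.

    Returns "ok" on exit status 0, "failed" on a non-zero exit status,
    and "hung" if the command does not finish within timeout_s seconds.
    Hypothetical helper, not part of the OSCAR test framework.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"
```

A test built this way can report "hung" as a separate diagnostic instead of blocking forever or being counted as a plain failure.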

The remaining less important items are:
- Fix "Delete OSCAR Clients" in install mode, which restarts some services even if step 7 has not yet been run. => This needs work, as it's not trivial (many situations to handle).
- Port the netbootmgr GUI from perl-qt3 to perl-qt4 so that "Network Boot Manager" is usable in manage mode.

Future: (after next stable release)
- Enhance manage mode with:
- Disable/Stop node (for maintenance for example)
=> optional: depending on blcr availability: hibernate jobs (if the node is responsive) or ask the user whether they really want to stop the node.
=> optional: if a job is hibernated, try to resume it on another node
=> update c3, queue, ... so it's not possible to send new jobs/commands to the specified node
- Enable a node (after maintenance for example)
=> update c3, queue, ... so it's seen by the batch system, C3, ganglia, ...
- Better GUI.
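
To make the "Disable/Stop node" idea more concrete, here is a rough sketch of the decision logic only. Nothing here exists in OSCAR today: the function name plan_disable_node, its parameters, and the step strings are all hypothetical, and the ordering (block new jobs first, then hibernate, then stop) is my assumption. It only returns a list of planned steps and does not touch c3, the queue, or blcr.

```python
def plan_disable_node(node, has_blcr, node_responsive, running_jobs,
                      user_confirmed=False):
    """Return the ordered list of steps to disable a node (sketch only).

    Hypothetical helper: it plans the steps as plain strings and does
    not call c3, the batch system, or blcr.
    """
    can_hibernate = has_blcr and node_responsive

    # Jobs that cannot be hibernated would be lost: ask the user first.
    if running_jobs and not can_hibernate and not user_confirmed:
        return [f"ask user to confirm stopping {node}"]

    # Assumed ordering: stop new work from arriving before touching jobs.
    steps = [f"block new jobs and commands for {node}"]

    if can_hibernate:
        for job in running_jobs:
            steps.append(f"hibernate job {job} on {node}")
            steps.append(f"try to resume job {job} on another node")

    steps.append(f"stop {node}")
    return steps
```

For example, plan_disable_node("node1", True, True, ["42"]) plans the hibernate and resume steps for job 42, while the same call with has_blcr=False returns only the confirmation request until user_confirmed is set.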

Best regards.

--
Olivier LAHAYE
CEA DRT/LIST/DIR
