We have proposed two compiler-assisted software-cache schemes. The first is a page-based system, Asymmetric Distributed Shared Memory (ADSM), which exploits the TLB/MMU only on read cache misses. The second is a segment-based system, User-level Distributed Shared Memory (UDSM), which implements the software cache using only user-level checking code and consistency-management code. Under both schemes, an optimizing compiler directly analyzes shared-memory source programs and optimizes them. It exploits medium-grained and coarse-grained remote memory accesses to reduce the number and volume of communications and to alleviate the overhead of the user-level checking code. To this end, it uses interprocedural points-to analysis together with interprocedural redundancy elimination and coalescing optimization. We have implemented this optimizing compiler for both schemes, along with runtime systems for user-level cache emulation. Both the ADSM and UDSM runtime systems run on an SS20 cluster connected by Fast Ethernet (100BASE-TX). We show that both schemes achieve high speedup ratios on the SPLASH-2 benchmark suite.
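
As an illustration of why coalescing user-level checks matters, the following is a minimal sketch in C. It is not the ADSM or UDSM implementation: the block size, the region size, and every identifier (`check_read`, `check_read_range`, `fetch_blocks`, the counters) are hypothetical, and remote memory is simulated by a local array. It contrasts a loop in which a check precedes every shared read with a loop in which a single coalesced check, hoisted out of the loop, covers the whole accessed range and fetches it in one simulated message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All sizes and names below are assumptions made for this sketch. */
#define BLOCK_WORDS 8                      /* words per cache block */
#define NUM_BLOCKS  16                     /* blocks in the shared region */
#define NUM_WORDS   (BLOCK_WORDS * NUM_BLOCKS)

static int remote_memory[NUM_WORDS];       /* stands in for the home node */
static int local_cache[NUM_WORDS];         /* local software cache */
static unsigned char valid[NUM_BLOCKS];    /* per-block valid flags */

static long num_checks;                    /* user-level checks executed */
static long num_fetches;                   /* simulated remote messages */

/* Invalidate the cache, fill remote memory with 0..NUM_WORDS-1, clear counters. */
void dsm_reset(void) {
    memset(valid, 0, sizeof valid);
    memset(local_cache, 0, sizeof local_cache);
    for (size_t i = 0; i < NUM_WORDS; i++) remote_memory[i] = (int)i;
    num_checks = 0;
    num_fetches = 0;
}

/* Copy blocks first..last (inclusive) into the cache as one simulated message. */
static void fetch_blocks(size_t first, size_t last) {
    assert(first <= last && last < NUM_BLOCKS);
    memcpy(&local_cache[first * BLOCK_WORDS],
           &remote_memory[first * BLOCK_WORDS],
           (last - first + 1) * BLOCK_WORDS * sizeof(int));
    for (size_t b = first; b <= last; b++) valid[b] = 1;
    num_fetches++;
}

/* Check placed before a single shared read: fetch the block on a miss. */
static void check_read(size_t idx) {
    assert(idx < NUM_WORDS);
    size_t b = idx / BLOCK_WORDS;
    num_checks++;
    if (!valid[b]) fetch_blocks(b, b);
}

/* Coalesced check for the word range [lo, hi): one check, and at most one
   fetch. Conservatively refetches the whole range if any block is invalid. */
static void check_read_range(size_t lo, size_t hi) {
    assert(lo < hi && hi <= NUM_WORDS);
    size_t first = lo / BLOCK_WORDS;
    size_t last = (hi - 1) / BLOCK_WORDS;
    num_checks++;
    for (size_t b = first; b <= last; b++) {
        if (!valid[b]) {
            fetch_blocks(first, last);
            break;
        }
    }
}

/* Unoptimized form: one check per shared read inside the loop. */
long sum_naive(size_t lo, size_t hi) {
    long s = 0;
    for (size_t i = lo; i < hi; i++) {
        check_read(i);
        s += local_cache[i];
    }
    return s;
}

/* Optimized form: the check is hoisted out of the loop and coalesced. */
long sum_coalesced(size_t lo, size_t hi) {
    long s = 0;
    check_read_range(lo, hi);
    for (size_t i = lo; i < hi; i++) s += local_cache[i];
    return s;
}
```

In this sketch, after `dsm_reset()`, summing words 0 to 31 with `sum_naive` executes 32 checks and 4 block fetches, whereas `sum_coalesced` executes 1 check and 1 fetch for the same result. A real compiler would need analyses such as the points-to analysis mentioned above to establish which range a loop touches before hoisting the check; the sketch simply assumes that range is known.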